Hadoop3 SingleNode Install in Docker

I needed to install Hadoop 3 in Docker, so I wrote the steps down.


Environment

I will install Hadoop as a Single Node inside a Docker container.

The OS is CentOS 7.9. The following prerequisites must already be in place.

  • Java 1.8 installed (including the JAVA_HOME and PATH settings)
  • docker-compose installed on the local machine


Caveats when installing in Docker

If you simply create a centos container and run systemctl in it, you get the following error.

[root@52ef2bb43881 ~]# systemctl
Failed to get D-Bus connection: Operation not permitted

To use systemctl, start the container with the --privileged and -d options and /sbin/init as its command, then open a shell in it with exec and /bin/bash, as shown below.

$ docker run --privileged -d --name mycentos centos:7 /sbin/init
$ docker exec -it mycentos /bin/bash
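Once inside the container, it can be useful to confirm that PID 1 really is an init process before relying on systemctl. The following is a minimal sketch and not part of the original setup: the function name `pid1_is_init` is hypothetical, and it assumes a Linux /proc filesystem.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given process name looks like an init system.
pid1_is_init() {
    case "$1" in
        systemd|init) return 0 ;;
        *) return 1 ;;
    esac
}

# On Linux, /proc/1/comm holds the command name of PID 1.
if [ -r /proc/1/comm ]; then
    name=$(cat /proc/1/comm)
    if pid1_is_init "$name"; then
        echo "PID 1 is $name: systemctl should be usable"
    else
        echo "PID 1 is $name: systemctl will likely fail"
    fi
fi
```

In a container started with /bin/bash as its command, PID 1 is bash, which is why systemctl cannot reach systemd there.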


Installing Hadoop

Install Hadoop. The version used here is 3.3.1.

[root@52ef2bb43881 ~]# wget https://mirrors.sonic.net/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
[root@52ef2bb43881 ~]# tar xvzf hadoop-3.3.1.tar.gz

Add the Hadoop environment variables to ~/.bashrc as shown below. Set HADOOP_HOME to whatever path matches your own installation.

export HADOOP_HOME=/root/hadoop-3.3.1
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
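If the image is rebuilt several times, appending these lines by hand can leave duplicates in ~/.bashrc. A minimal sketch of an idempotent append follows; the helper name `append_once` is hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is not there yet.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (not executed here); single quotes keep $HADOOP_HOME unexpanded in the file:
#   append_once 'export HADOOP_HOME=/root/hadoop-3.3.1' ~/.bashrc
#   append_once 'export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop' ~/.bashrc
#   append_once 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' ~/.bashrc
```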

Then check the Hadoop version to confirm that the environment variables and the installation are correct.

[root@52ef2bb43881 ~]# source ~/.bashrc
[root@52ef2bb43881 ~]# hadoop version
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /home/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar

So far this has been a standalone Hadoop installation. Let's continue from here and set it up as a Single Node Cluster.


Installing required packages

Hadoop's start scripts use ssh to reach each node and launch the daemons, so ssh must be installed.

[root@52ef2bb43881 ~]# yum install openssh-server openssh-clients openssh-askpass -y


SSH Keygen

Generate a key with ssh-keygen so that ssh can connect without a password.

[root@52ef2bb43881 ~]# ssh-keygen -t rsa
[root@52ef2bb43881 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@52ef2bb43881 ~]# systemctl start sshd
[root@52ef2bb43881 ~]# ssh localhost
Last login: Tue Apr 19 08:41:05 2022 from localhost
[root@52ef2bb43881 ~]# exit
logout
Connection to localhost closed.
[root@52ef2bb43881 ~]#
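The interactive steps above can also be scripted, for example when building the image unattended. This is a minimal sketch under assumptions: the helper name `setup_local_ssh_key` is hypothetical, the key is created with an empty passphrase, and the permissions are tightened explicitly because sshd may refuse key files that are too open.

```shell
#!/bin/sh
# Hypothetical helper: create a passphrase-less RSA key in the given directory
# and authorize it for logins as the same user.
setup_local_ssh_key() {
    ssh_dir=$1
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    # Keep an existing key instead of overwriting it.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        # -q: quiet, -N "": empty passphrase, -f: output file (no prompts)
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
    fi
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    # Restrict the file to its owner so sshd's permission checks accept it.
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (not executed here):
#   setup_local_ssh_key "$HOME/.ssh"
```

Calling the helper more than once appends the same public key again, which is harmless but redundant.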


Configuring Hadoop

Open hadoop-env.sh and add the following lines.

[root@52ef2bb43881 ~]# cd $HADOOP_CONFIG_HOME
[root@52ef2bb43881 hadoop]# vim hadoop-env.sh
# hadoop-env.sh
...
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

Next, create the directories that the daemons will use as their working locations.

[root@52ef2bb43881 ~]# mkdir $HADOOP_HOME/temp
[root@52ef2bb43881 ~]# mkdir $HADOOP_HOME/namenode
[root@52ef2bb43881 ~]# mkdir $HADOOP_HOME/datanode

After that, edit each of the following configuration files. First move to the directory that contains them.

[root@52ef2bb43881 ~]# cd $HADOOP_CONFIG_HOME


core-site.xml

Settings shared by HDFS and MapReduce

[root@52ef2bb43881 hadoop]# vim core-site.xml
# core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop-3.3.1/temp</value>
    </property>

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
        <final>true</final>
    </property>

    <!-- Trash -->
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>120</value>
    </property>
</configuration>
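After editing a site file it is easy to leave a typo in a value. Hadoop can report the effective value itself with `hdfs getconf -confKey <key>`. As a lighter check that needs no running Hadoop, the sketch below reads a value straight from the file. The helper name `get_site_prop` is hypothetical, and it assumes one tag per line as in the file above; it is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows <name>KEY</name> in a site file.
# Assumes <name> and <value> each sit on their own line.
get_site_prop() {
    key=$1
    file=$2
    awk -v key="$key" '
        # Literal match on the full name tag, so dots in the key are not wildcards.
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}

# Usage (not executed here):
#   get_site_prop hadoop.tmp.dir "$HADOOP_CONFIG_HOME/core-site.xml"
```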


hdfs-site.xml

Settings used by HDFS

[root@52ef2bb43881 hadoop]# vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/hadoop-3.3.1/namenode</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/hadoop-3.3.1/datanode</value>
        <final>true</final>
    </property>
</configuration>


mapred-site.xml

Settings used by MapReduce

[root@52ef2bb43881 hadoop]# vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Legacy (Hadoop 1) property name, deprecated in favor of mapreduce.jobtracker.address. -->
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>


Running Hadoop

First, format the NameNode.

[root@52ef2bb43881 hadoop]# hdfs namenode -format
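Formatting a NameNode that already holds data discards its metadata, so when the directory lives on a persistent volume it may be safer to format only once. The following is a minimal sketch under assumptions: the helper name `namenode_needs_format` and the `NAME_DIR` variable are hypothetical, and it relies on a formatted NameNode directory containing current/VERSION. It uses the `hdfs namenode -format` form of the command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the NameNode directory has not been formatted yet.
# A formatted NameNode directory contains current/VERSION.
namenode_needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# Assumed path: the dfs.namenode.name.dir value used in this post.
NAME_DIR=${NAME_DIR:-/root/hadoop-3.3.1/namenode}

# Only act when the hdfs command is actually on PATH.
if command -v hdfs >/dev/null 2>&1; then
    if namenode_needs_format "$NAME_DIR"; then
        hdfs namenode -format
    else
        echo "NameNode already formatted, skipping"
    fi
fi
```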

The Hadoop installation is done. Run docker commit on the host to create an image from the container.

[root@kt1201 ~]# docker commit 52ef2bb43881 kt1201/hadoop:3.3.1

Go back into the container, start Hadoop, and check the result with jps.

[root@52ef2bb43881 hadoop]# start-all.sh
Starting namenodes on [localhost]
Last login: Wed Apr 20 02:12:34 UTC 2022 from localhost on pts/1
Starting datanodes
Last login: Wed Apr 20 03:56:37 UTC 2022 on pts/0
Starting secondary namenodes [52ef2bb43881]
Last login: Wed Apr 20 03:56:39 UTC 2022 on pts/0
52ef2bb43881: Warning: Permanently added '52ef2bb43881,172.17.0.2' (ECDSA) to the list of known hosts.
Starting resourcemanager
Last login: Wed Apr 20 03:56:42 UTC 2022 on pts/0
Starting nodemanagers
Last login: Wed Apr 20 03:56:46 UTC 2022 on pts/0
[root@52ef2bb43881 hadoop]# 
[root@52ef2bb43881 hadoop]# 
[root@52ef2bb43881 hadoop]# jps
2145 Jps
1369 SecondaryNameNode
1145 DataNode
971 NameNode
1819 NodeManager
1661 ResourceManager
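Reading the jps list by eye works for a single node, but the same check can be scripted. The sketch below is an illustration only: the helper name `missing_daemons` is hypothetical, and the list of expected daemons is taken from the jps output above.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print each expected daemon
# that does not appear in it. Prints nothing when all five are running.
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole name field so that
        # NameNode does not match SecondaryNameNode.
        printf '%s\n' "$out" |
            awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }' || echo "$d"
    done
}

# Usage (not executed here):
#   jps | missing_daemons
```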


The Hadoop installation is complete. To prepare for communication with other ecosystem components later, I wrote the following docker-compose.yml in advance.

# docker-compose.yml

version: '3'
services:      
  hdfs:
    image: kt1201/hadoop:3.3.1
    container_name: hadoop
    ports:
      - 9870:9870
    volumes:
      - hadoop_namenode:/root/hadoop-3.3.1/namenode
      - hadoop_datanode:/root/hadoop-3.3.1/datanode

volumes:
  hadoop_namenode:
  hadoop_datanode:

The port mapping above exposes the web UI that Hadoop provides. Hadoop 3 serves the NameNode UI on port 9870 by default (it was 50070 in Hadoop 2), and the hdfs-site.xml above does not override that. After creating the container from this yml file and starting Hadoop, open port 9870 in a browser to see the NameNode web UI.


Next up is installing Hive: Hive Install in Docker
