Hive Install in Docker

Let's install Hive. It will be installed in the same Hadoop container that was set up in the previous post.

For the Hadoop installation, refer to the link below.

Hadoop3 SingleNode Install in Docker

metastore DB

First, create a postgres container for the metastore DB. So that the postgres and hadoop containers can reach each other, I created the docker-compose.yml file below.

The two services use the official postgresql 11 image and the hadoop image built earlier.

# docker-compose.yml

version: '3'
services:
  postgres:
    image: kt1201/postgres
    container_name: postgres
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_PASSWORD=postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
      
  hdfs:
    image: kt1201/hadoop:3.3.1
    container_name: hadoop
    ports:
      - 50070:50070
      - 10000:10000
    volumes:
      - hadoop_namenode:/root/hadoop-3.3.1/namenode
      - hadoop_datanode:/root/hadoop-3.3.1/datanode

volumes:
  postgres-data:
  hadoop_namenode:
  hadoop_datanode:

Then start the containers with the command below.

[root@kt1201 bigdata-docker-compose]# docker-compose up -d

Enter the postgres container, connect with psql as the postgres account, then create the hive user and set its password.

[root@kt1201 bigdata-docker-compose]# docker exec -it 25ddd9f892a4 /bin/bash
root@25ddd9f892a4:/# su - postgres
postgres@25ddd9f892a4:~$ psql
psql (14.2 (Debian 14.2-1.pgdg110+1))
Type "help" for help.

postgres=# create user hive superuser;
CREATE ROLE
postgres=# ALTER USER hive WITH PASSWORD 'hive';
ALTER ROLE

Create a database named metastore owned by the hive user, with its encoding set to UTF-8.

postgres=# CREATE DATABASE metastore WITH OWNER hive ENCODING 'UTF8' template template0;
CREATE DATABASE
postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 metastore | hive     | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(4 rows)
postgres=# \quit
postgres@25ddd9f892a4:~$
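
The same three statements can also be run without an interactive psql session. The sketch below is only an illustration: `metastore_sql` is a hypothetical helper name, and it simply prints the SQL used above for a given user, password and database.

```shell
# Print the SQL that prepares a Hive metastore database.
# $1 = role name, $2 = password, $3 = database name
# (hypothetical helper; the statements mirror the psql session above)
metastore_sql() {
    cat <<EOF
CREATE USER ${1} SUPERUSER;
ALTER USER ${1} WITH PASSWORD '${2}';
CREATE DATABASE ${3} WITH OWNER ${1} ENCODING 'UTF8' TEMPLATE template0;
EOF
}
```

Assuming the container is named postgres as in docker-compose.yml and that the image lets the postgres account connect locally without a password, the output could be piped in with `metastore_sql hive hive metastore | docker exec -i postgres psql -U postgres`.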


Hive

Download the Hive distribution and extract it.

[root@b171915f8f63 ~]# wget 'https://mirror.navercorp.com/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz'
[root@b171915f8f63 ~]# tar zxvf apache-hive-3.1.2-bin.tar.gz

Set the Hive-related environment variables and PATH.

# ~/.bashrc

# JAVA
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
export JAVA_OPTS="-Dfile.encoding=UTF-8"
export CLASSPATH="."

# HADOOP
export HADOOP_HOME=/root/hadoop-3.3.1
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop

# Hive
export HIVE_HOME=/root/apache-hive-3.1.2-bin

# PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin

Download the PostgreSQL JDBC driver and add it to Hive's lib directory.

[root@b171915f8f63 ~]# wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.20/postgresql-42.2.20.jar
[root@b171915f8f63 ~]# mv postgresql-42.2.20.jar $HIVE_HOME/lib

hive-site.xml

Configure apache-hive-3.1.2-bin/conf/hive-site.xml as follows.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://postgres:5432/metastore</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.server2.authentication</name>
        <value>NONE</value>
    </property>
</configuration>
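
Before initializing the schema it can help to confirm that a value in hive-site.xml is what you expect. The following is a minimal sketch, not part of Hive: `get_property` is a hypothetical helper that assumes each `<name>` and `<value>` element sits on a single line, as in the file above.

```shell
# Print the value of one property from a Hadoop-style XML config file.
# $1 = path to the XML file, $2 = property name
# (hypothetical helper; assumes <name> and <value> are each on one line)
get_property() {
    awk -v name="$2" '
        /<name>/ {
            line = $0
            sub(/.*<name>/, "", line)
            sub(/<\/name>.*/, "", line)
            found = (line == name)      # remember whether this is the wanted property
        }
        found && /<value>/ {
            line = $0
            sub(/.*<value>/, "", line)
            sub(/<\/value>.*/, "", line)
            print line
            exit
        }
    ' "$1"
}
```

For example, `get_property "$HIVE_HOME/conf/hive-site.xml" javax.jdo.option.ConnectionURL` would print the JDBC URL. The host part of that URL, postgres, is the compose service name, which the hadoop container can resolve on the network that docker-compose creates.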

Then create the Hive warehouse directory in HDFS and give the group write permission on it.

[root@b171915f8f63 conf]# hadoop fs -mkdir -p /user/hive/warehouse
[root@b171915f8f63 conf]# hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x   - root supergroup          0 2022-04-20 08:07 /user/hive/warehouse
[root@b171915f8f63 conf]# hadoop fs -chmod g+w /user/hive/warehouse
[root@b171915f8f63 conf]# hdfs dfs -ls /user/hive
Found 1 items
drwxrwxr-x   - root supergroup          0 2022-04-20 08:07 /user/hive/warehouse

Initialize the metastore schema, start Hadoop, and connect to Hive.

[root@b171915f8f63 conf]# schematool -initSchema -dbType postgres
[root@b171915f8f63 conf]# start-all.sh
[root@b171915f8f63 conf]# hive


Beeline Connection (External Access Setup)

To connect from an external client such as beeline instead of the hive CLI, hiveserver2 has to be running.

Below are the scripts that start and stop hiveserver2.

# /root/apache-hive-3.1.2-bin/bin/start-hive.sh

#!/bin/bash
nohup hive --service metastore > /dev/null 2>&1 &
nohup hive --service hiveserver2 > /dev/null 2>&1 &

# /root/apache-hive-3.1.2-bin/bin/stop-hive.sh

#!/bin/bash

# Kill every process whose command line contains the given pattern.
kill_matching() {
    local pids
    pids=$(ps -eaf | grep "$1" | grep -v grep | awk '{print $2}')
    if [[ -n "$pids" ]]; then
        echo "killing $pids"
        kill -9 $pids   # left unquoted on purpose: may hold several PIDs
    fi
}

kill_matching hiveserver2
kill_matching metastore

#kill -9 $(lsof -t -i:10000)

Start hiveserver2 and try connecting with beeline.

[root@b171915f8f63 bin]# start-hive.sh
[root@3f1e571ef5ad logs]# beeline -u 'jdbc:hive2://'
...
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://>

beeline connects as shown above. Note that a URL with no host, jdbc:hive2://, runs beeline in embedded mode, so this connection does not go through the hiveserver2 port. An external client has to specify the IP and port as below, and doing so produces the following account-related error.

[root@3f1e571ef5ad conf]# beeline -n hive -p hive -u jdbc:hive2://localhost:10000
...
Connecting to jdbc:hive2://localhost:10000/
22/04/22 06:48:39 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/: Failed to open y.authorize.AuthorizationException): User: root is not allowed to impersonate anonymous (state=08S01
Beeline version 3.1.2 by Apache Hive

Add proxy-user settings for hosts and groups to core-site.xml in the Hadoop configuration. Replace {account_name} with the account that runs hiveserver2 (the error above names root).

    <property>
        <name>hadoop.proxyuser.{account_name}.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.{account_name}.groups</name>
        <value>*</value>
    </property>
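
As a concrete illustration of filling in the placeholder, the sketch below prints the two properties for a given account. `proxyuser_properties` is a hypothetical helper, and passing root is an assumption based on the "User: root is not allowed to impersonate" message above.

```shell
# Print the two core-site.xml proxy-user properties for one account.
# $1 = OS account that runs hiveserver2 (hypothetical helper for illustration)
proxyuser_properties() {
    cat <<EOF
    <property>
        <name>hadoop.proxyuser.${1}.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.${1}.groups</name>
        <value>*</value>
    </property>
EOF
}
```

Running `proxyuser_properties root` prints a fragment that can be pasted inside the `<configuration>` element of core-site.xml. The `*` values allow impersonation from any host and for any group, which is convenient in a single-node test container but broader than a shared cluster would normally want.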

Then restart Hadoop and hiveserver2, and connect with beeline again.

[root@b171915f8f63 ~]# stop-hive.sh
[root@b171915f8f63 ~]# stop-all.sh
[root@b171915f8f63 ~]# start-all.sh
[root@b171915f8f63 ~]# start-hive.sh
[root@b171915f8f63 conf]# beeline -n hive -p hive -u jdbc:hive2://localhost:10000
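
hiveserver2 usually needs some time after start-up before it accepts connections on port 10000, so a first beeline attempt can fail even when the configuration is correct. A generic retry loop is one way to handle that. The `retry` function below is a hypothetical helper, not something shipped with Hive.

```shell
# Run a command until it succeeds or the attempt limit is reached.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run (hypothetical helper)
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1            # give up after the last allowed attempt
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

It could be used as `retry 30 2 beeline -n hive -p hive -u jdbc:hive2://localhost:10000 -e 'show databases;'`. This assumes beeline exits with a non-zero status when the connection fails, which is worth checking on your version before relying on it.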

DBeaver should now be able to connect as well.


Tez Install

The current Hive execution engine can be checked as follows.

0: jdbc:hive2://localhost:10000/> set hive.execution.engine;
+----------------------------+
|            set             |
+----------------------------+
| hive.execution.engine=mr  |
+----------------------------+
1 row selected (0.026 seconds)

The engine is currently MapReduce. Let's switch it to Tez. The version installed here is Tez 0.9.2. First, download and extract it with the commands below.

[root@b171915f8f63 ~]# wget https://downloads.apache.org/tez/0.9.2/apache-tez-0.9.2-bin.tar.gz
[root@b171915f8f63 ~]# tar -zxvf apache-tez-0.9.2-bin.tar.gz


Additional hive-site.xml settings

Add the settings below to hive-site.xml.

# hive-site.xml

<configuration>
    ...
    <property>
        <name>tez.lib.uris</name>
        <value>/root/apache-tez-0.9.2-bin</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>
</configuration>


tez-site.xml

Create a new tez-site.xml with the content below. Here tez.lib.uris is the location of the Tez library archive in HDFS. The archive is uploaded to that path after the file is written.

# /root/apache-tez-0.9.2-bin/conf/tez-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/tez.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.tez.container.size</name>
        <value>3020</value>
    </property>
</configuration>


Next, add the Tez environment variables.

# ~/.bashrc

#Tez
export TEZ_HOME=/root/apache-tez-0.9.2-bin
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS="$TEZ_HOME/*:$TEZ_HOME/lib/*"

# TEZ_JARS already holds both wildcard entries, so it is appended as-is
export HADOOP_CLASSPATH="$CLASSPATH:${TEZ_CONF_DIR}:${TEZ_JARS}"
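
The Tez part of the classpath consists of the conf directory plus two wildcard entries. The JVM expands a trailing `/*` to every jar in that directory, so the asterisks must reach Java literally rather than being expanded by the shell. The sketch below builds that string for a given install directory. `tez_classpath` is a hypothetical helper shown only to make the expected shape explicit.

```shell
# Print the Tez classpath entries for a Tez install directory.
# $1 = Tez home directory (hypothetical helper)
tez_classpath() {
    # Quoting keeps the shell from expanding the wildcards; Java expands them itself.
    printf '%s:%s:%s\n' "${1}/conf" "${1}/*" "${1}/lib/*"
}
```

With it, `export HADOOP_CLASSPATH="$(tez_classpath /root/apache-tez-0.9.2-bin)"` would produce the conf directory followed by the two wildcard entries.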

ํ™˜๊ฒฝ๋ณ€์ˆ˜ ์„ค์ •์„ ์ ์šฉํ•œ๋‹ค.

[root@b171915f8f63 ~]# source ~/.bashrc

Now upload the Tez library archive to the HDFS path configured in tez-site.xml.

[root@b171915f8f63 ~]# hdfs dfs -mkdir -p /user/tez
[root@b171915f8f63 ~]# hdfs dfs -put /root/apache-tez-0.9.2-bin/share/tez.tar.gz /user/tez

Now run hive and check the engine.

[root@b171915f8f63 conf]# hive
hive> set hive.execution.engine;
hive.execution.engine=tez

Finally, commit the container as an image.

[root@b171915f8f63 conf]# exit
[root@kt1201 ~]# docker commit hadoop kt1201/hadoop:3.3.1
