# Command-Line Deployment

---

## Hive HA Deployment

### Prerequisites

Hive depends on HDFS, YARN, and ZooKeeper clusters, and uses PostgreSQL (PG) for metadata storage.
If Hive is deployed in HA + Kerberos mode, ZooKeeper must have Kerberos authentication enabled.

For ZooKeeper installation, see [ZooKeeper Installation](../zookeeper/installation-zookeeper.rst).
The ZooKeeper service address is assumed to be `zookeeper1:2181,zookeeper2:2181,zookeeper3:2181`.

For YARN installation, see [YARN Installation](../yarn/installation-yarn.rst).
The YARN service address is assumed to be `yarn1:8090,yarn2:8090,yarn3:8090`.

For HDFS installation, see [HDFS Installation](../hdfs/installation-hdfs.rst).
The HDFS service address is assumed to be `hdfs1:9000,hdfs2:9000,hdfs3:9000`.

Hive requires an external database for metadata storage; by default it uses the Postgres database of the Skylab platform itself.
The Postgres address is assumed to be `PG1`.

If Kerberos authentication is required, a KDC service must be deployed in advance; see [Kerberos Installation](../kerberos/installation-kerberos.rst).
The KDC service address is assumed to be `kdc1`.

If Hive is deployed separately from the HDFS/YARN cluster, the HDFS client must be installed on all Hive machines and the HDFS configuration files synced to them.

If Ranger authentication is enabled, see [Ranger Installation](../ranger/ranger-start-installation-cli.md) for Ranger deployment.
The Ranger service address is assumed to be `ranger1`.

#### Configure the yum repository and install lava

Log in to the hive1 machine and switch to the root user:

```shell
ssh hive1
su root
```

Configure the yum repository and install the lava command-line management tool:

```sh
# Fetch the repo file from the machine hosting the yum repository (assumed to be 192.168.1.10)
scp root@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository machine's information to /etc/hosts
# Install the lava command-line management tool
yum clean all
yum makecache
yum install -y lava
```

Create the hivehost file:

```shell
touch ${HOME}/hivehost
```

Set the contents of hivehost to the hostnames of all Hive nodes:

```
hive1
hive2
```

Adjust its permissions:

```shell
chmod 777 ${HOME}/hivehost
```

On the first machine, exchange public keys with the other nodes in the cluster to enable passwordless SSH and configuration distribution:

```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```

### Installation

#### Preparation

```shell
lava ssh -f ${HOME}/hivehost -e 'yum install -y hive'
# If Hive is deployed separately, the HDFS client also needs to be installed (optional)
lava ssh -f ${HOME}/hivehost -e 'yum install -y hdfs'
```

Create the Hive directories and grant ownership to the hive user:

```shell
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /data1/hdfs/hive/hdfs'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /data1/hdfs/hive'
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /etc/security/keytabs/'
```

The path specified by the parameter `hive.metastore.warehouse.dir` must be created in **HDFS**.
(Optional: when using Kerberos, configure Hive's principal on kdc1 and sync the keytab before creating the paths; see *Hive KDC Authentication* below.)

```shell
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -mkdir -p /hive/tmp
hdfs dfs -mkdir -p /usr/hive/log
hdfs dfs -chmod -R 755 /usr/hive
hdfs dfs -chmod -R 755 /hive/tmp
```

Edit the hive-env.sh file stored in /usr/local/oushu/conf/hive:

```shell
export JAVA_HOME=/usr/java/default/jre
```

##### Hive KDC Authentication (optional)

If Kerberos is enabled, the Kerberos client must be installed on all Hive nodes.

```shell
lava ssh -f ${HOME}/hivehost -e "yum install -y krb5-libs krb5-workstation"
```

Create the principals and keytab:

```shell
ssh kdc1
kadmin.local
```

Perform KDC authentication setup for Hive:

```sh
# Create principals for the hive role
addprinc -randkey hive/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey hive/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey hive@KDCSERVER.OUSHU.COM
# Generate a keytab entry for each principal
ktadd -k /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive/hive2@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive1@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive2@KDCSERVER.OUSHU.COM
# Exit
quit
```
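Before distributing the keytab, it can be worth confirming on kdc1 that the principals and keytab entries were created as expected. This is a minimal sketch, assuming the principal names and keytab path used above:

```shell
# List the hive principals registered in the KDC
kadmin.local -q "listprincs hive*"
# Show the entries (and key versions) contained in the generated keytab
klist -kt /etc/security/keytabs/hive.keytab
```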
Distribute the keytab files from kdc1 to hive1, then to all Hive nodes, and fix their permissions:

```shell
ssh hive1
scp root@kdc1:/etc/security/keytabs/hive.keytab /etc/security/keytabs/hive.keytab
scp root@kdc1:/etc/security/keytabs/hdfs.keytab /etc/security/keytabs/hdfs.keytab
scp root@kdc1:/etc/security/keytabs/yarn.keytab /etc/security/keytabs/yarn.keytab
scp root@kdc1:/etc/krb5.conf /etc/krb5.conf
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hive.keytab =:/etc/security/keytabs/hive.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hdfs.keytab =:/etc/security/keytabs/hdfs.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/yarn.keytab =:/etc/security/keytabs/yarn.keytab
lava scp -r -f ${HOME}/hivehost /etc/krb5.conf =:/etc/krb5.conf
lava ssh -f ${HOME}/hivehost -e 'chown hive /etc/security/keytabs/hive.keytab'
lava ssh -f ${HOME}/hivehost -e 'chmod 400 /etc/security/keytabs/hive.keytab'
```

### Configuration

#### Metastore database configuration

Edit hive-site.xml under /usr/local/oushu/conf/hive/ so that Hive uses PG:

```xml
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>JDBC driver name</description>
</property>
<property>
  <name>hive.metastore.db.type</name>
  <value>postgres</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://datanode01:3306/hive_db</value>
  <description>JDBC connection URL</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username for connecting to the metastore database (created in PG)</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>{set the strong password of the Skylab PG here}</value>
  <description>Password for connecting to the metastore database (created in PG)</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>Enforce metastore schema version consistency</description>
</property>
```

#### Basic Hive configuration

Edit the hive-site.xml file under /usr/local/oushu/conf/hive:

```xml
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/data1/hdfs/hive/hdfs</value>
  <description>Hive's local scratch directory, used to store the map/reduce execution plans of different stages</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/data1/hdfs/hive/${hive.session.id}_resources</value>
  <description>Local scratch directory for resources downloaded by Hive</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/data1/hdfs/hive/hdfs</value>
  <description>Location of Hive's structured runtime logs</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/data1/hdfs/hive/hdfs/operation_logs</value>
  <description>Operation log location when operation logging is enabled</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/usr/hive/warehouse</value>
  <description>Path of the Hive data warehouse in HDFS</description>
</property>
<property>
  <name>hive.metastore.warehouse.external.dir</name>
  <value></value>
</property>
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2_zk</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hive1:9083,thrift://hive2:9083</value>
  <description>Thrift URIs of the remote metastore, used by metastore clients to connect to the metastore service</description>
</property>
```

Distribute the configuration to hive2:

```shell
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
```

Log in to hive2 and edit hive-site.xml under /usr/local/oushu/conf/hive/:

```xml
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hive2</value>
</property>
```

##### Hive tuning (optional)

Hive is generally recommended to run with its default parameters. If tuning is desired, prioritize adjusting the resources available to Hive; see the "Tuning the configuration (optional)" part of the YARN chapter: [YARN Installation](../yarn/installation-yarn.rst).

#### Kerberos configuration (optional)

On the hive1 node, edit the hive-env.sh file under /usr/local/oushu/conf/hive:

```shell
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/oushu/conf/zookeeper/client-jaas.conf"
```

If ZooKeeper is not deployed on this machine, sync ZooKeeper's keytab locally and create the client-jaas.conf file; see [ZooKeeper Installation](../zookeeper/installation-zookeeper.rst) for details.

If Hive is deployed in HA + Kerberos mode, first create the Hive path from the ZooKeeper client:

```shell
sudo -u zookeeper /usr/local/oushu/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] create /hiveserver2_zk
```

Edit the hive-site.xml file under /usr/local/oushu/conf/hive:

```xml
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
</property>
```
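Before syncing the Kerberos configuration to the other nodes, you can optionally check that the hive service principal can obtain a ticket with the distributed keytab. A minimal sketch, assuming the keytab path and principal created earlier:

```shell
# Obtain a ticket as the hive service principal using the keytab
kinit -kt /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
# Show the cached ticket to confirm authentication worked
klist
# Discard the ticket cache afterwards
kdestroy
```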
Sync Hive's Kerberos configuration:

```shell
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chmod -R 755 /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/conf/zookeeper/'
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/zookeeper/client-jaas.conf =:/usr/local/oushu/conf/zookeeper/
```

Log in to hdfs1:

```shell
ssh hdfs1
su root
```

Edit the core-site.xml file under /usr/local/oushu/conf/common to set Hive's proxy users; the NameNode and DataNode must be restarted after the change.

```xml
<property><name>hadoop.proxyuser.hdfs.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.users</name><value>*</value></property>
```

Create hivehost on hdfs1:

```shell
touch ${HOME}/hivehost
```

Set the contents of hivehost to the hostnames of all Hive nodes:

```
hive1
hive2
```

Create yarnhost on hdfs1:

```shell
touch ${HOME}/yarnhost
```

Set the contents of yarnhost to the hostnames of all YARN nodes:

```
yarn1
yarn2
yarn3
```

On the hdfs1 machine, exchange public keys with the cluster nodes to enable passwordless SSH and configuration distribution:

```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
lava ssh-exkeys -f ${HOME}/yarnhost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
lava scp -f ${HOME}/yarnhost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```

After modifying the HDFS configuration files, sync them to all HDFS nodes and restart the HDFS cluster.
If core-site.xml and the other HDFS/YARN configuration files were not modified, the cluster services do not need to be restarted for the parameters to take effect.

```shell
lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/hive/
# Restart the HDFS cluster
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop journalnode'
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon start namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon start datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon start journalnode'
# Restart the YARN cluster
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop resourcemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start resourcemanager'
```

### Startup

#### Metastore database

On the hive1 node, as the root user, run the following commands to create the Hive metadata database:

```shell
ssh PG1
psql -d postgres -h hive1 -p 4432 -U root -Atc "create database hive_db;"
```

Initialize the Hive metadata:

```shell
ssh hive1
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/schematool -dbType postgres -initSchema
```

#### Starting Hive

If Hive is started in Kerberos + HA mode, first create the path required for HA on ZooKeeper; this prevents HA registration from failing when Hive itself creates the path at startup with a Kerberos-authenticated user.
Here `<host:port>` is the address and port of any node in the ZooKeeper cluster, and `<hive.server2.zookeeper.namespace>` is the HA path set in hive-site.xml.

```shell
su hive
/usr/local/oushu/zookeeper/bin/zkCli.sh -server <host:port> create /<hive.server2.zookeeper.namespace>
```

Start Hive:

```shell
su hive
lava ssh -f /root/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f /root/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
```
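Once HiveServer2 is up, you can optionally verify that clients reach it through ZooKeeper service discovery rather than a fixed host. A minimal sketch, assuming the ZooKeeper quorum and the `hiveserver2_zk` namespace configured above:

```shell
# Connect through ZooKeeper discovery; the quorum resolves to whichever HiveServer2 instance is registered
/usr/local/oushu/hive/bin/beeline -u "jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk"
```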
### Checking status

Log in to the zookeeper1 machine:

```shell
ssh zookeeper1
su zookeeper
# Enter the ZooKeeper client and check whether HA is registered
/usr/local/oushu/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /hiveserver2_zk
[serverUri=VM-128-22-centos:10000;version=3.1.3;sequence=0000000001, serverUri=vm-128-22-centos:10000;version=3.1.3;sequence=0000000000]
```

Run some SQL to test whether Hive is usable:

```shell
# Enter the client via the hive command
hive
hive:>create database td_test;
OK
Time taken: 0.201 seconds
hive:>use td_test;
OK
hive:>create table test(id int);
OK
Time taken: 0.234 seconds
hive:>insert into test values(1),(2);
OK
Time taken: 14.73 seconds, Fetch: 1 row(s)
hive:>select * from test;
OK
1
2
Time taken: 11.48 seconds, Fetched: 2 row(s)
```

### Registering with Skylab (optional)

The machines where Hive is installed need to be added to Skylab via machine management; if you have not added them yet, see [Registering Machines](../start/install-lava.md).

On hive1, edit the `server.json` configuration under /usr/local/oushu/lava/conf and replace localhost with the Skylab server IP. For the installation steps of lava, Skylab's base service, see [lava Installation](../start/start-installation.rst).

Then create a `~/hive.json` file with contents like the following:

```json
{
  "data": {
    "name": "HiveCluster",
    "group_roles": [
      {
        "role": "hive.metastore",
        "cluster_name": "metastore-id",
        "group_name": "metastore",
        "machines": [
          {
            "id": 1,
            "name": "metastore1",
            "subnet": "lava",
            "data_ip": "192.168.1.11",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          },
          {
            "id": 2,
            "name": "metastore2",
            "subnet": "lava",
            "data_ip": "192.168.1.11",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          }
        ]
      },
      {
        "role": "hive.hiveservice2",
        "cluster_name": "hiveservice2-id",
        "group_name": "hiveservice2",
        "machines": [
          {
            "id": 1,
            "name": "hiveservice2-1",
            "subnet": "lava",
            "data_ip": "192.168.1.11",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          },
          {
            "id": 2,
            "name": "hiveservice2-2",
            "subnet": "lava",
            "data_ip": "192.168.1.11",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          }
        ]
      }
    ],
    "config": {
      "hive-env.sh": [
        { "key": "HIVE_HOME", "value": "/usr/local/oushu/hive" },
        { "key": "HIVE_CONF_DIR", "value": "/usr/local/oushu/conf/hive" },
        { "key": "HIVE_LOG_DIR", "value": "/usr/local/oushu/log/hive" },
        { "key": "HADOOP_CONF_DIR", "value": "/usr/local/oushu/conf/hive" }
      ],
      "hive-site.xml": [
        { "key": "hive.exec.local.scratchdir", "value": "/data1/hdfs/hive/hdfs" },
        { "key": "hive.querylog.location", "value": "/data1/hdfs/hive/hdfs" },
        { "key": "hive.metastore.warehouse.dir", "value": "/usr/hive/warehouse" },
        { "key": "javax.jdo.option.ConnectionDriverName", "value": "org.postgresql.Driver" },
        { "key": "javax.jdo.option.ConnectionURL", "value": "jdbc:postgresql://datanode01:3306/hive_db" },
        { "key": "hive.server2.support.dynamic.service.discovery", "value": "true" },
        { "key": "hive.server2.zookeeper.namespace", "value": "hiveserver2_zk" },
        { "key": "hive.zookeeper.client.port", "value": "2181" },
        { "key": "hive.zookeeper.quorum", "value": "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181" },
        { "key": "hive.metastore.uris", "value": "thrift://hive1:9083,thrift://hive2:9083" }
      ]
    }
  }
}
```

In the configuration file above, the machine information in the machines arrays must be adjusted to match your environment. On the machine where the platform base component lava is installed, run:

```
psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"
```

This returns the required machine information; add each node's information to the machines array of the service role it hosts.
For example, hive1 hosts the Hive MetaStore role, so hive1's machine information must be added to the machines array of the hive.metastore role.
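If you only need the row for a single host, a filtered variant of the query above can be convenient. This is a hypothetical sketch that assumes the same lavaadmin schema and uses hive1 as the example hostname:

```shell
# Return only the hive1 row, unaligned and tuples-only, for easy copying into hive.json
psql lavaadmin -p 4432 -U oushu -Atc "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id and m.name='hive1';"
```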
Register the cluster with the lava command:

```
lava login -u oushu -p ******** -T {tenant id}
lava onprem-register service -s Hive -f ~/hive.json
```

If the return value is:

```
Add service by self success
```

the registration succeeded; if there is an error message, handle it according to the message.
After logging in from the web UI, the newly added cluster can also be seen under the corresponding service in the auto-deployment module.

### Integrating Hive with Ranger authentication (optional)

#### Ranger installation

If Ranger is enabled, the Ranger client must be installed on all Hive nodes.

```shell
lava ssh -f ${HOME}/hivehost -e "yum install -y ranger-hive-plugin"
lava ssh -f ${HOME}/hivehost -e "ln -s /usr/local/oushu/conf/hive /usr/local/oushu/hive/conf"
```

#### Ranger configuration

On the hive1 node, edit the configuration file /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties:

```shell
POLICY_MGR_URL=http://ranger1:6080
REPOSITORY_NAME=hivedev
COMPONENT_INSTALL_DIR_NAME=/usr/local/oushu/hive
```

Confirm that the configuration file /usr/local/oushu/conf/hive/hive-site.xml contains the following parameter:

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hive1:9083,thrift://hive2:9083</value>
  <description>Thrift URIs of the remote metastore, used by metastore clients to connect to the metastore service</description>
</property>
```

Confirm that the proxy users in /usr/local/oushu/conf/common/core-site.xml have been modified; the change must be synced to all HDFS nodes, and the NameNode and DataNode restarted.

Log in to hdfs1:

```shell
ssh hdfs1
su root
```

Edit the core-site.xml file under /usr/local/oushu/conf/common to set Hive's proxy users:

```xml
<property><name>hadoop.proxyuser.hdfs.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.users</name><value>*</value></property>
```

Sync the configuration:

```shell
lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/hive/
```

Sync Hive's Ranger configuration and run the initialization script:

```shell
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties =:/usr/local/oushu/ranger-hive-plugin_2.3.0/
lava ssh -f ${HOME}/hivehost -e '/usr/local/oushu/ranger-hive-plugin_2.3.0/enable-hive-plugin.sh'
```

If the following message appears after the initialization script finishes, it succeeded; restart the services as instructed.

```shell
Ranger Plugin for hive has been enabled.
Please restart hive to ensure that changes are effective.
```

Restart Hive:

```shell
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
* 3432 RunJar
* 2987 RunJar
kill -9 3432
kill -9 2987
exit
# The commands above need to be run again on the hive2 machine
ssh hive2
su - hive
jps
* 3433 RunJar
* 2988 RunJar
kill -9 3433
kill -9 2988
exit
# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
```

#### Configuring user permission policies in the Ranger UI

##### Create the `Hive Service`

- Log in to the Ranger UI at http://ranger1:6080 and click ➕ to add a `Hive Service`; note that the tab to select is "HADOOP SQL".

![image](./images/hive-ranger-1.png)

- Fill in the service name; it must match the `REPOSITORY_NAME` in the `install.properties` file.

![image](./images/hive-ranger-2.png)

- Set a username and password of your choice and fill in the Hive connection details; if Kerberos authentication is enabled, specify the corresponding keytab file, otherwise use the default configuration.

![image](./images/hive-ranger-3.png)

- Run the test to check the configuration; once it passes, click Add to save.

![image](./images/hive-ranger-4.png)

##### Create an access policy

- Find the service just created and click its name.

![image](./images/hive-policy-1.png)

- Click the 'Add New Policy' button.

![image](./images/hive-policy-2.png)

- Set the access policy so that the hive user has only read permission on 't1'; also make sure the recursive toggle is enabled.

![image](./images/hive-policy-3.png)
![image](./images/hive-policy-5.png)
![image](./images/hive-policy-8.png)
![image](./images/hive-policy-4.png)

- Review the settings just made.

![image](./images/hive-policy-6.png)

##### Ranger + Kerberos notes

When Kerberos is enabled, Kerberos must also be enabled for the Ranger service, and the following parameter must be added when configuring the Hive repo:

![image](./images/hive-ranger-kerberos.png)

The parameter value is the configured Kerberos principal username.

##### Checking the effect

Log in to the hive1 machine and access Hive as the hive user:

```shell
sudo su hive
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/beeline
!connect jdbc:hive2://oushu162509m1-4424-4424-1:10000
```

If output like the following appears, the policy has taken effect (it may take about a minute after the policy is configured, so retry after a short wait):

```shell
> use test;
> select * from t1;
OK
+---------+
| test.id |
+---------+
| 1       |
+---------+
1 row(s) selected (0.18 seconds)
> insert into t1 values(1);
Permission denied: user [hive] does not have [write] privilege on [t1]
```
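When Kerberos is enabled as well, beeline needs a ticket and a principal-qualified JDBC URL. A minimal sketch, assuming the keytab, principal, and HiveServer2 host used earlier in this guide:

```shell
# Authenticate as the hive service principal before connecting
kinit -kt /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
# The principal parameter must reference the HiveServer2 service principal
/usr/local/oushu/hive/bin/beeline -u "jdbc:hive2://hive1:10000/default;principal=hive/hive1@KDCSERVER.OUSHU.COM"
```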
## Hive on Tez

### Prerequisites

Complete the Hive deployment described above; Kerberos authentication does not need to be enabled.

### Installation

Tez consists of two packages, tez-minimal.tar and tez.tar; download them to the local machines.

Download the Tez packages:

```shell
sudo su root
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'wget <URL of the two tarballs> -O /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz'
lava ssh -f ${HOME}/hivehost -e 'wget <URL of the two tarballs> -O /usr/local/oushu/tez/tez-0.10.1.tar.gz'
```

Extract tez-0.10.1.tar.gz locally:

```sh
lava ssh -f ${HOME}/hivehost -e 'tar -zxvf /usr/local/oushu/tez/tez-0.10.1.tar.gz -C /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/tez'
```

### Configuration

Create the Tez configuration file tez-site.xml under /usr/local/oushu/conf/hive/ and edit it:

```xml
<property><name>tez.lib.uris</name><value>/apps/tez/tez-0.10.1-minimal.tar.gz</value></property>
<property><name>tez.container.max.java.heap.fraction</name><value>0.2</value></property>
<property><name>tez.use.cluster.hadoop-libs</name><value>true</value></property>
<property><name>tez.am.am-rm.heartbeat.interval-ms.max</name><value>250</value></property>
<property><name>tez.am.container.idle.release-timeout-max.millis</name><value>20000</value></property>
<property><name>tez.am.container.idle.release-timeout-min.millis</name><value>10000</value></property>
<property><name>tez.am.container.reuse.enabled</name><value>true</value></property>
<property><name>tez.am.container.reuse.locality.delay-allocation-millis</name><value>250</value></property>
<property><name>tez.am.container.reuse.non-local-fallback.enabled</name><value>false</value></property>
<property><name>tez.am.container.reuse.rack-fallback.enabled</name><value>true</value></property>
<property><name>tez.am.java.opts</name><value>-server -Xmx1024m -Djava.net.preferIPv4Stack=true</value></property>
<property><name>tez.am.launch.cluster-default.cmd-opts</name><value>-server -Djava.net.preferIPv4Stack=true</value></property>
<property><name>tez.am.launch.cmd-opts</name><value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value></property>
<property><name>tez.am.launch.env</name><value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value></property>
<property><name>tez.am.log.level</name><value>INFO</value></property>
<property><name>tez.am.max.app.attempts</name><value>2</value></property>
<property><name>tez.am.maxtaskfailures.per.node</name><value>10</value></property>
<property><name>tez.am.resource.memory.mb</name><value>2048</value></property>
<property><name>tez.am.resource.cpu.vcores</name><value>2</value></property>
<property><name>tez.am.view-acls</name><value></value></property>
<property><name>tez.counters.max</name><value>10000</value></property>
<property><name>tez.counters.max.groups</name><value>3000</value></property>
<property><name>tez.generate.debug.artifacts</name><value>false</value></property>
<property><name>tez.grouping.max-size</name><value>1073741824</value></property>
<property><name>tez.grouping.min-size</name><value>16777216</value></property>
<property><name>tez.grouping.split-waves</name><value>1.7</value></property>
<property><name>tez.queue.name</name><value>default</value></property>
<property><name>tez.runtime.compress</name><value>true</value></property>
<property><name>tez.runtime.compress.codec</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>tez.runtime.convert.user-payload.to.history-text</name><value>false</value></property>
<property><name>tez.runtime.io.sort.mb</name><value>512</value></property>
<property><name>tez.runtime.optimize.local.fetch</name><value>true</value></property>
<property><name>tez.runtime.pipelined.sorter.sort.threads</name><value>1</value></property>
<property><name>tez.runtime.shuffle.memory.limit.percent</name><value>0.25</value></property>
<property><name>tez.runtime.sorter.class</name><value>PIPELINED</value></property>
<property><name>tez.runtime.unordered.output.buffer.size-mb</name><value>76</value></property>
<property><name>tez.session.am.dag.submit.timeout.secs</name><value>600</value></property>
<property><name>tez.session.client.timeout.secs</name><value>-1</value></property>
<property><name>tez.shuffle-vertex-manager.max-src-fraction</name><value>0.4</value></property>
<property><name>tez.shuffle-vertex-manager.min-src-fraction</name><value>0.2</value></property>
<property><name>tez.staging-dir</name><value>/tmp/${user.name}/staging</value></property>
<property><name>tez.task.am.heartbeat.counter.interval-ms.max</name><value>4000</value></property>
<property><name>tez.task.generate.counters.per.io</name><value>true</value></property>
<property><name>tez.task.get-task.sleep.interval-ms.max</name><value>200</value></property>
<property><name>tez.task.launch.cluster-default.cmd-opts</name><value>-server -Djava.net.preferIPv4Stack=true</value></property>
<property><name>tez.task.launch.cmd-opts</name><value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value></property>
<property><name>tez.task.launch.env</name><value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value></property>
<property><name>tez.task.max-events-per-heartbeat</name><value>500</value></property>
<property><name>tez.task.resource.memory.mb</name><value>1024</value></property>
<property><name>tez.use.cluster.hadoop-libs</name><value>true</value></property>
<property><name>yarn.timeline-service.enabled</name><value>false</value></property>
<property><name>hive.tez.container.size</name><value>2048</value></property>
```

Sync Tez to the HDFS machine:

```shell
lava scp -r -f hdfs1 /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz =:/usr/local/oushu/hdfs/
# Log in to the hdfs1 machine and upload to HDFS
ssh hdfs1
su hdfs
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -copyFromLocal /usr/local/oushu/hdfs/tez-0.10.1-minimal.tar.gz /apps/tez
# Exit the hdfs user and return to hive1
exit
exit
```
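Optionally, confirm the tarball is where `tez.lib.uris` expects it before switching the execution engine; a quick check assuming the paths used above:

```shell
# The listed file should match the tez.lib.uris value in tez-site.xml
sudo -u hdfs hdfs dfs -ls /apps/tez
```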
Edit /usr/local/oushu/conf/hive/hive-site.xml so that Hive uses Tez:

```xml
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
```

Switch the YARN framework by editing mapred-site.xml under /usr/local/oushu/conf/common:

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
```

Update the environment variables:

```sh
export TEZ_CONF_DIR=/usr/local/oushu/conf/hive/tez-site.xml
export TEZ_JARS=/usr/local/oushu/tez/
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
# Append the environment variables above to the files below
# If Hive and HDFS are deployed separately, tez-site.xml also needs to be copied to the HDFS configuration directory
/usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/conf/common/hadoop-env.sh
```

### Startup

```shell
# Restart Hive
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
* 3432 RunJar
* 2987 RunJar
kill -9 3432
kill -9 2987
exit
# The commands above need to be run again on the hive2 machine
ssh hive2
su - hive
jps
* 3433 RunJar
* 2988 RunJar
kill -9 3433
kill -9 2988
exit
# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
```

### Checking status

```shell
# Enter the hive client
hive
# Test whether it works
hive:>create database td_test;
OK
Time taken: 0.201 seconds
hive:>use td_test;
OK
hive:>create table test(id int);
OK
Time taken: 0.234 seconds
hive:>insert into test values(1),(2);
OK
Time taken: 14.73 seconds, Fetch: 1 row(s)
hive:>select * from test;
Query ID = hive_20221110150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 10.19 s
--------------------------------------------------------------------------------
OK
1	oushu
2	hive
Time taken: 11.48 seconds, Fetched: 2 row(s)
# The vertex table above appearing in the output indicates that the Tez engine is in use
```

## Hive Client Installation

To use Hive commands on machines where Hive is not deployed, install the Hive client and the HDFS client.
The Hive client addresses are assumed to be `hive3,hive4,hive5`.

### Preparation

Create hiveclienthost on the hive1 machine:

```shell
su root
touch ${HOME}/hiveclienthost
```

Add the following hostnames to hiveclienthost:

```sh
hive3
hive4
hive5
```

Exchange public keys to enable passwordless SSH and configuration distribution:

```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hiveclienthost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hiveclienthost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```

### Installation

```shell
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hive'
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hdfs mapreduce yarn'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hdfs:hadoop /usr/local/oushu/conf/common/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hive:hadoop /usr/local/oushu/conf/hive/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hiveclienthost -e 'sudo mkdir -p /data1/hdfs/hive/'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hive:hadoop /data1/hdfs/hive/'
```

### Verification

```shell
ssh hive3
su hive
# Enter the hive client
hive
# Test whether it works
hive:>create database td_test;
OK
Time taken: 0.201 seconds
# A response like this shows that the client is working
```
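As a further optional check from a client node, you can also go through HiveServer2 instead of the embedded CLI; a minimal sketch, assuming the ZooKeeper quorum and namespace configured earlier in this guide:

```shell
# Connect from hive3 via ZooKeeper service discovery and run a simple statement
/usr/local/oushu/hive/bin/beeline \
  -u "jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk" \
  -e "show databases;"
```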