# Command-Line Deployment
---
## Hive HA Deployment
### Prerequisites
Hive depends on the HDFS, YARN, and ZooKeeper clusters, and uses PG for metadata storage.
If Hive is deployed in HA + Kerberos mode, ZooKeeper must have Kerberos authentication enabled.
For ZooKeeper deployment, see [ZooKeeper Installation](../zookeeper/installation-zookeeper.rst).
The ZooKeeper service address is assumed to be `zookeeper1:2181,zookeeper2:2181,zookeeper3:2181`
For YARN deployment, see [YARN Installation](../yarn/installation-yarn.rst).
The YARN service address is assumed to be `yarn1:8090,yarn2:8090,yarn3:8090`
For HDFS deployment, see [HDFS Installation](../hdfs/installation-hdfs.rst).
The HDFS service address is assumed to be `hdfs1:9000,hdfs2:9000,hdfs3:9000`
Hive requires an external database for metadata storage; by default it uses the Skylab platform's own Postgres database.
The Postgres address is assumed to be `PG1`
If Kerberos authentication is to be configured, a KDC service must be deployed in advance: [Kerberos Installation](../kerberos/installation-kerberos.rst).
The KDC service address is assumed to be `kdc1`
If Hive is deployed separately from the HDFS/YARN cluster, the HDFS client must be installed on all Hive machines and the HDFS configuration files synchronized to them.
If Ranger authentication is enabled, see [Ranger Installation](../ranger/ranger-start-installation-cli.md) for Ranger deployment.
The Ranger service address is assumed to be `ranger1`
#### Configure the yum repository and install lava
Log in to the hive1 machine and switch to the root user
```shell
ssh hive1
su root
```
Configure the yum repository and install the lava command-line management tool
```sh
# Fetch the repo file from the machine hosting the yum repository (assumed to be 192.168.1.10)
scp root@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository machine's host entry to /etc/hosts
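# (placeholder entry; substitute your repo server's actual IP and hostname)
echo "192.168.1.10 yumrepo" >> /etc/hosts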
# Install the lava command-line management tool
yum clean all
yum makecache
yum install -y lava
```
Create the hivehost file:
```shell
touch ${HOME}/hivehost
```
Set the contents of hivehost to the hostnames of all Hive nodes:
```
hive1
hive2
```
Adjust the permissions:
```shell
chmod 777 ${HOME}/hivehost
```
Exchange public keys between the first machine and the other nodes in the cluster, to enable passwordless ssh login and distribution of configuration files
```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```
### Installation
#### Preparation
```shell
lava ssh -f ${HOME}/hivehost -e 'yum install -y hive'
# If Hive is deployed separately, the HDFS client must also be installed (optional)
lava ssh -f ${HOME}/hivehost -e 'yum install -y hdfs'
```
Create the Hive directories and grant the hive user ownership
```shell
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /data1/hdfs/hive/hdfs'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /data1/hdfs/hive'
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /etc/security/keytabs/'
```
The path specified by the hive.metastore.warehouse.dir parameter must be created in **HDFS**.
(Optional: when using Kerberos, first configure Hive's principals on kdc1 and sync the keytabs before creating the paths; see *Hive KDC Authentication* below.)
```shell
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -mkdir -p /hive/tmp
hdfs dfs -mkdir -p /usr/hive/log
hdfs dfs -chmod -R 755 /usr/hive
hdfs dfs -chmod -R 755 /hive/tmp
```
Edit the hive-env.sh file stored in /usr/local/oushu/conf/hive
```shell
export JAVA_HOME=/usr/java/default/jre
```
##### Hive KDC Authentication (optional)
If Kerberos is enabled, the Kerberos client must be installed on all Hive nodes.
```shell
lava ssh -f ${HOME}/hivehost -e "yum install -y krb5-libs krb5-workstation"
```
Create principals and keytabs
```shell
ssh kdc1
kadmin.local
```
Set up KDC authentication for Hive
```sh
# Generate instances for the hive role
addprinc -randkey hive/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey hive/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey hive@KDCSERVER.OUSHU.COM
# Generate keytab files for each instance
ktadd -k /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive/hive2@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive1@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive2@KDCSERVER.OUSHU.COM
# Exit
quit
```
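To confirm the principals were created, you can list them from kadmin.local (a quick sanity check, assuming the realm above):
```shell
kadmin.local -q "listprincs hive*"
kadmin.local -q "listprincs HTTP*"
```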
On hive1, distribute the keytab files and adjust their permissions
```shell
ssh hive1
scp root@kdc1:/etc/security/keytabs/hive.keytab /etc/security/keytabs/hive.keytab
scp root@kdc1:/etc/security/keytabs/hdfs.keytab /etc/security/keytabs/hdfs.keytab
scp root@kdc1:/etc/security/keytabs/yarn.keytab /etc/security/keytabs/yarn.keytab
scp root@kdc1:/etc/krb5.conf /etc/krb5.conf
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hive.keytab =:/etc/security/keytabs/hive.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hdfs.keytab =:/etc/security/keytabs/hdfs.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/yarn.keytab =:/etc/security/keytabs/yarn.keytab
lava scp -r -f ${HOME}/hivehost /etc/krb5.conf =:/etc/krb5.conf
lava ssh -f ${HOME}/hivehost -e 'chown hive /etc/security/keytabs/hive.keytab'
lava ssh -f ${HOME}/hivehost -e 'chmod 400 /etc/security/keytabs/hive.keytab'
```
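As a sanity check, you can obtain a ticket with the distributed keytab on any Hive node (principal names as created above):
```shell
kinit -kt /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
klist
```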
### Configuration
#### Metadata database configuration
Edit hive-site.xml under /usr/local/oushu/conf/hive/ to make Hive use PG
```xml
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>JDBC driver class</description>
</property>
<property>
  <name>hive.metastore.db.type</name>
  <value>postgres</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://PG1:4432/hive_db</value>
  <description>JDBC connection URL; host and port of the metadata PG (PG1:4432 per this guide's assumptions)</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>User name for connecting to the metastore database (created in PG)</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>{configure the Skylab PG's strong password here}</value>
  <description>Password for connecting to the metastore database (created in PG)</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>Enforce version consistency of the metastore schema</description>
</property>
```
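Before going further, it is worth confirming that the metadata database is reachable from the Hive nodes; a minimal check against the Skylab PG (PG1 and port 4432 are this guide's assumptions):
```shell
psql -h PG1 -p 4432 -U root -d postgres -Atc "select version();"
```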
#### Basic Hive configuration
Edit the hive-site.xml file in /usr/local/oushu/conf/hive
```xml
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/data1/hdfs/hive/hdfs</value>
  <description>Hive's local scratch directory, used to store the map/reduce execution plans of the different stages</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/data1/hdfs/hive/${hive.session.id}_resources</value>
  <description>Local temporary directory for resources downloaded by Hive</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/data1/hdfs/hive/hdfs</value>
  <description>Location of Hive's structured runtime logs</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/data1/hdfs/hive/hdfs/operation_logs</value>
  <description>Operation log location when operation logging is enabled</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/usr/hive/warehouse</value>
  <description>Path of the Hive warehouse in HDFS</description>
</property>
<property>
  <name>hive.metastore.warehouse.external.dir</name>
  <value></value>
</property>
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2_zk</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hive1:9083,thrift://hive2:9083</value>
  <description>Thrift URIs of the remote metastore, for metastore clients to connect to the metastore service</description>
</property>
```
Distribute the configuration to hive2
```shell
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
```
Log in to hive2 and edit hive-site.xml under /usr/local/oushu/conf/hive/
```xml
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hive2</value>
</property>
```
##### Hive tuning (optional)
In general, Hive is recommended to run with its default parameters. If tuning is desired, adjust the resources available to Hive first; see the "Tuning (optional)" section of the YARN chapter, [YARN Installation](../yarn/installation-yarn.rst).
#### Kerberos configuration (optional)
On the hive1 node, edit the hive-env.sh file in /usr/local/oushu/conf/hive
```shell
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/oushu/conf/zookeeper/client-jaas.conf"
```
If ZooKeeper is not deployed on this machine, sync the ZooKeeper keytab locally and create the client-jaas.conf file; see [ZooKeeper Installation](../zookeeper/installation-zookeeper.rst) for details.
If Hive is deployed in HA + Kerberos mode, first create the Hive path from the ZooKeeper client
```shell
sudo -u zookeeper /usr/local/oushu/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] create /hiveserver2_zk
```
Edit the hive-site.xml file in /usr/local/oushu/conf/hive
```xml
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
</property>
```
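Once HiveServer2 is running, clients must hold a Kerberos ticket and reference the server principal in the JDBC URL. A typical beeline connection under the assumptions above (a sketch for later verification, not part of the deployment itself):
```shell
kinit -kt /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
beeline -u "jdbc:hive2://hive1:10000/default;principal=hive/hive1@KDCSERVER.OUSHU.COM"
```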
Sync Hive's Kerberos configuration
```shell
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chmod -R 755 /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/conf/zookeeper/'
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/zookeeper/client-jaas.conf =:/usr/local/oushu/conf/zookeeper/
```
Log in to hdfs1
```shell
ssh hdfs1
su root
```
Edit the core-site.xml file in /usr/local/oushu/conf/common to set Hive's proxy users; after the change, the NameNode and DataNode must be restarted
```xml
<property><name>hadoop.proxyuser.hdfs.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.users</name><value>*</value></property>
```
Create hivehost on hdfs1:
```shell
touch ${HOME}/hivehost
```
Set the contents of hivehost to the hostnames of all Hive nodes:
```
hive1
hive2
```
Create yarnhost on hdfs1:
```shell
touch ${HOME}/yarnhost
```
Set the contents of yarnhost to the hostnames of all YARN nodes:
```
yarn1
yarn2
yarn3
```
Exchange public keys between hdfs1 and the cluster nodes, to enable passwordless ssh login and distribution of configuration files
```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
lava ssh-exkeys -f ${HOME}/yarnhost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
lava scp -f ${HOME}/yarnhost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```
After modifying the HDFS configuration files, sync them to all HDFS nodes and restart the HDFS cluster (hdfshost, nnhostfile, dnhostfile, and jnhostfile below are the hostfiles created during HDFS installation).
If core-site and the other HDFS/YARN configuration files were not modified, the cluster services do not need to be restarted for the parameters to take effect.
```shell
lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/hive/
# Restart the HDFS cluster
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop journalnode'
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon start namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon start datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon start journalnode'
# Restart the YARN cluster
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop resourcemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start resourcemanager'
```
### Startup
#### Metadata database
Create the Hive metadata database by running the following as the root user
```shell
ssh PG1
psql -d postgres -h PG1 -p 4432 -U root -Atc "create database hive_db;"
```
Initialize the Hive metadata
```shell
ssh hive1
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/schematool -dbType postgres -initSchema
```
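You can verify that the schema was initialized with schematool's info option:
```shell
/usr/local/oushu/hive/bin/schematool -dbType postgres -info
```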
#### Starting Hive
If starting Hive in Kerberos + HA mode, first create the path required for HA in ZooKeeper, to prevent HA registration from failing when Hive creates the path at startup with a Kerberos-authorized user.
In the command below, &host+port is the address and port of any ZooKeeper cluster node, and &hive.server2.zookeeper.namespace is the HA path set in hive-site.
```shell
su hive
/usr/local/oushu/zookeeper/bin/zkCli.sh -server &host+port create /&hive.server2.zookeeper.namespace
```
Start Hive
```shell
su hive
lava ssh -f /root/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f /root/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
```
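A quick way to confirm that both services came up on every node (the metastore and HiveServer2 each appear as a RunJar process in jps):
```shell
lava ssh -f /root/hivehost -e 'jps | grep RunJar'
```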
### Check status
Log in to the zookeeper1 machine
```shell
ssh zookeeper1
su zookeeper
# Enter the zookeeper client and check whether HA is registered
/usr/local/oushu/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /hiveserver2_zk
[serverUri=VM-128-22-centos:10000;version=3.1.3;sequence=0000000001, serverUri=vm-128-22-centos:10000;version=3.1.3;sequence=0000000000]
```
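With both HiveServer2 instances registered, clients can connect through ZooKeeper service discovery instead of a fixed host. A typical beeline URL for the configuration above:
```shell
beeline -u "jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk"
```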
Run SQL to test whether Hive works
```shell
# Enter the client via the hive command
hive
hive:>create database td_test;
OK
Time taken:0.201 seconds
hive:>use td_test;
OK
hive:>create table test(id int);
OK
Time taken:0.234 seconds
hive:>insert into test values(1),(2);
OK
Time taken:14.73 seconds, Fetch:1 row(s)
hive:>select * from test;
OK
1
2
Time taken: 11.48 seconds, Fetched: 2 row(s)
```
### Register with Skylab (optional)
The machines where Hive will be installed must be added to Skylab via machine management; if you have not added them yet, see [Register machines](../start/install-lava.md).
On hive1, edit the `server.json` configuration under /usr/local/oushu/lava/conf, replacing localhost with the Skylab server IP; for the installation steps of Skylab's base service lava, see [lava installation](../start/start-installation.rst).
Then create the `~/hive.json` file with contents like the following:
```json
{
"data": {
"name": "HiveCluster",
"group_roles": [
{
"role": "hive.metastore",
"cluster_name": "metastore-id",
"group_name": "metastore",
"machines": [
{
"id": 1,
"name": "metastore1",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
},{
"id": 2,
"name": "metastore2",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
},
{
"role": "hive.hiveservice2",
"cluster_name": "hiveservice2-id",
"group_name": "hiveservice2",
"machines": [
{
"id": 1,
"name": "hiveservice2-1",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
},{
"id": 2,
"name": "hiveservice2-2",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
}
],
"config": {
"hive-env.sh": [
{
"key": "HIVE_HOME",
"value": "/usr/local/oushu/hive"
},
{
"key": "HIVE_CONF_DIR",
"value": "/usr/local/oushu/conf/hive"
},
{
"key": "HIVE_LOG_DIR",
"value": "/usr/local/oushu/log/hive"
},
{
"key": "HADOOP_CONF_DIR",
"value": "/usr/local/oushu/conf/hive"
}
],
"hive-site.xml": [
{
"key": "hive.exec.local.scratchdir",
"value": "/data1/hdfs/hive/hdfs"
},
{
"key": "hive.querylog.location",
"value": "/data1/hdfs/hive/hdfs"
},
{
"key": "hive.metastore.warehouse.dir",
"value": "/usr/hive/warehouse"
},
{
"key": "javax.jdo.option.ConnectionDriverName",
"value": "org.postgresql.Driver"
},
{
"key": "javax.jdo.option.ConnectionURL",
"value": "jdbc:postgresql://datanode01:3306/hive_db"
},
{
"key": "hive.server2.support.dynamic.service.discovery",
"value": "true"
},
{
"key": "hive.server2.zookeeper.namespace",
"value": "2181"
},{
"key": "hive.zookeeper.client.port",
"value": "2181"
},{
"key": "hive.zookeeper.quorum",
"value": "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
},{
"key": "hive.metastore.uris",
"value": "thrift://hive1:9083,thrift://hive2:9083"
}
]
}
}
}
```
In the configuration file above, adjust the machine information in the machines arrays to your actual environment. On the machine where the platform base component lava is installed, run:
```
psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"
```
This returns the required machine information; add it to the machines arrays according to the nodes serving each role.
For example, hive1 serves the Hive MetaStore role, so hive1's machine information must be added to the machines array of the hive.metastore role.
Register the cluster with the lava command:
```
lava login -u oushu -p ******** -T {tenant id}
lava onprem-register service -s Hive -f ~/hive.json
```
If the return value is:
```
Add service by self success
```
the registration succeeded; if an error message is returned, resolve it according to the message.
After logging in from the web UI, the newly added cluster is also visible under the corresponding service in the auto-deployment module.
### Integrating Hive with Ranger authentication (optional)
#### Ranger installation
If Ranger is enabled, the Ranger client must be installed on all Hive nodes.
```shell
lava ssh -f ${HOME}/hivehost -e "yum install -y ranger-hive-plugin"
lava ssh -f ${HOME}/hivehost -e "ln -s /usr/local/oushu/conf/hive /usr/local/oushu/hive/conf"
```
#### Ranger configuration
On the hive1 node, edit the configuration file /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties
```shell
POLICY_MGR_URL=http://ranger1:6080
REPOSITORY_NAME=hivedev
COMPONENT_INSTALL_DIR_NAME=/usr/local/oushu/hive
```
Confirm that the following parameter is set in /usr/local/oushu/conf/hive/hive-site.xml
```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hive1:9083,thrift://hive2:9083</value>
  <description>Thrift URIs of the remote metastore, for metastore clients to connect to the metastore service</description>
</property>
```
Confirm that the proxy users in /usr/local/oushu/conf/common/core-site.xml have been modified; after any change, the file must be synced to all HDFS nodes and the NameNode and DataNode restarted.
Log in to hdfs1
```shell
ssh hdfs1
su root
```
Edit the core-site.xml file in /usr/local/oushu/conf/common to set Hive's proxy users
```xml
<property><name>hadoop.proxyuser.hdfs.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
<property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
<property><name>hadoop.proxyuser.hive.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.hdfs.users</name><value>*</value></property>
<property><name>hadoop.proxyuser.root.users</name><value>*</value></property>
```
Sync the configuration
```shell
lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/hive/
```
Sync Hive's Ranger configuration and run the initialization script
```shell
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties =:/usr/local/oushu/ranger-hive-plugin_2.3.0/
lava ssh -f ${HOME}/hivehost -e '/usr/local/oushu/ranger-hive-plugin_2.3.0/enable-hive-plugin.sh'
```
After the initialization script finishes, the following message indicates success; restart the services as instructed.
```shell
Ranger Plugin for hive has been enabled. Please restart hive to ensure that changes are effective.
```
Restart Hive
```shell
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
3432 RunJar
2987 RunJar
kill -9 3432
kill -9 2987
exit
# Repeat the commands above on the hive2 machine
ssh hive2
su - hive
jps
3433 RunJar
2988 RunJar
kill -9 3433
kill -9 2988
exit
# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
```
#### Configure user permission policies in the Ranger UI
##### Create the `Hive Service`
- Log in to the Ranger UI at http://ranger1:6080 and click the ➕ button to add a `Hive Service`; note that the tab to select is "HADOOP SQL"

- Fill in the service name; it must match the `REPOSITORY_NAME` in the `install.properties` file

- The user name and password are user-defined; fill in Hive's connection details. If Kerberos authentication is enabled, provide the corresponding keytab file; otherwise use the default configuration

- Run the test to check whether the configuration is correct; once correct, click Add to save.

##### Create an access policy
- Find the service just created and click its name

- Click the 'Add New Policy' button

- Set the access policy so that the hive user has only read permission on 't1'; also make sure the recursive toggle is switched on




- Review the settings just made

##### Ranger + Kerberos notes
When Kerberos is enabled, Kerberos must also be enabled for the Ranger service, and the following parameter must be added when configuring the Hive repo:

The parameter value is the configured Kerberos principal user name.
##### Verify the result
Log in to the hive1 machine and access Hive as the hive user
```shell
sudo su hive
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/beeline
!connect jdbc:hive2://oushu162509m1-4424-4424-1:10000
```
If output like the following appears, the policy is in effect (a newly configured policy may take about a minute to apply; retry after a moment)
```shell
> use test;
> select * from t1;
OK
+---------+
|test.id |
+---------+
| 1 |
+---------+
1 row(s) selected(0.18 seconds)
> insert into t1 values(1);
Permission denied: user [hive] does not have [write] privilege on [t1]
```
## Hive on Tez
### Prerequisites
Complete the Hive deployment described above; Kerberos authentication does not need to be enabled.
### Installation
Tez consists of two packages, tez-minimal.tar and tez.tar; download both tarballs to the Hive nodes.
Download the Tez packages:
```shell
sudo su root
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/tez'
# Replace $tarball_url below with the actual download URL of each tarball
lava ssh -f ${HOME}/hivehost -e 'wget $tarball_url -O /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz'
lava ssh -f ${HOME}/hivehost -e 'wget $tarball_url -O /usr/local/oushu/tez/tez-0.10.1.tar.gz'
```
Unpack tez-0.10.1.tar.gz locally
```sh
lava ssh -f ${HOME}/hivehost -e 'tar -zxvf /usr/local/oushu/tez/tez-0.10.1.tar.gz -C /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/tez'
```
### Configuration
Create the Tez configuration file tez-site.xml under /usr/local/oushu/conf/hive/ and edit it:
```xml
<property><name>tez.lib.uris</name><value>/apps/tez/tez-0.10.1-minimal.tar.gz</value></property>
<property><name>tez.container.max.java.heap.fraction</name><value>0.2</value></property>
<property><name>tez.use.cluster.hadoop-libs</name><value>true</value></property>
<property><name>tez.am.am-rm.heartbeat.interval-ms.max</name><value>250</value></property>
<property><name>tez.am.container.idle.release-timeout-max.millis</name><value>20000</value></property>
<property><name>tez.am.container.idle.release-timeout-min.millis</name><value>10000</value></property>
<property><name>tez.am.container.reuse.enabled</name><value>true</value></property>
<property><name>tez.am.container.reuse.locality.delay-allocation-millis</name><value>250</value></property>
<property><name>tez.am.container.reuse.non-local-fallback.enabled</name><value>false</value></property>
<property><name>tez.am.container.reuse.rack-fallback.enabled</name><value>true</value></property>
<property><name>tez.am.java.opts</name><value>-server -Xmx1024m -Djava.net.preferIPv4Stack=true</value></property>
<property><name>tez.am.launch.cluster-default.cmd-opts</name><value>-server -Djava.net.preferIPv4Stack=true</value></property>
<property><name>tez.am.launch.cmd-opts</name><value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value></property>
<property><name>tez.am.launch.env</name><value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value></property>
<property><name>tez.am.log.level</name><value>INFO</value></property>
<property><name>tez.am.max.app.attempts</name><value>2</value></property>
<property><name>tez.am.maxtaskfailures.per.node</name><value>10</value></property>
<property><name>tez.am.resource.memory.mb</name><value>2048</value></property>
<property><name>tez.am.resource.cpu.vcores</name><value>2</value></property>
<property><name>tez.am.view-acls</name><value></value></property>
<property><name>tez.counters.max</name><value>10000</value></property>
<property><name>tez.counters.max.groups</name><value>3000</value></property>
<property><name>tez.generate.debug.artifacts</name><value>false</value></property>
<property><name>tez.grouping.max-size</name><value>1073741824</value></property>
<property><name>tez.grouping.min-size</name><value>16777216</value></property>
<property><name>tez.grouping.split-waves</name><value>1.7</value></property>
<property><name>tez.queue.name</name><value>default</value></property>
<property><name>tez.runtime.compress</name><value>true</value></property>
<property><name>tez.runtime.compress.codec</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>tez.runtime.convert.user-payload.to.history-text</name><value>false</value></property>
<property><name>tez.runtime.io.sort.mb</name><value>512</value></property>
<property><name>tez.runtime.optimize.local.fetch</name><value>true</value></property>
<property><name>tez.runtime.pipelined.sorter.sort.threads</name><value>1</value></property>
<property><name>tez.runtime.shuffle.memory.limit.percent</name><value>0.25</value></property>
<property><name>tez.runtime.sorter.class</name><value>PIPELINED</value></property>
<property><name>tez.runtime.unordered.output.buffer.size-mb</name><value>76</value></property>
<property><name>tez.session.am.dag.submit.timeout.secs</name><value>600</value></property>
<property><name>tez.session.client.timeout.secs</name><value>-1</value></property>
<property><name>tez.shuffle-vertex-manager.max-src-fraction</name><value>0.4</value></property>
<property><name>tez.shuffle-vertex-manager.min-src-fraction</name><value>0.2</value></property>
<property><name>tez.staging-dir</name><value>/tmp/${user.name}/staging</value></property>
<property><name>tez.task.am.heartbeat.counter.interval-ms.max</name><value>4000</value></property>
<property><name>tez.task.generate.counters.per.io</name><value>true</value></property>
<property><name>tez.task.get-task.sleep.interval-ms.max</name><value>200</value></property>
<property><name>tez.task.launch.cluster-default.cmd-opts</name><value>-server -Djava.net.preferIPv4Stack=true</value></property>
<property><name>tez.task.launch.cmd-opts</name><value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value></property>
<property><name>tez.task.launch.env</name><value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value></property>
<property><name>tez.task.max-events-per-heartbeat</name><value>500</value></property>
<property><name>tez.task.resource.memory.mb</name><value>1024</value></property>
<property><name>yarn.timeline-service.enabled</name><value>false</value></property>
<property><name>hive.tez.container.size</name><value>2048</value></property>
```
Sync tez to the HDFS machine:
```shell
lava scp -r -f hdfs1 /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz =:/usr/local/oushu/hdfs/
# Log in to the hdfs1 machine and upload to HDFS
ssh hdfs1
su hdfs
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -copyFromLocal /usr/local/oushu/hdfs/tez-0.10.1-minimal.tar.gz /apps/tez
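# Verify the upload; the path must match tez.lib.uris in tez-site.xml
hdfs dfs -ls /apps/tez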
# Exit the hdfs user and return to hive1
exit
exit
```
Edit /usr/local/oushu/conf/hive/hive-site.xml to make Hive use Tez
```xml
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
```
Switch the MapReduce framework on YARN to Tez by editing mapred-site.xml under /usr/local/oushu/conf/common
```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
```
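After Hive is restarted (see Startup below), you can confirm the active engine from the Hive CLI; `set <property>;` prints the current value:
```shell
hive:>set hive.execution.engine;
hive.execution.engine=tez
```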
Set the environment variables
```sh
export TEZ_CONF_DIR=/usr/local/oushu/conf/hive/tez-site.xml
export TEZ_JARS=/usr/local/oushu/tez/
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
# Append the environment variable settings above to the files below
# If Hive and HDFS are deployed separately, tez-site.xml must also be copied into the HDFS configuration directory
/usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/conf/common/hadoop-env.sh
```
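For example, the variables can be appended and distributed with a sketch like the following (paths as above; adjust to your layout):
```shell
for f in /usr/local/oushu/conf/hive/hive-env.sh /usr/local/oushu/conf/common/hadoop-env.sh; do
cat >> "$f" <<'EOF'
export TEZ_CONF_DIR=/usr/local/oushu/conf/hive/tez-site.xml
export TEZ_JARS=/usr/local/oushu/tez/
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
EOF
done
# Distribute the updated files to all Hive nodes
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/hive-env.sh =:/usr/local/oushu/conf/hive/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/hadoop-env.sh =:/usr/local/oushu/conf/common/
```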
### Startup
```shell
# Restart Hive
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
3432 RunJar
2987 RunJar
kill -9 3432
kill -9 2987
exit
# Repeat the commands above on the hive2 machine
ssh hive2
su - hive
jps
3433 RunJar
2988 RunJar
kill -9 3433
kill -9 2988
exit
# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
```
### Check status
```shell
# Enter the hive client
hive
# Test whether it works
hive:>create database td_test;
OK
Time taken:0.201 seconds
hive:>use td_test;
OK
hive:>create table test(id int);
OK
Time taken:0.234 seconds
hive:>insert into test values(1),(2);
OK
Time taken:14.73 seconds, Fetch:1 row(s)
hive:>select * from test;
Query ID = hive_20221110150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 2 2 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 10.19 s
--------------------------------------------------------------------------------
OK
1
2
Time taken: 11.48 seconds, Fetched: 2 row(s)
# The progress table above indicates that the Tez engine is in use
```
## Hive Client Installation
To use Hive commands on machines where Hive is not deployed, install the Hive client and the HDFS client.
The Hive client addresses are assumed to be `hive3,hive4,hive5`
### Preparation
Create hiveclienthost on the hive1 machine
```shell
su root
touch ${HOME}/hiveclienthost
```
Add the following hostnames to hiveclienthost:
``` sh
hive3
hive4
hive5
```
Exchange public keys to enable passwordless ssh login and distribution of configuration files
```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hiveclienthost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hiveclienthost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```
### Installation
```shell
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hive'
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hdfs mapreduce yarn'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hdfs:hadoop /usr/local/oushu/conf/common/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hive:hadoop /usr/local/oushu/conf/hive/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hiveclienthost -e 'sudo mkdir -p /data1/hdfs/hive/'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hive:hadoop /data1/hdfs/hive/'
```
### Check
```shell
ssh hive3
su hive
# Enter the hive client
hive
# Test whether it works
hive:>create database td_test;
OK
Time taken:0.201 seconds
# A response proves that the client works
```