# Command-Line Deployment

---

## Prerequisites

YARN depends on an HDFS cluster. For HDFS installation and deployment, see [HDFS Installation](../hdfs/installation-hdfs.rst).
The HDFS service addresses are assumed to be `hdfs1:9000,hdfs2:9000,hdfs3:9000`.

For ZooKeeper installation and deployment, see [ZooKeeper Installation](../zookeeper/installation-zookeeper.rst).
The ZooKeeper service addresses are assumed to be `zookeeper1:2181,zookeeper2:2181,zookeeper3:2181`.

### Kerberos dependency (optional)

If Kerberos authentication is enabled, see [Kerberos Installation](../kerberos/installation-kerberos.rst) for installation and deployment.
The KDC service address is assumed to be `kdc1`.

### Ranger dependency (optional)

If Ranger authentication is enabled, see [Ranger Installation](../ranger/ranger-start-installation-cli.md) for installation and deployment.
The Ranger service address is assumed to be `ranger1`.

### Configure the yum repository and install lava

Log in to yarn1 and switch to the root user:

```sh
ssh yarn1
su - root
```

Configure the yum repository and install the lava command-line management tool:

```sh
# Fetch the repo file from the machine hosting the yum repository (assumed to be 192.168.1.10)
scp root@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository host entry to /etc/hosts
# Install the lava command-line management tool
yum clean all
yum makecache
yum install -y lava
```

Create a `yarnhost` file:

```
touch yarnhost
```

Set the content of yarnhost to the hostnames of all YARN nodes:

```
yarn1
yarn2
yarn3
```

On the first machine, exchange public keys with the other nodes in the cluster to enable passwordless SSH login and configuration distribution:

```sh
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/yarnhost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/yarnhost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```

## Installation

### Preparation

Create an `rmhost` file:

```
touch rmhost
```

Set the content of rmhost to the hostnames of the YARN ResourceManager nodes:

```
yarn1
yarn2
```

Install YARN:

```sh
lava ssh -f ${HOME}/yarnhost -e 'sudo yum install -y yarn'
lava ssh -f ${HOME}/yarnhost -e 'mkdir -p /data1/yarn/nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'chown -R yarn:hadoop /data1/yarn/'
lava ssh -f ${HOME}/yarnhost -e 'chmod -R 755 /data1/yarn/'
```

Install MapReduce (the default compute engine; optional):

```sh
lava ssh -f ${HOME}/yarnhost -e 'sudo yum install -y mapreduce'
```

If you need to install and use the client on a node outside the YARN cluster, follow the client installation steps in the HDFS chapter.

### Kerberos preparation (optional)

If Kerberos is enabled, the Kerberos client must be installed on all YARN nodes:

```shell
lava ssh -f ${HOME}/yarnhost -e "yum install -y krb5-libs krb5-workstation"
```

From the yarn1 node, run the following commands to enter the Kerberos console:

```shell
ssh kdc1
kadmin.local
```

Once in the console, perform the following operations:

```shell
# Create a principal for each role on each node according to the role plan
addprinc -randkey resourcemanager/yarn1@OUSHU.COM
addprinc -randkey resourcemanager/yarn2@OUSHU.COM
addprinc -randkey nodemanager/yarn1@OUSHU.COM
addprinc -randkey nodemanager/yarn2@OUSHU.COM
addprinc -randkey nodemanager/yarn3@OUSHU.COM
addprinc -randkey HTTP/yarn1@OUSHU.COM
addprinc -randkey HTTP/yarn2@OUSHU.COM
addprinc -randkey HTTP/yarn3@OUSHU.COM
addprinc -randkey yarn@OUSHU.COM

# Export a keytab entry for each principal
ktadd -k /etc/security/keytabs/yarn.keytab resourcemanager/yarn1@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab resourcemanager/yarn2@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab nodemanager/yarn1@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab nodemanager/yarn2@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab nodemanager/yarn3@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab yarn@OUSHU.COM

# Append the HTTP principals' keys to the keytab
ktadd -k /etc/security/keytabs/yarn.keytab HTTP/yarn1@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab HTTP/yarn2@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab HTTP/yarn3@OUSHU.COM

# If the MapReduce engine is used, add the following steps
addprinc -randkey mapreduce@OUSHU.COM
ktadd -k /etc/security/keytabs/yarn.keytab mapreduce@OUSHU.COM

# Exit when done
quit
```

Return to yarn1 and distribute the generated keytab:

```sh
ssh yarn1
lava ssh -f ${HOME}/yarnhost -e 'mkdir -p /etc/security/keytabs/'
scp root@kdc1:/etc/krb5.conf /etc/krb5.conf
scp root@kdc1:/etc/security/keytabs/yarn.keytab /etc/security/keytabs/yarn.keytab
lava scp -r -f ${HOME}/yarnhost /etc/krb5.conf =:/etc/krb5.conf
lava scp -r -f ${HOME}/yarnhost /etc/security/keytabs/yarn.keytab =:/etc/security/keytabs/yarn.keytab
lava ssh -f ${HOME}/yarnhost -e 'chown yarn:hadoop /etc/security/keytabs/yarn.keytab'
```
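To confirm the keytab landed on every node with the expected entries, you can optionally list its contents with `klist`, which is installed above as part of krb5-workstation; this is only a sanity check and can be skipped:

```sh
# List the principals contained in the distributed keytab on every YARN node
lava ssh -f ${HOME}/yarnhost -e 'klist -kt /etc/security/keytabs/yarn.keytab'
```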
## Configuration

### HA configuration

Edit the YARN configuration:

```
vim /usr/local/oushu/conf/common/yarn-site.xml
```

```xml
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value></property>
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>2</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>4096</value></property>
<property><name>yarn.resourcemanager.zk-address</name><value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value></property>
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>yarn1</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>yarn1</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>yarn2</value></property>
<property><name>yarn.resourcemanager.webapp.address.rm1</name><value>yarn1:8088</value></property>
<property><name>yarn.resourcemanager.webapp.address.rm2</name><value>yarn2:8088</value></property>
<property><name>yarn.resourcemanager.webapp.https.address.rm1</name><value>yarn1:8090</value></property>
<property><name>yarn.resourcemanager.webapp.https.address.rm2</name><value>yarn2:8090</value></property>
<property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
<property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property>
<property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
<property><name>yarn.scheduler.capacity.maximum-am-resource-percent</name><value>0.6</value></property>
<property><name>yarn.nodemanager.local-dirs</name><value>/data1/yarn/nodemanager</value></property>
<property><name>yarn.nodemanager.log-dirs</name><value>/usr/local/oushu/log/hadoop</value></property>
```

Synchronize the configuration files:

```shell
# Copy the configuration files to the other nodes
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
```

### Kerberos configuration (optional)

```
vim /usr/local/oushu/conf/common/yarn-site.xml
```

```xml
<property><name>yarn.resourcemanager.keytab</name><value>/etc/security/keytabs/yarn.keytab</value></property>
<property><name>yarn.resourcemanager.principal</name><value>resourcemanager/_HOST@OUSHU.COM</value></property>
<property><name>yarn.resourcemanager.webapp.spnego-principal</name><value>HTTP/_HOST@OUSHU.COM</value></property>
<property><name>yarn.nodemanager.keytab</name><value>/etc/security/keytabs/yarn.keytab</value></property>
<property><name>yarn.nodemanager.principal</name><value>nodemanager/_HOST@OUSHU.COM</value></property>
<property><name>yarn.http.policy</name><value>HTTPS_ONLY</value></property>
<property><name>yarn.nodemanager.container-executor.class</name><value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value></property>
<property><name>yarn.nodemanager.linux-container-executor.group</name><value>hadoop</value></property>
<property><name>yarn.nodemanager.linux-container-executor.path</name><value>/usr/local/oushu/yarn/bin/container-executor</value></property>
```

#### Adjust permissions on container-executor and container-executor.cfg

The container-executor binary on each node must be owned by user root and group hadoop.
The container-executor.cfg file and all of its parent directories must likewise be owned by user root and group hadoop.

```shell
lava ssh -f ${HOME}/yarnhost -e 'chown root:hadoop /usr/local/oushu/yarn/bin/container-executor'
lava ssh -f ${HOME}/yarnhost -e 'chmod 6050 /usr/local/oushu/yarn/bin/container-executor'
lava ssh -f ${HOME}/yarnhost -e 'mkdir -p /etc/hadoop/'
cp /usr/local/oushu/conf/common/container-executor.cfg /etc/hadoop/
```

Edit /etc/hadoop/container-executor.cfg (for example with vim) and set:

```
yarn.nodemanager.local-dirs=/data1/yarn/nodemanager
yarn.nodemanager.log-dirs=/usr/local/oushu/log/hadoop
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=
allowed.system.users=
min.user.id=1000
```

Then distribute the file and lock down its permissions:

```shell
lava scp -r -f ${HOME}/yarnhost /etc/hadoop/container-executor.cfg =:/etc/hadoop/
lava ssh -f ${HOME}/yarnhost -e 'chown root:hadoop /etc/hadoop/container-executor.cfg'
lava ssh -f ${HOME}/yarnhost -e 'chmod 400 /etc/hadoop/container-executor.cfg'
```

If the MapReduce compute engine is used:

```
vim /usr/local/oushu/conf/common/mapred-site.xml
```

```xml
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property><name>mapreduce.jobhistory.http.policy</name><value>HTTPS_ONLY</value></property>
```

Distribute the modified configuration:

```shell
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/mapred-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/yarn-site.xml =:/usr/local/oushu/conf/common/
```
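Before starting any daemons, it can help to verify that the edited files are still well-formed XML on every node. A minimal sketch, assuming `xmllint` (from the libxml2 package) is available on the hosts:

```shell
# Fail fast on malformed XML before the daemons try to parse it
lava ssh -f ${HOME}/yarnhost -e 'xmllint --noout /usr/local/oushu/conf/common/yarn-site.xml /usr/local/oushu/conf/common/mapred-site.xml && echo "configuration XML OK"'
```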
### Configuration tuning (optional)

With only the default configuration, real workloads may perform poorly or otherwise fall short of expectations; in that case YARN and MapReduce should be tuned.
Tuning is generally driven by the memory and CPU actually available, so check memory and CPU usage first. As a rule of thumb, the number of vcores allocated is two to three times the number of physical cores.

```shell
# Check memory usage
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            61G         15G         32G        273M         13G         45G
Swap:            0B          0B          0B
```

Edit /usr/local/oushu/conf/common/mapred-site.xml to set the memory used by jobs:

```xml
<property><name>mapreduce.am.max-attempts</name><value>2</value></property>
<property><name>mapreduce.job.counters.max</name><value>130</value></property>
<property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.05</value></property>
<property><name>mapreduce.map.java.opts</name><value>-Xmx1024m</value></property>
<property><name>mapreduce.map.memory.mb</name><value>2048</value></property>
<property><name>mapreduce.map.sort.spill.percent</name><value>0.7</value></property>
<property><name>mapreduce.reduce.java.opts</name><value>-Xmx1024m</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>2048</value></property>
<property><name>mapreduce.reduce.shuffle.input.buffer.percent</name><value>0.8</value></property>
<property><name>mapreduce.reduce.shuffle.merge.percent</name><value>0.75</value></property>
<property><name>mapreduce.reduce.shuffle.parallelcopies</name><value>30</value></property>
<property><name>mapreduce.reduce.speculative</name><value>false</value></property>
<property><name>mapreduce.task.io.sort.factor</name><value>100</value></property>
<property><name>mapreduce.task.io.sort.mb</name><value>358</value></property>
<property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx1024m</value></property>
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>2048</value></property>
<property><name>mapreduce.map.cpu.vcores</name><value>2</value></property>
<property><name>mapreduce.reduce.cpu.vcores</name><value>2</value></property>
<property><name>mapred.job.reuse.jvm.num.tasks</name><value>10</value></property>
```

Edit /usr/local/oushu/conf/common/yarn-site.xml:

```xml
<!-- container -->
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>12800</value></property>
<property><name>yarn.scheduler.maximum-allocation-vcores</name><value>8</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
<!-- nodemanager -->
<property><name>yarn.nodemanager.container-metrics.unregister-delay-ms</name><value>60000</value></property>
<property><name>yarn.nodemanager.container-monitor.interval-ms</name><value>3000</value></property>
<property><name>yarn.nodemanager.log-aggregation.compression-type</name><value>gz</value></property>
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>8</value></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>123904</value></property>
<property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
<property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value></property>
<!-- resourcemanager -->
<property><name>yarn.resourcemanager.am.max-attempts</name><value>2</value></property>
<property><name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name><value>1</value></property>
<property><name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name><value>0.33</value></property>
<property><name>yarn.resourcemanager.placement-constraints.handler</name><value>scheduler</value></property>
<property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value></property>
```

You can also increase the memory available to the NameNode to improve efficiency; see /usr/local/oushu/conf/common/hadoop-env.sh:

```
export HADOOP_NAMENODE_OPTS="-Xmx6144m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
export HADOOP_DATANODE_OPTS="-Xmx2048m -Xss256k"
```

Increase the value after "-Xmx" to allocate more memory.

## Startup

Perform the following steps on the yarn1 node.

If Kerberos authentication is enabled (optional):

```shell
su - yarn
kinit -kt /etc/security/keytabs/yarn.keytab yarn@OUSHU.COM
# Normally, no error output means success; you can also check the exit status:
echo $?
0  # a return value of 0 indicates success
```
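You can also confirm that the ticket was actually obtained by inspecting the credential cache (an optional check):

```shell
# A valid ticket for yarn@OUSHU.COM should be listed in the credential cache
klist
```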
Start the YARN services:

```shell
# Start the ResourceManagers
lava ssh -f ${HOME}/rmhost -e 'sudo -E -u yarn yarn --daemon start resourcemanager'
# Start the NodeManagers
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
```

## Checking Status

Check the state of the YARN cluster:

```shell
# Overall ResourceManager state
yarn rmadmin -getAllServiceState
yarn1:8033          active
yarn2:8033          standby

# List NodeManager information
yarn node -list -all
Total Nodes:3
     Node-Id       Node-State  Node-Http-Address  Number-of-Running-Containers
 yarn1:45477          RUNNING         yarn1:8042                             0
 yarn2:38203          RUNNING         yarn2:8042                             0
 yarn3:44035          RUNNING         yarn3:8042                             0
```

## Common Commands

Stop all ResourceManagers:

```shell
lava ssh -f ${HOME}/rmhost -e 'sudo -u yarn yarn --daemon stop resourcemanager'
```

Stop all NodeManagers:

```shell
lava ssh -f ${HOME}/yarnhost -e 'sudo -u yarn yarn --daemon stop nodemanager'
```

## Registering with Skylab (optional)

The machines to be installed must be added to Skylab through machine management. If you have not added them yet, see [Registering Machines](../start/install-lava.md).

On yarn1, edit the `server.json` configuration under /usr/local/oushu/lava/conf and replace localhost with the Skylab server IP. For installation of the Skylab base service lava, see [lava Installation](../start/start-installation.rst).

Then create a `~/yarn.json` file with content like the following:

```json
{
  "data": {
    "name": "YARNCluster",
    "group_roles": [
      {
        "role": "yarn.resourcemanager",
        "cluster_name": "resourcemanager",
        "group_name": "resourcemanager-id",
        "machines": [
          {
            "id": 1,
            "name": "ResourceManager",
            "subnet": "lava",
            "data_ip": "192.168.1.11",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          },
          {
            "id": 2,
            "name": "ResourceManager2",
            "subnet": "lava",
            "data_ip": "192.168.1.12",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          }
        ]
      },
      {
        "role": "yarn.nodemanager",
        "cluster_name": "nodemanager",
        "group_name": "nodemanager-id",
        "machines": [
          {
            "id": 1,
            "name": "nodemanager1",
            "subnet": "lava",
            "data_ip": "192.168.1.11",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          },
          {
            "id": 2,
            "name": "nodemanager2",
            "subnet": "lava",
            "data_ip": "192.168.1.12",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          },
          {
            "id": 3,
            "name": "nodemanager3",
            "subnet": "lava",
            "data_ip": "192.168.1.13",
            "manage_ip": "",
            "assist_port": 1622,
            "ssh_port": 22
          }
        ]
      }
    ]
  }
}
```

In the file above, adjust the machine information in the machines arrays to your environment. On the machine where the platform base component lava is installed, run:

```
psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"
```

This returns the required machine information; add it to the machines array of the role that runs on each node.
For example, yarn1 runs the YARN ResourceManager role, so yarn1's machine information must be added to the machines array of the yarn.resourcemanager role.

Register the cluster with the lava command:

```
lava login -u oushu -p ******** -T {tenant id}
lava onprem-register service -s YarnMapreduce -f ~/yarn.json
```

If the command returns:

```
Add service by self success
```

the registration succeeded; if an error is reported, resolve it according to the error message.

After logging in to the web UI, the newly added cluster appears under the corresponding service in the auto-deployment module, and the list monitors the state of the yarn processes on each machine in real time.

![](./images/skylabui.png)

## Integrating YARN with Ranger authentication (optional)

### Ranger installation

If Ranger is enabled, the Ranger client must be installed on all YARN nodes:

```shell
lava ssh -f ${HOME}/yarnhost -e "yum install -y ranger-yarn-plugin"
lava ssh -f ${HOME}/yarnhost -e 'mkdir /usr/local/oushu/yarn/etc'
lava ssh -f ${HOME}/yarnhost -e "ln -s /usr/local/oushu/conf/yarn /usr/local/oushu/yarn/etc/hadoop"
```

### Ranger configuration

On the yarn1 node, edit /usr/local/oushu/ranger-yarn-plugin_2.3.0/install.properties:

```shell
POLICY_MGR_URL=http://ranger1:6080
REPOSITORY_NAME=yarndev
COMPONENT_INSTALL_DIR_NAME=/usr/local/oushu/yarn
```

Synchronize the YARN Ranger configuration and run the initialization script:

```shell
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/ranger-yarn-plugin_2.3.0/install.properties =:/usr/local/oushu/ranger-yarn-plugin_2.3.0/
lava ssh -f ${HOME}/yarnhost -e '/usr/local/oushu/ranger-yarn-plugin_2.3.0/enable-yarn-plugin.sh'
```

After the initialization script finishes, output like the following indicates success; restart the service as instructed:

```shell
Ranger Plugin for yarn has been enabled. Please restart yarn to ensure that changes are effective.
```
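Optionally, you can check that the plugin dropped its configuration files into the YARN configuration directory. A quick sketch; the exact file names may differ between Ranger versions:

```shell
# The enable script normally generates ranger-*.xml files next to yarn-site.xml
lava ssh -f ${HOME}/yarnhost -e 'ls /usr/local/oushu/yarn/etc/hadoop/ranger-*.xml'
```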
Restart YARN:

```shell
# Restart the YARN cluster
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/rmhost -e 'sudo -E -u yarn yarn --daemon stop resourcemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
lava ssh -f ${HOME}/rmhost -e 'sudo -E -u yarn yarn --daemon start resourcemanager'
```

### Configuring user permission policies on the Ranger UI

### Creating the `YARN Service`

- Log in to the Ranger UI at http://192.168.1.14:6080 and click the ➕ button to add a `YARN Service`.

![image](./images/yarn-ranger-step-1.png)

- Fill in the service name; it must match the `REPOSITORY_NAME` in the `install.properties` file.

![image](./images/yarn-ranger-step-2.png)

- The username and password can be chosen freely. If HA is enabled, enter the connection addresses of all ResourceManagers in HA form; if Kerberos authentication is enabled, provide the corresponding keytab file; otherwise use the default settings.

![image](./images/yarn-ranger-step-3.png)

- Run the test to verify the configuration is correct, then click Add to save it.

![image](./images/yarn-ranger-step-4.png)
![image](./images/yarn-ranger-step-5.png)

- Return to the home page to see the newly added service.

![image](./images/yarn-ranger-step-6.png)

### Creating an access policy

- Find the service you just created and click its name.

![image](./images/yarn-ranger-step-6.png)

- Click the 'Add New Policy' button.

![image](./images/yarn-policy-6.png)

- Configure the access policy so that the yarn user has permission to submit to the resource queue, and make sure the recursive toggle is switched on.

![image](./images/yarn-policy-2.png)
![image](./images/yarn-policy-3.png)

- Review the policy you just created.

![image](./images/yarn-policy-4.png)

### Notes on Ranger + Kerberos

When Kerberos is enabled, Kerberos must also be enabled for the Ranger service, and the following parameters must be added when configuring the YARN repo:

![image](./images/yarn-ranger-kerberos.png)

The parameter value is the configured Kerberos principal name.

After adding the policy, wait about half a minute for it to take effect. Once it is active, the yarn user can submit, delete, and query jobs in YARN's root.default queue.
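As an end-to-end check, you can submit a small example job to the default queue as the yarn user. This is only a sketch: the path to the MapReduce examples jar below is an assumption and should be adjusted to wherever the jar lives in your installation.

```shell
su - yarn
# With Kerberos enabled, obtain a ticket first
kinit -kt /etc/security/keytabs/yarn.keytab yarn@OUSHU.COM
# Submit a tiny pi-estimation job (2 maps, 10 samples each) to the default queue
yarn jar /usr/local/oushu/yarn/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  pi -Dmapreduce.job.queuename=default 2 10
```

If the Ranger policy is working, the job is accepted and completes; if the policy is missing or not yet active, the submission is rejected with a permission error.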