# Command-Line Deployment
If you want to deploy Spark from the command line, follow the steps in this section.
This section assumes the yum repository has already been configured on the machine with IP 192.168.1.10.
## Spark Standalone Mode Installation
### Prerequisites
Spark Standalone mode depends on a Zookeeper cluster and an HDFS cluster: Zookeeper supports Spark HA, and HDFS stores the history data.
For Zookeeper deployment, see [Zookeeper Installation](../zookeeper/installation-zookeeper.rst).
For HDFS deployment, see [HDFS Installation](../hdfs/installation-hdfs.rst).
The Zookeeper service address is assumed to be `oushu1:2181,oushu2:2181,oushu3:2181`.
The HDFS nameservice address is assumed to be `hdfs://oushu`.
First log in to oushu1, then switch to the root user:
``` sh
ssh oushu1
su - root
```
Create a `sparkhosts` file containing all the machines in the Spark cluster:
``` sh
cat > ${HOME}/sparkhosts << EOF
oushu1
oushu2
oushu3
oushu4
EOF
```
Create a `sparkmasters` file containing all the master machines in the Spark cluster:
``` sh
cat > ${HOME}/sparkmasters << EOF
oushu1
oushu2
EOF
```
Create a `sparkworkers` file containing all the worker machines in the Spark cluster:
``` sh
cat > ${HOME}/sparkworkers << EOF
oushu1
oushu2
oushu3
EOF
```
On the oushu1 node, configure the yum repository and install the lava command-line management tool:
``` sh
# Fetch the repo file from the machine hosting the yum repository
scp oushu@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository machine's host information to /etc/hosts
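# Hypothetical entry; replace "yumrepo" with the hostname actually referenced by the repo file
echo "192.168.1.10 yumrepo" >> /etc/hosts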
# Install the lava command-line management tool
yum clean all
yum makecache
yum install lava
```
Exchange public keys between oushu1 and the other nodes in the cluster to enable passwordless ssh login and distribution of configuration files:
```sh
lava ssh-exkeys -f ${HOME}/sparkhosts -p ********
```
Distribute the repo file to the other machines:
```sh
lava scp -f ${HOME}/sparkhosts /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
```
### Installation
Install Spark with yum install:
```sh
lava ssh -f ${HOME}/sparkhosts -e "sudo yum install -y spark"
```
### Configuration
Spark configuration parameters are stored in `spark-defaults.conf` and `spark-env.sh`. Template configuration files can be found in `/usr/local/oushu/spark/conf.empty`.
File name | Description
--------------------|---------
spark-defaults.conf | Default configuration for the program; the settings are loaded into SparkConf and have the lowest priority
spark-env.sh | Environment variables for process startup
Configuration files take effect only when placed in Spark's configuration directory: `/usr/local/oushu/conf/spark`.
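If you want to start from the templates, a minimal sketch for copying them into place (it assumes the configuration directory is owned by the `spark` user, as elsewhere in this guide):
```sh
# Copy the template configuration files into the active configuration directory
cp /usr/local/oushu/spark/conf.empty/* /usr/local/oushu/conf/spark/
chown -R spark:spark /usr/local/oushu/conf/spark
```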
#### Prepare Data Directories
Spark Workers save Driver execution logs to the local file system, so configure a usable local path for Spark:
```sh
lava ssh -f ${HOME}/sparkhosts -e "mkdir -p /data1/spark/sparkwork"
lava ssh -f ${HOME}/sparkhosts -e "chown -R spark:spark /data1/spark"
```
Spark saves Application run history to the HDFS cluster, so configure a usable HDFS path for Spark.
Log in to the HDFS cluster and run the following commands to create the HDFS path:
```
sudo -u hdfs hdfs dfs -mkdir -p /spark/spark-history
sudo -u hdfs hdfs dfs -chown -R spark:spark /spark
```
#### Configure Dependent Clusters
Log in to oushu1, then switch to the root user:
``` sh
ssh oushu1
su - root
```
Add the HDFS configuration file `/usr/local/oushu/conf/spark/core-site.xml`:
```xml
<configuration>
    <property><name>fs.defaultFS</name><value>hdfs://oushu</value></property>
</configuration>
```
Add the HDFS configuration file `/usr/local/oushu/conf/spark/hdfs-site.xml`.
`hdfs-site.xml` template:
```xml
<configuration>
    <property><name>rpc.client.timeout</name><value>3600000</value></property>
    <property><name>rpc.client.connect.tcpnodelay</name><value>true</value></property>
    <property><name>rpc.client.max.idle</name><value>10000</value></property>
    <property><name>rpc.client.ping.interval</name><value>10000</value></property>
    <property><name>rpc.client.connect.timeout</name><value>600000</value></property>
    <property><name>rpc.client.connect.retry</name><value>10</value></property>
    <property><name>rpc.client.read.timeout</name><value>3600000</value></property>
    <property><name>rpc.client.write.timeout</name><value>3600000</value></property>
    <property><name>rpc.client.socket.linger.timeout</name><value>-1</value></property>
    <property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
    <property><name>dfs.default.replica</name><value>3</value></property>
    <property><name>dfs.prefetchsize</name><value>10</value></property>
    <property><name>dfs.client.failover.max.attempts</name><value>15</value></property>
    <property><name>dfs.default.blocksize</name><value>134217728</value></property>
    <property><name>dfs.client.log.severity</name><value>INFO</value></property>
    <property><name>input.connect.timeout</name><value>600000</value></property>
    <property><name>input.read.timeout</name><value>3600000</value></property>
    <property><name>input.write.timeout</name><value>3600000</value></property>
    <property><name>input.localread.default.buffersize</name><value>2097152</value></property>
    <property><name>input.localread.blockinfo.cachesize</name><value>1000</value></property>
    <property><name>input.read.getblockinfo.retry</name><value>3</value></property>
    <property><name>output.replace-datanode-on-failure</name><value>false</value></property>
    <property><name>output.default.chunksize</name><value>512</value></property>
    <property><name>output.default.packetsize</name><value>65536</value></property>
    <property><name>output.default.write.retry</name><value>10</value></property>
    <property><name>output.connect.timeout</name><value>600000</value></property>
    <property><name>output.read.timeout</name><value>3600000</value></property>
    <property><name>output.write.timeout</name><value>3600000</value></property>
    <property><name>output.packetpool.size</name><value>1024</value></property>
    <property><name>output.close.timeout</name><value>900000</value></property>
    <property><name>dfs.domain.socket.path</name><value>/var/lib/hadoop-hdfs/dn_socket</value></property>
    <property><name>dfs.client.use.legacy.blockreader.local</name><value>false</value></property>
    <property><name>dfs.client.failover.proxy.provider.oushu</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
    <property><name>dfs.ha.namenodes.oushu</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.http-address.oushu.nn1</name><value>oushu1:50070</value></property>
    <property><name>dfs.namenode.http-address.oushu.nn2</name><value>oushu2:50070</value></property>
    <property><name>dfs.namenode.rpc-address.oushu.nn1</name><value>oushu1:9000</value></property>
    <property><name>dfs.namenode.rpc-address.oushu.nn2</name><value>oushu2:9000</value></property>
    <property><name>dfs.nameservices</name><value>oushu</value></property>
</configuration>
```
**The following settings must be changed to match the actual HDFS deployment:**
```xml
<property><name>dfs.ha.namenodes.oushu</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.http-address.oushu.nn1</name><value>oushu1:50070</value></property>
<property><name>dfs.namenode.http-address.oushu.nn2</name><value>oushu2:50070</value></property>
<property><name>dfs.namenode.rpc-address.oushu.nn1</name><value>oushu1:9000</value></property>
<property><name>dfs.namenode.rpc-address.oushu.nn2</name><value>oushu2:9000</value></property>
<property><name>dfs.nameservices</name><value>oushu</value></property>
```
:::{note}
Spark Standalone mode does not support Kerberos-secured HDFS.
:::
#### Configure Spark Master/Worker
Log in to oushu1, then switch to the root user:
``` sh
ssh oushu1
su - root
```
Create the configuration file `spark-defaults.conf`:
```sh
cat > ${HOME}/spark-defaults.conf << EOF
spark.master.rest.enabled=true
spark.master.rest.port=2881
EOF
```
Create the configuration file `spark-env.sh`. Change the Zookeeper address **oushu1:2181,oushu2:2181,oushu3:2181** to the address of your actual deployment.
If the hostname:port form is used, map each hostname to its IP in /etc/hosts (see the example entries after the block below).
```sh
cat > ${HOME}/spark-env.sh << EOF
export SPARK_MASTER_PORT="2882"
export SPARK_MASTER_WEBUI_PORT="2883"
export SPARK_WORKER_WEBUI_PORT="2885"
export SPARK_WORKER_DIR="/data1/spark/sparkwork"
export SPARK_LOG_DIR="/usr/local/oushu/log/spark"
export JAVA_HOME="/usr/lib/jvm/java"
export SPARK_MASTER_OPTS="-Dfile.encoding=UTF-8"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=oushu1:2181,oushu2:2181,oushu3:2181 -Dspark.deploy.zookeeper.dir=/oushu270120"
EOF
```
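Example /etc/hosts entries for the Zookeeper hostnames (the IP addresses below are placeholders; use the real addresses of your nodes):
```sh
cat >> /etc/hosts << EOF
192.168.1.11 oushu1
192.168.1.12 oushu2
192.168.1.13 oushu3
EOF
```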
Distribute the configuration files to the other machines:
```sh
lava scp -f ${HOME}/sparkworkers ${HOME}/spark-env.sh =:/tmp
lava scp -f ${HOME}/sparkworkers ${HOME}/spark-defaults.conf =:/tmp
lava ssh -f ${HOME}/sparkworkers -e "mv -f /tmp/spark-env.sh /usr/local/oushu/conf/spark"
lava ssh -f ${HOME}/sparkworkers -e "chown spark:spark /usr/local/oushu/conf/spark/spark-env.sh"
lava ssh -f ${HOME}/sparkworkers -e "mv -f /tmp/spark-defaults.conf /usr/local/oushu/conf/spark"
lava ssh -f ${HOME}/sparkworkers -e "chown spark:spark /usr/local/oushu/conf/spark/spark-defaults.conf"
```
#### Configure the Spark History Server
The History Server is deployed only on oushu1, so the following only needs to be appended to `spark-defaults.conf` on oushu1:
```sh
echo 'spark.history.ui.port=2884
spark.history.fs.logDirectory=hdfs://oushu/spark/spark-history
spark.eventLog.dir=hdfs://oushu/spark/spark-history
spark.eventLog.enabled=true' >> /usr/local/oushu/conf/spark/spark-defaults.conf
```
#### Configure the Spark Client
Log in to oushu4, then switch to the root user:
``` sh
ssh oushu4
su - root
```
Prepare the directories:
```sh
mkdir -p /data1/spark/spark-warehouse
chown -R spark:spark /data1/spark
chmod 733 /data1/spark/spark-warehouse
```
Configure the `spark-defaults.conf` file:
```sh
echo 'spark.sql.warehouse.dir=/data1/spark/spark-warehouse' >> /usr/local/oushu/conf/spark/spark-defaults.conf
```
Add the following configuration to the `core-site.xml` file:
```xml
<property><name>hive.exec.scratchdir</name><value>file:///data1/spark/spark-warehouse</value></property>
```
### Start
#### Start the Spark Master
Log in to the oushu1 node:
```
ssh oushu1
su - root
```
Run the following to start the Spark Master:
```sh
lava ssh -f ${HOME}/sparkmasters -e "sudo -u spark /usr/local/oushu/spark/sbin/start-master.sh"
```
#### Start the Spark Workers
Run the following to start the Spark Workers:
```sh
lava ssh -f ${HOME}/sparkworkers -e "sudo -u spark /usr/local/oushu/spark/sbin/start-slave.sh 'spark://oushu1:2882,oushu2:2882'"
```
#### Start the History Server
```sh
sudo -u spark /usr/local/oushu/spark/sbin/start-history-server.sh
```
### Check Status
Check that the Spark Application run history is available by visiting `http://oushu1:2884` in a browser.
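If no browser is at hand, a quick reachability check from the command line also works (a minimal sketch):
```sh
# Expect an HTTP 200 response from the History Server UI
curl -sI http://oushu1:2884 | head -n 1
```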
Log in to oushu4, then switch to the root user:
``` sh
ssh oushu4
su - root
```
Check that the Spark Client can submit jobs:
```sh
sudo -u spark /usr/local/oushu/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://oushu1:2882,oushu2:2882 \
--executor-memory 1G \
--total-executor-cores 3 \
/usr/local/oushu/spark/examples/jars/spark-examples_2.12-3.1.2.jar \
1000
```
Check that Spark SQL starts correctly:
```sh
sudo -u spark /usr/local/oushu/spark/bin/spark-sql \
--master spark://oushu1:2882,oushu2:2882 \
--executor-memory 1G \
--total-executor-cores 3
```
Run the following SQL statements in Spark SQL:
```sql
show databases;
create table test(a int) using orc location 'hdfs://oushu/spark/test';
insert into test values(1);
select * from test;
```
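Optionally drop the verification table afterwards to clean up:
```sql
drop table test;
```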
### Common Commands
Stop the Spark services:
``` sh
# Stop the master
/usr/local/oushu/spark/sbin/stop-master.sh
# Stop the worker
/usr/local/oushu/spark/sbin/stop-slave.sh
# Stop the History Server
/usr/local/oushu/spark/sbin/stop-history-server.sh
```
### Register with Skylab (Optional)
On the oushu1 node, edit the lava command-line tool configuration and set the IP of the Skylab node:
```
vi /usr/local/oushu/lava/conf/server.json
```
Write the registration request to a file, for example ~/spark-register.json:
```json
{
"data": {
"name": "SparkCluster",
"group_roles": [
{
// nodes on which to install the master role
"role": "spark.master",
"cluster_name": "oushu1",
"group_name": "master1",
// machine information for the target nodes; it can be found in the machine metadata table in lavaadmin
"machines": [
{
"id": 1,
"name": "hostname1",
"subnet": "lava",
"data_ip": "127.0.0.1",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
},
{
// nodes on which to install the worker role
"role": "spark.worker",
"cluster_name": "oushu1",
"group_name": "worker1",
"machines": [
{
"id": 1,
"name": "hostname1",
"subnet": "lava",
"data_ip": "127.0.0.1",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
},
{
// nodes on which to install the history role
"role": "spark.history",
"cluster_name": "oushu1",
"group_name": "history1",
"machines": [
{
"id": 1,
"name": "hostname1",
"subnet": "lava",
"data_ip": "127.0.0.1",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
}
],
"config": {
"spark-defaults.conf": [
{
"key": "spark.master.rest.port",
"value": "2881"
},
{
"key": "spark.master.rest.enabled",
"value": "true"
},
{
"key": "spark.history.ui.port",
"value": "2884"
},
{
"key": "spark.history.fs.logDirectory",
"value": "hdfs://oushu/littleboy/spark/spark-history"
},
{
"key": "spark.eventLog.dir",
"value": "hdfs://oushu/littleboy/spark/spark-history"
},
{
"key": "spark.eventLog.enabled",
"value": "true"
}
],
"spark-env.sh": [
{
"key": "SPARK_LOG_DIR",
"value": "/usr/local/oushu/log/spark"
},
{
"key": "SPARK_MASTER_HOSTS",
"value": "oushu1,oushu2"
},
{
"key": "SPARK_MASTER_WEBUI_PORT",
"value": "2883"
},
{
"key": "SPARK_MASTER_PORT",
"value": "2882"
},
{
"key": "SPARK_WORKER_WEBUI_PORT",
"value": "2885"
},
{
"key": "SPARK_MASTER_OPTS",
"value": "\"-Dfile.encoding=UTF-8\""
},
{
"key": "SPARK_WORKER_DIR",
"value": "/data1/spark/sparkwork"
},
{
"key": "SPARK_DAEMON_JAVA_OPTS",
"value": "-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=oushu1:2181,oushu2:2181,oushu3:2181 -Dspark.deploy.zookeeper.dir=/oushu270120"
}
]
}
}
}
```
In the configuration above, change the machine information in each machines array to match your environment. On the machine where the platform base component lava is installed, run:
```
psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"
```
This query returns the machine information you need; add each machine's information to the machines array of the role it hosts.
For example, if oushu1 is the Spark master node, add oushu1's machine information to the machines array of the spark.master role.
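For example, given a hypothetical query result row, the corresponding machines entry would look like this (all values are placeholders):
```json
// id=3, name=oushu1, subnet=lava, data_ip=192.168.1.11, manage_ip=, assist_port=1622, ssh_port=22
{
  "id": 3,
  "name": "oushu1",
  "subnet": "lava",
  "data_ip": "192.168.1.11",
  "manage_ip": "",
  "assist_port": 1622,
  "ssh_port": 22
}
```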
Register the cluster with the lava commands:
```
lava login -u oushu -p ******** -T {tenant id}
lava onprem-register service -s Spark -f ~/spark-register.json
```
If the return value is:
```
Add service by self success
```
the registration succeeded; if an error message is returned, handle it according to the message.
After logging in to the web UI, the newly added cluster is shown under the corresponding service in the automatic deployment module, and the list monitors the status of the Spark processes on each machine in real time.

## Spark Yarn Mode Installation
### Prerequisites
Spark Yarn mode depends on a Yarn cluster and an HDFS cluster.
For Yarn deployment, see [Yarn Installation](../yarn/installation-yarn.rst).
For HDFS deployment, see [HDFS Installation](../hdfs/installation-hdfs.rst). The HDFS nameservice address is assumed to be `hdfs://oushu`.
First log in to oushu4, then switch to the root user:
``` sh
ssh oushu4
su - root
```
### Installation
Install the Spark rpm as described for Spark Standalone mode.
### Configuration
#### Prepare Data Directories
Create the history data directory on the HDFS cluster:
```
sudo -u hdfs hdfs dfs -mkdir -p /spark/spark-history
sudo -u hdfs hdfs dfs -chown -R spark:spark /spark
```
#### Configure Dependent Clusters
Configure the HDFS dependency as described for Spark Standalone mode.
##### Configure Yarn
Log in to yarn1 and install `spark-shuffle` on the Yarn cluster with yum install:
```sh
ssh yarn1
su - root
cat > ${HOME}/yarnhost << EOF
yarn1
yarn2
yarn3
EOF
lava ssh -f ${HOME}/yarnhost -e "sudo yum install -y spark-shuffle"
```
Add the following configuration to `/usr/local/oushu/conf/yarn/yarn-site.xml`:
```xml
<property><name>yarn.nodemanager.aux-services</name><value>spark_shuffle,mapreduce_shuffle</value></property>
<property><name>yarn.nodemanager.aux-services.spark_shuffle.class</name><value>org.apache.spark.network.yarn.YarnShuffleService</value></property>
<property><name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name><value>/usr/local/oushu/spark-shuffle-3.1.2/yarn/spark-3.1.2-yarn-shuffle.jar</value></property>
```
Distribute the configuration file to all Yarn nodes and restart the NodeManagers:
```sh
lava scp -f ${HOME}/yarnhost /usr/local/oushu/conf/yarn/yarn-site.xml =:/usr/local/oushu/conf/yarn/yarn-site.xml
lava ssh -f ${HOME}/yarnhost -e 'sudo -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
```
Copy the configuration file `/usr/local/oushu/conf/yarn/yarn-site.xml` to oushu4:
```
scp /usr/local/oushu/conf/yarn/yarn-site.xml root@oushu4:/usr/local/oushu/conf/spark/yarn-site.xml
```
##### Configure Kerberos (Optional)
If HDFS is configured with Kerberos authentication, log in to the Kerberos server and run the following command to enter the Kerberos console:
```shell
kadmin.local
```
In the console, run the following to configure the principal:
```
addprinc -randkey spark@OUSHU.COM
ktadd -k /etc/security/keytabs/spark.keytab spark@OUSHU.COM
```
:::{note}
Kerberos does not support uppercase letters in hostnames; if a hostname contains uppercase letters, change it to lowercase.
:::
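For example, a host can be renamed to an all-lowercase name like this (the hostname below is only an example):
```sh
# Rename the host to an all-lowercase name
hostnamectl set-hostname oushu1
```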
Copy the generated keytab to oushu4:
```sh
scp root@kerberosserver:/etc/security/keytabs/spark.keytab /etc/security/keytabs/spark.keytab
```
Add the following configuration to the `core-site.xml` file:
```xml
<property><name>hadoop.security.authentication</name><value>kerberos</value></property>
<property><name>hadoop.security.authorization</name><value>true</value></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value></property>
```
Add the following configuration to the `hdfs-site.xml` file:
```xml
<property><name>dfs.data.transfer.protection</name><value>authentication</value></property>
<property><name>dfs.namenode.kerberos.principal.pattern</name><value>*</value></property>
```
##### Modify Configuration Files
Create the configuration file `spark-defaults.conf`:
```sh
cat > /usr/local/oushu/conf/spark/spark-defaults.conf << EOF
spark.history.ui.port=2884
spark.history.fs.logDirectory=hdfs://oushu/spark/spark-history
spark.eventLog.dir=hdfs://oushu/spark/spark-history
spark.eventLog.enabled=true
spark.yarn.stagingDir=hdfs://oushu/spark/staging
EOF
```
Create the configuration file `spark-env.sh`:
```sh
cat > /usr/local/oushu/conf/spark/spark-env.sh << EOF
export YARN_CONF_DIR=/usr/local/oushu/conf/spark
export JAVA_HOME="/usr/lib/jvm/java"
export SPARK_MASTER_OPTS="-Dfile.encoding=UTF-8"
EOF
```
Configure Kerberos for the History Server (optional).
Append the following to the `spark-defaults.conf` configuration file:
```sh
echo 'spark.history.kerberos.enabled=true
spark.history.kerberos.principal=spark@OUSHU.COM
spark.history.kerberos.keytab=/etc/security/keytabs/spark.keytab' >> /usr/local/oushu/conf/spark/spark-defaults.conf
```
### Start
Start the History Server:
```
sudo -u spark /usr/local/oushu/spark/sbin/start-history-server.sh
```
### Check Status
Visit http://oushu4:2884 in a browser to check that the History service is running.
Submit a job:
``` sh
sudo -u spark /usr/local/oushu/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 1 \
--principal spark@OUSHU.COM \
--keytab /etc/security/keytabs/spark.keytab \
/usr/local/oushu/spark/examples/jars/spark-examples*.jar \
10
```
:::{note}
When submitting jobs that access Kerberos-secured HDFS/Yarn, specify --principal and --keytab.
:::
### Common Commands
Stop the History Server:
``` sh
# Stop the History Server
sudo -u spark /usr/local/oushu/spark/sbin/stop-history-server.sh
```
### Common Issues
* Issue: job submission fails with `User spark not found`

  Cause: the `spark` user does not exist on the Yarn cluster.
  Fix: create the `spark` user on the Yarn cluster (see the sketch after this list).
* Issue: job submission fails with `Requested user spark is not whitelisted and has id 993,which is below the minimum allowed 1000`

  Cause: Yarn forbids users with a user id below 1000 from submitting jobs.
  Fix 1: change the user id of `spark` on the Yarn cluster to 1000 or above: `usermod -u 2001 spark`
  Fix 2: set `min.user.id=0` in the Yarn configuration file `/etc/hadoop/container-executor.cfg`, then restart the Yarn cluster
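A minimal sketch for the first issue, creating the `spark` user on every Yarn node; it reuses the `yarnhost` file from the Yarn configuration step above, and the uid 2001 is only an example value:
```sh
# Create the spark user with a uid above Yarn's minimum allowed value on all Yarn nodes
lava ssh -f ${HOME}/yarnhost -e "useradd -u 2001 spark"
```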