
Running a Spark Cluster on Minikube

[TOC]

Background

Starting with version 2.3, Spark supports submitting jobs directly to a Kubernetes cluster via spark-submit. The execution mechanism works as follows:

  • Spark creates a Spark driver running inside a Kubernetes pod.
  • The driver creates the executors, which also run inside Kubernetes pods, connects to them, and executes the application code.
  • When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists its logs and remains in a "completed" state in the Kubernetes API until it is eventually garbage-collected or manually cleaned up.
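The lifecycle above can be summarized as a small sketch (an illustration of the described behavior, not actual Spark or Kubernetes code; the function and names are made up for clarity):

```python
# Phases the driver pod passes through, as reported by the Kubernetes API
# (these match the pod phases visible in the submission log later on).
DRIVER_PHASES = ["Pending", "Running", "Succeeded"]

def cleanup_policy(role: str, app_finished: bool) -> str:
    """Illustrates the cleanup behavior described above, per pod role."""
    if not app_finished:
        return "running"
    # executor pods are torn down; the driver pod is kept for its logs
    return "deleted" if role == "executor" else "retained (completed)"

print(cleanup_policy("executor", True))  # deleted
print(cleanup_policy("driver", True))    # retained (completed)
```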

(Figure: Spark cluster components)

Part 1: Environment Preparation

1.1 Preparing the Minikube VM

A Spark cluster is fairly demanding on memory and CPU, so allocate generous resources to the VM before starting Minikube.

When Minikube starts, it comes up as a single node that by default uses 1 GB of memory and 2 CPU cores. That is not enough to run a Spark cluster, and jobs will fail.
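To see why 8 GB is a safer allocation, here is a rough back-of-the-envelope sketch. The overhead formula mirrors Spark's documented default per-pod memory overhead of max(384 MB, 10% of pod memory); the helper name and the 1 GB defaults are assumptions for illustration, not exact figures:

```python
def k8s_memory_needed_mb(n_executors: int,
                         driver_mem_mb: int = 1024,
                         executor_mem_mb: int = 1024) -> int:
    """Rough total memory the cluster must provide for one Spark job.

    Assumes Spark's default per-pod memory overhead of
    max(384 MB, 10% of the pod's memory) -- an estimate, not a guarantee.
    """
    def pod_total(mem_mb: int) -> int:
        return mem_mb + max(384, mem_mb // 10)
    return pod_total(driver_mem_mb) + n_executors * pod_total(executor_mem_mb)

# One driver plus two executors at the 1 GB defaults:
print(k8s_memory_needed_mb(2))  # 4224 MB -- already over half of an 8 GB VM
```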

# minikube config set memory 8192
These changes will take effect upon a minikube delete and then a minikube start

# minikube config set cpus 2
These changes will take effect upon a minikube delete and then a minikube start

Alternatively, start the cluster with the following command:

# minikube start --cpus 2 --memory 8192

1.2 Preparing Spark

Step 1: Download Spark 2.3

# wget http://apache.mirrors.hoobly.com/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz

Extract it:

# tar xvf spark-2.3.0-bin-hadoop2.7.tgz

Build the Docker image:

# cd spark-2.3.0-bin-hadoop2.7
# docker build -t rongxiang1986/spark:2.3.0 -f kubernetes/dockerfiles/spark/Dockerfile .

Check the image:

# docker images
REPOSITORY            TAG      IMAGE ID       CREATED      SIZE
rongxiang1986/spark   2.3.0    c5c806314f25   5 days ago   346MB

Log in to your Docker account:

# docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username:
Password:
Login Succeeded

Push the image built above to Docker Hub:

# docker push rongxiang1986/spark:2.3.0

Note the format required here (I got burned by this): docker push <registry-username>/<image-name>.

Checking on https://hub.docker.com/ confirms that the image was indeed pushed.
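The naming rule that tripped me up can be expressed as a quick sanity check (a minimal sketch; this regex only covers the user/name:tag form used here, not Docker's full image-reference grammar):

```python
import re

def is_pushable_ref(image: str) -> bool:
    """Check that an image reference looks like <username>/<repo>:<tag>,
    the form Docker Hub expects for `docker push`."""
    pattern = r"[a-z0-9]+(?:[._-][a-z0-9]+)*/[a-z0-9]+(?:[._-][a-z0-9]+)*:[\w][\w.-]*"
    return re.fullmatch(pattern, image) is not None

print(is_pushable_ref("rongxiang1986/spark:2.3.0"))  # True
print(is_pushable_ref("spark:2.3.0"))                # False: no username prefix
```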

Part 2: Submitting a Spark Job

2.1 Job submission

First, configure a service account:

# kubectl create serviceaccount spark
serviceaccount/spark created
# kubectl create clusterrolebinding spark-role --clusterrole=admin --serviceaccount=default:spark --namespace=default
clusterrolebinding.rbac.authorization.k8s.io/spark-role created

Submit the job:

# ./spark-submit \
--master k8s://https://192.168.99.100:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=rongxiang1986/spark:2.3.0 \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

The parameters of the submit command mean the following:

  • --class: the entry point of the application (here: org.apache.spark.examples.SparkPi);
  • --master: the URL of the Kubernetes cluster (k8s://https://192.168.99.100:8443);
  • --deploy-mode: where the driver is deployed (default: client); here it runs inside the cluster;
  • --conf spark.executor.instances=2: the number of executors launched for the job;
  • --conf spark.kubernetes.container.image=rongxiang1986/spark:2.3.0: the Docker image to use;
  • local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar: the path to the application jar;

Note: at present only cluster deploy mode is supported; client mode is not:

Error: Client mode is currently not supported for Kubernetes.
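How these flags assemble into the final command line can be sketched as follows (a hypothetical helper for scripting submissions, not part of Spark; the function name and defaults are my own):

```python
def build_spark_submit(master: str, image: str, app_jar: str,
                       main_class: str, name: str,
                       executors: int = 2,
                       service_account: str = "spark") -> list:
    """Assemble a spark-submit argv for cluster-mode submission to Kubernetes."""
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",  # client mode is rejected in Spark 2.3
        "--name", name,
        "--class", main_class,
        "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=%s" % service_account,
        "--conf", "spark.executor.instances=%d" % executors,
        "--conf", "spark.kubernetes.container.image=%s" % image,
        app_jar,
    ]

cmd = build_spark_submit(
    master="k8s://https://192.168.99.100:8443",
    image="rongxiang1986/spark:2.3.0",
    app_jar="local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar",
    main_class="org.apache.spark.examples.SparkPi",
    name="spark-pi",
)
print(" ".join(cmd))
```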

The console output of the job run:

2018-08-12 15:51:17 WARN  Utils:66 - Your hostname, deeplearning resolves to a loopback address: 127.0.1.1; using 192.168.31.3 instead (on interface enp0s31f6)
2018-08-12 15:51:17 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-08-12 15:51:18 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-7314d819cd3730b4bf7d02bfedd21373-driver
namespace: default
labels: spark-app-selector -> spark-8be4d909d85148bc9f1f91d511c275c6, spark-role -> driver
pod uid: 7f6dd84d-9e04-11e8-b58f-080027b3a6c0
creation time: 2018-08-12T07:51:18Z
service account name: spark
volumes: spark-token-rzrgk
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
2018-08-12 15:51:18 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-7314d819cd3730b4bf7d02bfedd21373-driver
namespace: default
labels: spark-app-selector -> spark-8be4d909d85148bc9f1f91d511c275c6, spark-role -> driver
pod uid: 7f6dd84d-9e04-11e8-b58f-080027b3a6c0
creation time: 2018-08-12T07:51:18Z
service account name: spark
volumes: spark-token-rzrgk
node name: minikube
start time: N/A
container images: N/A
phase: Pending
status: []
2018-08-12 15:51:18 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-7314d819cd3730b4bf7d02bfedd21373-driver
namespace: default
labels: spark-app-selector -> spark-8be4d909d85148bc9f1f91d511c275c6, spark-role -> driver
pod uid: 7f6dd84d-9e04-11e8-b58f-080027b3a6c0
creation time: 2018-08-12T07:51:18Z
service account name: spark
volumes: spark-token-rzrgk
node name: minikube
start time: 2018-08-12T07:51:18Z
container images: rongxiang1986/spark:2.3.0
phase: Pending
status: [ContainerStatus(containerID=null, image=rongxiang1986/spark:2.3.0, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2018-08-12 15:51:18 INFO Client:54 - Waiting for application spark-pi to finish...
2018-08-12 15:51:51 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-7314d819cd3730b4bf7d02bfedd21373-driver
namespace: default
labels: spark-app-selector -> spark-8be4d909d85148bc9f1f91d511c275c6, spark-role -> driver
pod uid: 7f6dd84d-9e04-11e8-b58f-080027b3a6c0
creation time: 2018-08-12T07:51:18Z
service account name: spark
volumes: spark-token-rzrgk
node name: minikube
start time: 2018-08-12T07:51:18Z
container images: rongxiang1986/spark:2.3.0
phase: Running
status: [ContainerStatus(containerID=docker://d43089c8340affc4534f796b94a90ae080670c36c095176575fbeebacaab648e, image=rongxiang1986/spark:2.3.0, imageID=docker-pullable://rongxiang1986/spark@sha256:3e93a2d462679015a9fb7d723f53ab1d62c5e3619e3f1564d182c3d297ddf75d, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-08-12T07:51:51Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2018-08-12 15:51:57 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: spark-pi-7314d819cd3730b4bf7d02bfedd21373-driver
namespace: default
labels: spark-app-selector -> spark-8be4d909d85148bc9f1f91d511c275c6, spark-role -> driver
pod uid: 7f6dd84d-9e04-11e8-b58f-080027b3a6c0
creation time: 2018-08-12T07:51:18Z
service account name: spark
volumes: spark-token-rzrgk
node name: minikube
start time: 2018-08-12T07:51:18Z
container images: rongxiang1986/spark:2.3.0
phase: Succeeded
status: [ContainerStatus(containerID=docker://d43089c8340affc4534f796b94a90ae080670c36c095176575fbeebacaab648e, image=rongxiang1986/spark:2.3.0, imageID=docker-pullable://rongxiang1986/spark@sha256:3e93a2d462679015a9fb7d723f53ab1d62c5e3619e3f1564d182c3d297ddf75d, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://d43089c8340affc4534f796b94a90ae080670c36c095176575fbeebacaab648e, exitCode=0, finishedAt=Time(time=2018-08-12T07:51:57Z, additionalProperties={}), message=null, reason=Completed, signal=null, startedAt=Time(time=2018-08-12T07:51:51Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2018-08-12 15:51:57 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:


Container name: spark-kubernetes-driver
Container image: rongxiang1986/spark:2.3.0
Container state: Terminated
Exit code: 0
2018-08-12 15:51:57 INFO Client:54 - Application spark-pi finished.
2018-08-12 15:51:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-08-12 15:51:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-6dd1c204-4ad7-40c4-b47f-a34f18e1995d

2.2 Viewing the logs

You can inspect the container's execution logs with kubectl, or through the web UI provided by kubernetes-dashboard.

# kubectl logs spark-pi-709e1c1b19813e7cbc1aeff45200c64e-driver
2018-08-12 07:51:57 INFO  DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 0.576528 s
Pi is roughly 3.1336756683783418
2018-08-12 07:51:57 INFO AbstractConnector:318 - Stopped Spark@9635fa{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-08-12 07:51:57 INFO SparkUI:54 - Stopped Spark web UI at http://spark-pi-7314d819cd3730b4bf7d02bfedd21373-driver-svc.default.svc:4040
2018-08-12 07:51:57 INFO KubernetesClusterSchedulerBackend:54 - Shutting down all executors
2018-08-12 07:51:57 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Asking each executor to shut down
2018-08-12 07:51:57 INFO KubernetesClusterSchedulerBackend:54 - Closing kubernetes client
2018-08-12 07:51:57 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-08-12 07:51:57 INFO MemoryStore:54 - MemoryStore cleared
2018-08-12 07:51:57 INFO BlockManager:54 - BlockManager stopped
2018-08-12 07:51:57 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-08-12 07:51:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-08-12 07:51:57 INFO SparkContext:54 - Successfully stopped SparkContext
2018-08-12 07:51:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-08-12 07:51:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-435d5ab2-f7b4-45d0-a00f-0bd9f162f9db

When execution finishes, the executor pods are cleaned up automatically. The computed value of pi:

Pi is roughly 3.1336756683783418
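SparkPi arrives at this number by Monte Carlo sampling: it throws random points into the unit square and counts how many land inside the quarter circle, whose area is pi/4. Stripped of Spark's parallelism, the core computation is roughly this (a single-process sketch, not the example's actual code):

```python
import random

def estimate_pi(n_samples: int, seed: int = 42) -> float:
    """Estimate pi: the fraction of random points in the unit square
    that fall inside the quarter circle approaches pi/4."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # roughly 3.14, with the same sampling noise as above
```

The result varies with the sample count and random seed, which is why the job prints 3.1336... rather than pi exactly.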

When a job is submitted in cluster mode, the driver container is retained and can be inspected:

# minikube service list
|-------------|------------------------------------------------------|-----------------------------|
| NAMESPACE | NAME | URL |
|-------------|------------------------------------------------------|-----------------------------|
| default | kubernetes | No node port |
| default | spark-pi-27fcc168740e372292b27185d124ad7b-driver-svc | No node port |
| kube-system | kube-dns | No node port |
| kube-system | kubernetes-dashboard | http://192.168.99.100:30000 |
|-------------|------------------------------------------------------|-----------------------------|

Part 3: Common Errors and Troubleshooting

1. If you run into the error below, your Spark version may be too old; upgrade to 2.4.5 or later and try again.

2020-08-09 10:13:14 WARN KubernetesClusterManager:66 - The executor's init-container config map is not specified. Executors will therefore not attempt to fetch remote or submitted dependencies.
2020-08-09 10:13:14 WARN KubernetesClusterManager:66 - The executor's init-container config map key is not specified. Executors will therefore not attempt to fetch remote or submitted dependencies.
2020-08-09 10:13:15 WARN WatchConnectionManager:185 - Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-08-09 10:13:15 ERROR SparkContext:91 - Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:188)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

2. If pulling the image is slow, or the task stays stuck in the Pulling state, it is because the Minikube host does not have this private image locally: enter the Minikube VM and pull the remote image from Docker Hub ahead of time. Otherwise every job submission pulls it from the remote registry, and given network constraints (you know how it is), each pull is slow or times out.

root@deeplearning:~# minikube ssh
_ _
_ _ ( ) ( )
___ ___ (_) ___ (_)| |/') _ _ | |_ __
/' _ ` _ `\| |/' _ `\| || , < ( ) ( )| '_`\ /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )( ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)

$ docker images
# the image is not present locally, so pull it ahead of time by hand
$ docker pull rongxiang1986/spark:2.4.6

3. Each container in a Spark job needs at least one CPU. If the Minikube cluster was started with the default 2 CPUs and a single job requests multiple executors, it will fail with insufficient resources, so allocate enough resources when creating the Minikube cluster.
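A quick check of the CPU arithmetic (assuming, as in this setup, one CPU requested per driver pod and per executor pod; the helper is illustrative, not Spark code):

```python
def cpus_needed(n_executors: int, cpus_per_pod: int = 1) -> int:
    # one driver pod plus n executor pods, each requesting cpus_per_pod
    return (1 + n_executors) * cpus_per_pod

# With the default 2-CPU Minikube VM, even two executors do not fit:
print(cpus_needed(2))  # 3 CPUs needed, only 2 available
```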

References

1. Running Spark on Kubernetes: https://spark.apache.org/docs/latest/running-on-kubernetes.html

2. Running Spark Job on Kubernetes - Minikube: https://iamninad.com/running-spark-job-on-kubernetes-minikube/

Title: Running a Spark Cluster on Minikube

Author: rong xiang

Published: June 25, 2018 - 19:06

Last updated: October 25, 2022 - 23:10

Original link: https://zjrongxiang.github.io/posts/8ee13721/

License: CC BY-NC-ND 4.0 International. Please retain the original link and author attribution when reposting.
