Installing Kubernetes on Linux

Environment: Ubuntu 18.04, 64-bit

To work around slow access from within China to some overseas sites, this guide uses Alibaba Cloud (Aliyun) mirrors throughout.

Switching the apt package source

Here we use the Aliyun mirror, https://developer.aliyun.com/mirror/. To be safe, back up the system's default /etc/apt/sources.list file first.

Edit /etc/apt/sources.list and replace the default host http://archive.ubuntu.com (or http://cn.archive.ubuntu.com) with http://mirrors.aliyun.com.
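The replacement can also be scripted. The sed expression below is a sketch (the helper name swap_mirror is ours, and it assumes the stock bionic mirror URLs); the in-place form against the real file is shown in the comment, to be run after making the backup mentioned above.

```shell
# Swap the default Ubuntu archive hosts for the Aliyun mirror.
# In-place form:
#   sudo sed -i 's#http://\(cn\.\)\?archive\.ubuntu\.com#http://mirrors.aliyun.com#g' /etc/apt/sources.list
swap_mirror() {
  sed 's#http://\(cn\.\)\?archive\.ubuntu\.com#http://mirrors.aliyun.com#g'
}

echo "deb http://cn.archive.ubuntu.com/ubuntu bionic main restricted" | swap_mirror
# -> deb http://mirrors.aliyun.com/ubuntu bionic main restricted
```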

Update the apt cache

$ sudo apt-get clean
$ sudo apt-get update

Installing Docker

We switch to the Aliyun mirror, https://developer.aliyun.com/mirror/docker-ce; the official documentation is at https://docs.docker.com/engine/install/ubuntu/.

Set up the Docker repository

$ sudo apt update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$ echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install Docker Engine

$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io

Verify that Docker installed successfully

$ sudo docker info

View the Docker version information

$ sudo docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:40 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:48 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.8
  GitCommit:        7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc:
  Version:          1.0.0
  GitCommit:        v1.0.0-0-g84113ee
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

If you see output like the above, the installation succeeded.

Configuring a Docker registry mirror

The default Docker registry is hosted overseas and is too slow to reach from within China, so we use the registry mirror provided by Aliyun.

sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://gfxrbz51.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Check whether the configuration took effect

$ sudo docker info

Since the official recommendation is to use systemd as the Docker cgroup driver, we change that setting here in the same step.

Installing k8s

Install the tools kubectl, kubelet, and kubeadm

We again use the Aliyun mirror; see https://developer.aliyun.com/mirror/kubernetes
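The Aliyun mirror page boils down to roughly the following steps (a sketch based on that page's instructions at the time; the apt-key method matches the Ubuntu 18.04 era, and newer instructions may use signed-by keyrings instead):

```shell
# Add the Aliyun Kubernetes apt repository, then install the three tools.
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" \
  | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
```

To pin the version used in this guide, the packages can typically be installed as kubelet=1.21.3-00 kubeadm=1.21.3-00 kubectl=1.21.3-00.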

Verify the installation

$ sudo kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

$ sudo kubelet --version
Kubernetes v1.21.3

$ sudo kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:03:28Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

You can also follow the official guide: https://kubernetes.io/zh/docs/tasks/tools/install-kubectl-linux/

Disable swap

$ sudo swapoff -a

It is recommended to disable swap permanently by commenting out the swapfile line in /etc/fstab:

#/swapfile                                 none            swap    sw              0       0
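Commenting the entry out can also be scripted. The snippet below is a sketch (the helper name comment_swap is ours, and it assumes the entry begins with /swapfile as shown above); the in-place form against /etc/fstab is in the comment.

```shell
# Prefix the swapfile entry with '#' so it is ignored at boot.
# In-place form: sudo sed -i 's|^/swapfile|#/swapfile|' /etc/fstab
comment_swap() {
  sed 's|^/swapfile|#/swapfile|'
}

echo "/swapfile  none  swap  sw  0  0" | comment_swap
# -> #/swapfile  none  swap  sw  0  0
```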

Creating a k8s cluster with kubeadm

See the official tutorial: https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/

Initialize the master (control-plane) node

kubeadm init \
    --apiserver-advertise-address=192.168.0.52 \
    --image-repository registry.aliyuncs.com/google_containers \
    --kubernetes-version v1.21.3 \
    --pod-network-cidr=10.244.0.0/16

Explanation of the init flags:

--apiserver-advertise-address

Specifies which of the master's network interfaces to use for communication with the other cluster nodes. If the master has multiple interfaces it is best to name one explicitly; otherwise kubeadm picks the interface with the default gateway.

--pod-network-cidr

Specifies the Pod network range. Kubernetes supports several network add-ons, and each has its own requirements for --pod-network-cidr; we set 10.244.0.0/16 here because we will use the Flannel add-on, which requires this CIDR.

--image-repository

Kubernetes pulls its images from k8s.gcr.io by default, and gcr.io is not reachable from within China. Since v1.13 the --image-repository flag lets us override that default with the Aliyun mirror: registry.aliyuncs.com/google_containers.

--kubernetes-version=v1.21.3 

Disables version detection. The flag's default value is stable-1, which makes kubeadm fetch the latest version number from https://dl.k8s.io/release/stable-1.txt; pinning a fixed version (here v1.21.3) skips that network request.

sxf@sxf-virtual-machine:~$ sudo kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3  --pod-network-cidr=10.244.0.0/16

[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local sxf-virtual-machine] and IPs [10.96.0.1 192.168.3.52]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost sxf-virtual-machine] and IPs [192.168.3.52 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost sxf-virtual-machine] and IPs [192.168.3.52 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 27.506486 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node sxf-virtual-machine as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node sxf-virtual-machine as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: lw58fm.04ywrp7f8m4q39j6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.3.52:6443 --token lw58fm.04ywrp7f8m4q39j6 \
        --discovery-token-ca-cert-hash sha256:4dd451d1e3e7e0743bfaaf3c01d9ab0d524e1d71521428bccfcd9406ae9860da

If you hit an error that the image "registry.aliyuncs.com/google_containers/coredns:v1.8.0" cannot be pulled, see "Common problems" below.

What the initialization output means:

[preflight] kubeadm runs pre-flight checks before initializing.
[certs] Generates the various tokens and certificates.
[kubeconfig] Generates the kubeconfig files; the kubelet needs these to talk to the master.
[kubelet-start] Generates the kubelet configuration file /var/lib/kubelet/config.yaml.
[control-plane] Installs the control-plane components (apiserver, controller-manager, and scheduler), pulling each component's Docker image from the specified registry. Each component runs as a static Pod; note the distinction between static Pods and regular Pods.
[mark-control-plane] Labels the current node as part of the control plane and taints it so that regular Pods are not scheduled onto it.
[wait-control-plane] Waits for the kubelet to boot up the control plane as static Pods from the directory "/etc/kubernetes/manifests"; this can take up to 4m0s.
[apiclient] Checks that the control-plane components are healthy.
[bootstrap-token] Generates the bootstrap token; note it down, as it is needed later when adding nodes with kubeadm join.
[kubelet-finalize] Points the kubelet at its rotatable client certificate and key.
[addons] Installs the essential add-ons kube-proxy and kube-dns (CoreDNS).

Understanding these initialization steps matters a great deal for understanding k8s, so every developer is encouraged to read through each line of the output.

So far we have only created the master node. By default the master node cannot run Pods; only worker nodes can have Pods scheduled onto them.

$ kubectl describe node master

...
Taints:             node-role.kubernetes.io/master:NoSchedule
...

For personal study on a single machine, a single-node k8s cluster is enough, so we remove the taint from the master:

$ sudo kubectl taint nodes --all node-role.kubernetes.io/master-

The taint then becomes:

Taints:             <none>

Joining nodes to the cluster

kubeadm join 192.168.3.52:6443 --token lw58fm.04ywrp7f8m4q39j6 \
        --discovery-token-ca-cert-hash sha256:4dd451d1e3e7e0743bfaaf3c01d9ab0d524e1d71521428bccfcd9406ae9860da

If you forget the token, you can look it up with:

$ sudo kubeadm token list

Tokens have a limited lifetime by default. If the token has expired, create a new one with:

$ sudo kubeadm token create --print-join-command

This creates a new token; run the printed join command on the new node to join the cluster. See https://kubernetes.io/zh/docs/reference/setup-tools/kubeadm/kubeadm-token/
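If you have a token but lost the CA certificate hash, the value for --discovery-token-ca-cert-hash can be recomputed on the control-plane node (a sketch following the official kubeadm docs; it assumes the default certificate path):

```shell
# Hash the public key of the cluster CA; the result is the hex string
# that follows "sha256:" in the kubeadm join command.
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'
```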

Check the cluster node status

$ sudo kubectl get node -o wide

NAME                  STATUS     ROLES                  AGE   VERSION
sxf-virtual-machine   NotReady   control-plane,master   67m   v1.21.3

The STATUS is NotReady, which means the cluster has a problem. Inspect the node details to find out why:

$ sudo kubectl describe node 

...
  RenewTime:       Thu, 29 Jul 2021 15:39:21 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 29 Jul 2021 15:37:33 +0800   Thu, 29 Jul 2021 14:31:09 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 29 Jul 2021 15:37:33 +0800   Thu, 29 Jul 2021 14:31:09 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 29 Jul 2021 15:37:33 +0800   Thu, 29 Jul 2021 14:31:09 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 29 Jul 2021 15:37:33 +0800   Thu, 29 Jul 2021 14:31:09 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
  ...

The key error is that no network add-on has been installed; installing one fixes it. If you check the Pods, you will also see the coredns Pods stuck in Pending for the same reason:

$ kubectl get pods -A
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-59d64cd4d4-5wp7k         0/1     Pending   0          86s
kube-system   coredns-59d64cd4d4-c49p6         0/1     Pending   0          86s
kube-system   etcd-ubuntu                      1/1     Running   0          93s
kube-system   kube-apiserver-ubuntu            1/1     Running   0          93s
kube-system   kube-controller-manager-ubuntu   1/1     Running   0          93s
kube-system   kube-proxy-8mpxg                 1/1     Running   0          86s
kube-system   kube-scheduler-ubuntu            1/1     Running   0          93s

Installing a Pod network add-on

You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start until a network is installed.

Common network add-ons include Flannel, Calico, and Weave; see the relevant articles for how they compare. We choose Flannel here, since it is also the simplest.

This add-on requires that kubeadm init was run with --pod-network-cidr=10.244.0.0/16.

$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

This automatically pulls the quay.io/coreos/flannel:v0.14.0 image to the local machine; if the pull times out, you may need to download the image manually and re-run the apply command. Once installation succeeds, you will see the flannel Pod running, and the coredns Pods that were Pending become Running:

# kubectl get pod -A
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-59d64cd4d4-5wp7k         1/1     Running   0          10m
kube-system   coredns-59d64cd4d4-c49p6         1/1     Running   0          10m
kube-system   etcd-ubuntu                      1/1     Running   0          10m
kube-system   kube-apiserver-ubuntu            1/1     Running   0          10m
kube-system   kube-controller-manager-ubuntu   1/1     Running   0          10m
kube-system   kube-flannel-ds-b4gzk            1/1     Running   0          5m53s
kube-system   kube-proxy-8mpxg                 1/1     Running   0          10m
kube-system   kube-scheduler-ubuntu            1/1     Running   0          10m
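As noted above, the flannel image sometimes fails to pull from within China. A manual workaround (a sketch; the tag matches the v0.14.0 image named earlier) is to pull the image by hand and then re-apply the manifest:

```shell
# Pre-pull the flannel image, then re-apply the manifest.
sudo docker pull quay.io/coreos/flannel:v0.14.0
sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```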

Common problems

  1. During initialization you may see that the image "registry.aliyuncs.com/google_containers/coredns:v1.8.0" cannot be pulled. Workaround: pull it from Docker Hub and re-tag it:
$ sudo docker pull coredns/coredns:1.8.0
$ sudo docker tag coredns/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0

Then re-run the init command.

  2. If cluster setup fails, check whether the tool versions are mismatched; see https://kubernetes.io/zh/docs/setup/release/version-skew-policy/#supported-versions
  3. View the kubelet logs:
$ journalctl -f -u kubelet
  4. View a Pod's logs (for example, to debug flannel scheduling):
$ sudo kubectl logs kube-flannel-ds-gfhf2 -n kube-system
  5. If cluster initialization runs into trouble, you can clean up with kubeadm reset and then run the init command again.
  6. If the flannel Pod stays in CrashLoopBackOff, edit the file
vim /etc/kubernetes/manifests/kube-controller-manager.yaml

and append the following two flags after spec.containers.command:

--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
  7. If, after installing the network add-on, the coredns Pods show STATUS: Running but READY: 0/1, see https://www.programmersought.com/article/14857690636/ or https://stackoverflow.com/questions/60782064/coredns-has-problems-getting-endpoints-services-namespaces
$ iptables -P INPUT ACCEPT
$ kubectl rollout restart deployment coredns --namespace kube-system

Then restart the kubelet:

sudo systemctl restart kubelet

The root cause here is that kubeadm init was run without specifying the Pod network CIDR.

References