Beginner's Guide: Single-Node Deployment of a k8s Cluster

k8s has become the de facto industry standard for container orchestration. This article walks through deploying a k8s cluster on a single machine that plays both the master and worker node roles, using the flannel network plugin for the underlying network model, and finishes by running a simple nginx service. A single-node k8s cluster is a quick way to test, debug, and learn k8s.

The k8s version installed in this article is 1.20 (the kubeadm client reports v1.20.0; the control plane is initialized as v1.20.2).

Deploying a single-node k8s cluster involves the following steps:

  1. Check and configure the environment to meet the k8s requirements.
  2. Install a container runtime for k8s; this article uses Docker.
  3. Install the k8s cluster management tools kubeadm/kubelet/kubectl.
  4. Initialize the k8s cluster with kubeadm.
  5. Deploy the flannel network plugin with kubectl.

The installation process itself is not particularly complex, but pulling some of the required container images from within mainland China can be difficult due to network restrictions; this mainly affects steps 4 and 5, the cluster initialization. The list of images used during installation is given at the end of this article for reference. I generally try to pull images from a domestic mirror such as Alibaba Cloud first, and fall back to a proxy for the upstream registries only if that fails.

Without image-pull problems, the whole deployment should take under an hour. Common problems encountered during installation, and their solutions, are listed at the end of this article.

1. Check the current environment and initialize it as required

1.1 Installation requirements

The basic requirements for installing a k8s cluster are:

  • At least 2 CPU cores + 2 GB RAM
  • A supported operating system version:
    • Ubuntu 16.04+
    • Debian 9+
    • CentOS 7+
    • Red Hat Enterprise Linux (RHEL) 7+
    • Fedora 25+
    • HypriotOS v1.0.1+
    • Flatcar Container Linux (tested with 2512.3.0)
  • Full network connectivity between all machines in the cluster
  • No duplicate hostname, MAC address, or product_uuid among the nodes
  • The ports k8s needs must not be blocked by a firewall and must not already be in use on the host
  • Swap disabled

The detailed installation requirements can be found here.

The host used in this article:

  • 2 CPU cores + 4 GB RAM
  • Ubuntu 20.10, Linux 5.8.0-41-generic
  • Single-machine networking: one Ethernet NIC on the LAN, firewall disabled

1.2 Check CPU and memory

# Check system CPU
$ cat /proc/cpuinfo
# Check system memory
$ cat /proc/meminfo

1.3 Check the OS version

$ uname -a
Linux k8s-master01 5.8.0-41-generic #46-Ubuntu SMP Mon Jan 18 16:48:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.10
DISTRIB_CODENAME=groovy
DISTRIB_DESCRIPTION="Ubuntu 20.10"

1.4 Check the MAC address and product_uuid

Check the MAC address and product_uuid; in a production environment, these must be unique for every node.

# Check the host's network adapters and MAC addresses
$ ifconfig -a
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
        inet6 fe80::7f65:6807:9e8e:8ac3  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:52:8d:82  txqueuelen 1000  (Ethernet)
        RX packets 84  bytes 35282 (35.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 117  bytes 17453 (17.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 150  bytes 12646 (12.6 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 150  bytes 12646 (12.6 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# Check the MAC address of enp0s3
$ cat /sys/class/net/enp0s3/address 
08:00:27:52:8d:82

# Check the host's product_uuid
$ sudo cat /sys/class/dmi/id/product_uuid
975ca0e1-d319-3745-9915-a61ef4297ddf

1.5 Check network adapters and routing

Check the routing table: this host has a single network adapter, and the default route (0.0.0.0) correctly points at the gateway.

$ route -v
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 enp0s3
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 enp0s3

If the host has more than one network adapter, make sure the k8s components can reach the intended adapter via the default route.
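
A quick way to verify this is to ask the kernel for the default route; by default kubeadm advertises the API server on the interface of the default route, and the --apiserver-advertise-address flag can override this.

# Show the default route and its egress interface (should list enp0s3 here)
$ ip route show default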

1.6 Check the netfilter module

netfilter is a framework in the Linux kernel that filters network packets and performs address translation. iptables is a user-space program that manages netfilter rules: via netfilter hooks it matches packets against the corresponding rule chains to implement traffic control.

Because container IP addresses in a k8s cluster are assigned dynamically, k8s uses iptables/netfilter to steer service traffic, implementing dynamic load balancing for services inside the cluster.
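
Once the cluster is up (after section 4), these rules can be inspected directly; a quick peek, assuming kube-proxy is running in its default iptables mode:

# List the NAT chain kube-proxy maintains for service virtual IPs
$ sudo iptables -t nat -L KUBE-SERVICES -n | head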

The following commands load the netfilter bridge module and allow iptables to filter bridged traffic.

# Load the br_netfilter module
$ modprobe br_netfilter

# Confirm br_netfilter is loaded; output like the following means it is
$ lsmod | grep br_netfilter
br_netfilter           28672  0
bridge                200704  1 br_netfilter

# Allow iptables to filter bridged traffic
$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

$ sudo sysctl --system
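
To confirm that the two flags took effect:

# Both keys should print 1
$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables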

1.7 Disable swap

k8s does not support Linux swap, out of concern for cluster performance and stability. This may change in the future: swap support is expected to arrive around version 1.22. For the detailed discussion see: Kubelet/Kubernetes should work with Swap Enabled.

Commands to disable swap:

# Check swap usage in memory
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3932         854         457          15        2620        2783
Swap:          2047           0        2047


# Disable swap temporarily
$ sudo swapoff -a 

# Swap usage is now 0
$ free -m 
              total        used        free      shared  buff/cache   available
Mem:           3932        1265        1074          12        1592        2499
Swap:             0           0           0

# Disable swap permanently: comment out the swap entry in /etc/fstab,
# otherwise swap comes back after a reboot
$ sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab

# Optionally also tell the kernel to avoid swapping (run as root)
$ echo "vm.swappiness=0" >> /etc/sysctl.conf
$ sysctl -p /etc/sysctl.conf

1.8 Check required ports and host configuration

k8s has different port requirements on master and worker nodes.

Master node (control plane):

  • TCP 6443: Kubernetes API server
  • TCP 2379-2380: etcd server client API
  • TCP 10250: Kubelet API
  • TCP 10251: kube-scheduler
  • TCP 10252: kube-controller-manager

Worker node:

  • TCP 10250: Kubelet API
  • TCP 30000-32767: NodePort services

During setup, confirm that these ports are neither already in use nor blocked by a firewall.
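
A minimal check with ss (no output means the ports are free):

# Look for existing listeners on the control-plane and kubelet ports
$ sudo ss -tlnp | grep -E ':(6443|2379|2380|10250|10251|10252)\s'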

# For a quick demo deployment, it is easiest to simply disable the firewall (if any).
$ systemctl stop firewalld && systemctl disable firewalld

1.9 Miscellaneous

For convenience during installation and the demo, it helps to open three shell windows:

  • Shell window 1: switched to the root account, for commands that need root privileges, including tool installation and running kubeadm to bootstrap and manage the k8s cluster.
  • Shell window 2: tailing the system log with tail -f /var/log/syslog, where cluster startup and runtime information shows up.
  • Shell window 3: talking to the k8s control plane via kubectl, to run container services in the cluster.

2. Install docker as the k8s container runtime

k8s supports several container runtimes; common ones are:

  • containerd
  • CRI-O
  • Docker

This article uses docker as the k8s container runtime. The installation steps below follow the official documentation, with slight adjustments:

# Install curl (skip if already installed)
$ sudo apt install curl

# Install packages to allow apt to use a repository over HTTPS
$ sudo apt-get update && sudo apt-get install -y \
  apt-transport-https ca-certificates curl software-properties-common gnupg2

# Add the docker apt repository
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
$ sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) \
  stable"

# Install docker
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli

# After installation, the docker socket appears at the path below, which is on kubeadm's default runtime search path:
# https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
# kubeadm automatically tries to detect an installed container runtime by scanning through a list of well known Unix domain sockets.
$ ll /var/run/docker.sock

# Configure the Docker daemon: set the cgroup driver to systemd
$ sudo mkdir /etc/docker
$ cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

# Restart the docker daemon
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
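
Afterwards, confirm that docker picked up the new cgroup driver:

# Should report: Cgroup Driver: systemd
$ docker info | grep -i 'cgroup driver'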

3. Install the k8s management tools kubelet/kubeadm/kubectl

k8s provides three tools for bootstrapping and managing a cluster:

  • kubeadm: a k8s administration tool; kubeadm init initializes the master node, and kubeadm join adds worker nodes.
  • kubelet: a k8s cluster component that takes instructions from the control plane and starts and manages the pods and containers on its node.
  • kubectl: the k8s command-line client for talking to the control plane, issuing commands, and inspecting cluster state.

These three tools are required for building the cluster. Install them as follows:

# Add a domestic k8s apt source (run the following as root)
$ curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
$ apt-get update

# Install kubelet/kubeadm/kubectl
$ apt-get install -y kubelet kubeadm kubectl

# Start kubelet
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet

# Watch the kubelet logs
$ tail -f /var/log/syslog

Note: if you see kubelet startup failures like the following in /var/log/syslog, don't worry yet. Before the k8s master node is initialized this is expected; kubelet keeps retrying the connection to the k8s api server until it succeeds.

kubelet[18544]: #011/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/go.opencensus.io/stats/view/worker.go:32 +0x57
systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
systemd[1]: kubelet.service: Failed with result 'exit-code'.
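
Before moving on, it can also help to check the installed versions and pin the three packages, so that a routine apt upgrade doesn't bump them underneath a running cluster:

# Check the installed versions
$ kubeadm version -o short
$ kubectl version --client --short

# Pin the packages against accidental upgrades
$ sudo apt-mark hold kubelet kubeadm kubectl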

4. Bootstrap the k8s cluster

Next, use kubeadm to bootstrap the cluster:

kubeadm init <args>

Before running this command, two things are recommended:

  • Run kubeadm config images list to see the required images. Since gcr.io is blocked in mainland China, pull these images from a domestic mirror such as Alibaba Cloud first, then docker tag them into the local k8s.gcr.io namespace. See 4.1 for the details.
  • Plan the k8s service and pod CIDRs so that they don't clash with the host network. This article uses the following three ranges:
    • host network: 10.0.2.0/24
    • k8s service CIDR: 10.1.0.0/16
    • k8s pod CIDR: 10.244.0.0/16

4.1 Fetch the images needed by the cluster

The following script fetches the images needed to bootstrap the cluster:

# kube-adm-images.sh
# For the image list and versions below, run: kubeadm config images list
#
images=(
    kube-apiserver:v1.20.2
    kube-controller-manager:v1.20.2
    kube-scheduler:v1.20.2
    kube-proxy:v1.20.2
    pause:3.2
    etcd:3.4.13-0
    coredns:1.7.0
)

for imageName in "${images[@]}" ; do
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
done
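
To use it, save the script and run it where the docker daemon is reachable (e.g. as root):

$ sudo bash kube-adm-images.sh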

The locally pulled image list looks like this:

# List the images required by the k8s control plane
$ kubeadm config images list
19385 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable-1.txt": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
19385 version.go:102] falling back to the local client version: v1.20.0
k8s.gcr.io/kube-apiserver:v1.20.0
k8s.gcr.io/kube-controller-manager:v1.20.0
k8s.gcr.io/kube-scheduler:v1.20.0
k8s.gcr.io/kube-proxy:v1.20.0
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0

# List local images
$ docker images
REPOSITORY                                                                    TAG        IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-proxy                                                         v1.20.0    10cc881966cf   8 weeks ago     118MB
k8s.gcr.io/kube-controller-manager                                            v1.20.0    b9fa1895dcaa   8 weeks ago     116MB
k8s.gcr.io/kube-scheduler                                                     v1.20.0    3138b6e3d471   8 weeks ago     46.4MB
k8s.gcr.io/kube-apiserver                                                     v1.20.0    ca9843d3b545   8 weeks ago     122MB
k8s.gcr.io/etcd                                                               3.4.13-0   0369cf4303ff   5 months ago    253MB
k8s.gcr.io/coredns                                                            1.7.0      bfe3a36ebd25   7 months ago    45.2MB
k8s.gcr.io/pause                                                              3.2        80d28bedfe5d   11 months ago   683kB

4.2 Run the kubeadm init command

Next, run kubeadm init to bootstrap the cluster. Note the service-cidr and pod-network-cidr arguments in the command.

# host network: 10.0.2.0/24
# k8s service CIDR: 10.1.0.0/16
# k8s pod CIDR: 10.244.0.0/16
# bootstrap the k8s cluster
$ sudo kubeadm init \
   --image-repository registry.aliyuncs.com/google_containers \
   --kubernetes-version v1.20.2 \
   --service-cidr=10.1.0.0/16 \
   --pod-network-cidr=10.244.0.0/16

# Full log of the run
$ sudo kubeadm init \
>    --image-repository registry.aliyuncs.com/google_containers \
>    --kubernetes-version v1.20.2 \
>    --service-cidr=10.1.0.0/16 \
>    --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.2
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.3. Latest validated version: 19.03
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local michaelk8s-virtualbox] and IPs [10.1.0.1 10.0.2.15]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost michaelk8s-virtualbox] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost michaelk8s-virtualbox] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 15.002558 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node michaelk8s-virtualbox as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node michaelk8s-virtualbox as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: xxx.xxxx
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.2.15:6443 --token xxx.xxx --discovery-token-ca-cert-hash sha256:xxxx 

Once the cluster is up, the cluster services can be listed with kubectl get pods -A:

# Configure kubectl access (copy the admin kubeconfig)
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Check the cluster status
$ kubectl cluster-info
Kubernetes control plane is running at https://10.0.2.15:6443
KubeDNS is running at https://10.0.2.15:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

# The k8s control-plane services:
# 1. k8s api server
# 2. k8s controller manager
# 3. k8s scheduler
# 4. k8s proxy
# 5. etcd
# 6. coredns

# List the control-plane pods
$ kubectl get pods -A
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-7f89b7bc75-685w4                        0/1     Pending   0          3m47s
kube-system   coredns-7f89b7bc75-tcp2b                        0/1     Pending   0          3m47s
kube-system   etcd-michaelk8s-virtualbox                      1/1     Running   0          3m53s
kube-system   kube-apiserver-michaelk8s-virtualbox            1/1     Running   0          3m53s
kube-system   kube-controller-manager-michaelk8s-virtualbox   1/1     Running   0          3m53s
kube-system   kube-proxy-6pvcb                                1/1     Running   0          3m47s
kube-system   kube-scheduler-michaelk8s-virtualbox            1/1     Running   0          3m53s

# Print the default init/join configuration; in production, master/worker nodes can be initialized via a kubeadm config
$ sudo kubeadm config print init-defaults
$ sudo kubeadm config print join-defaults

At this point kubelet is up and running:

# Check kubelet status
$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running)
       Docs: https://kubernetes.io/docs/home/
   Main PID: 97961 (kubelet)
      Tasks: 17 (limit: 4650)
     Memory: 68.7M
     CGroup: /system.slice/kubelet.service
             └─97961 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubel>

However, /var/log/syslog still shows the following cni config uninitialized errors,

$ tail -f /var/log/syslog
kubelet[22127]: 22127 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
kubelet[22127]: 22127 kubelet.go:2163] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

which say that the cni network plugin has not been configured. That is the next installation step.

5. Install the flannel network plugin

Flannel is a k8s CNI plugin that builds a layer-3 overlay network, typically with its VXLAN backend (VXLAN-encapsulated UDP). Installation is a single kubectl apply, but because of the network restrictions it is best to pull the following image locally first.

$ docker images
REPOSITORY               TAG        IMAGE ID       CREATED         SIZE
quay.io/coreos/flannel   v0.13.0    e708f4bb69e3   3 months ago    57.2MB

Then run the installation:

# Download the kube-flannel.yml file and deploy the flannel network plugin
# File: https://gitee.com/pphh/blog/blob/master/210215_k8s_deployment/kube-flannel.yml
$ kubectl apply -f kube-flannel.yml

# After installation, check for the cni config file below; if it exists, the installation is complete
$ cat /etc/cni/net.d/10-flannel.conflist 
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

# Check the flannel pod
$ kubectl get pods -A | grep flannel
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
kube-system   kube-flannel-ds-2ntmg                           1/1     Running   0          54m

# Watch the logs
$ tail -f /var/log/syslog
Joining mDNS multicast group on interface flannel.1.IPv6 with address fe80::801e:edff:fece:975c.
New relevant interface flannel.1.IPv6 for mDNS.
Registering new address record for fe80::801e:edff:fece:975c on flannel.1.*.
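
As a final sanity check, the VXLAN device that flannel creates should now exist on the node:

# flannel.1 is flannel's default VXLAN interface; -d prints the vxlan details
$ ip -d link show flannel.1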

6. Run an nginx service

6.1 Allow pods to run on the master node

By default, for security reasons, pods are not scheduled onto the master node. Since this demo runs a single-node cluster, we lift that restriction.

$ kubectl taint nodes --all node-role.kubernetes.io/master-
node/michaelk8s-virtualbox untainted
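
You can confirm the taint is gone before deploying anything:

# The Taints field should now show <none>
$ kubectl describe node michaelk8s-virtualbox | grep -i taints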

6.2 Start an nginx demo

# Download the k8s-deployment-nginx.yml file and start nginx
# File: https://gitee.com/pphh/blog/blob/master/210215_k8s_deployment/k8s-deployment-nginx.yml
$ kubectl apply -f k8s-deployment-nginx.yml

# Download the k8s-deployment-nginx-svc.yml file and start the nginx service
# File: https://gitee.com/pphh/blog/blob/master/210215_k8s_deployment/k8s-deployment-nginx-svc.yml
$ kubectl apply -f k8s-deployment-nginx-svc.yml
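
The two yml files are not reproduced in this post; as a reference, here is a minimal equivalent sketch (assuming a stock nginx image and the 8000:32000 port mapping shown below; the actual files on gitee may differ):

# A sketch, not the exact files from the post
$ cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 8000
    targetPort: 80
    nodePort: 32000
EOF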

# Check the nginx pod status
$ kubectl get pods

# Check the service
$ kubectl get svc
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
kubernetes      ClusterIP   10.1.0.1     <none>        443/TCP          70m
nginx-service   NodePort    10.1.8.186   <none>        8000:32000/TCP   8s

Open a browser at the following address,

  • http://10.1.8.186:8000/

and you should see the "Welcome to nginx!" page.
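
Since the service is of type NodePort, it is also reachable via the node IP:

# Fetch the same page through the NodePort (node IP 10.0.2.15, port 32000)
$ curl -s http://10.0.2.15:32000 | grep '<title>'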

7. Problems encountered during installation

7.1 Cannot list docker images

Running the docker images command fails with a permission denied error:

$ docker images
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/json: dial unix /var/run/docker.sock: connect: permission denied

Solution: run the docker images command as the root user.
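
Alternative solution: the standard docker post-install step is to add the current user to the docker group (log out and back in for it to take effect):

# Allow the current user to talk to the docker daemon without sudo
$ sudo usermod -aG docker $USER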

7.2 Problems pulling docker images

The images used by the k8s cluster installed in this article:

$ sudo su -
root@michaelk8s-VirtualBox:~# docker images
REPOSITORY                                                                    TAG        IMAGE ID       CREATED         SIZE
nginx                                                                         latest     f6d0b4767a6c   3 weeks ago     133MB
k8s.gcr.io/kube-proxy                                                         v1.20.0    10cc881966cf   8 weeks ago     118MB
k8s.gcr.io/kube-scheduler                                                     v1.20.0    3138b6e3d471   8 weeks ago     46.4MB
k8s.gcr.io/kube-apiserver                                                     v1.20.0    ca9843d3b545   8 weeks ago     122MB
k8s.gcr.io/kube-controller-manager                                            v1.20.0    b9fa1895dcaa   8 weeks ago     116MB
quay.io/coreos/flannel                                                        v0.13.0    e708f4bb69e3   3 months ago    57.2MB
k8s.gcr.io/etcd                                                               3.4.13-0   0369cf4303ff   5 months ago    253MB
k8s.gcr.io/coredns                                                            1.7.0      bfe3a36ebd25   7 months ago    45.2MB
k8s.gcr.io/pause                                                              3.2        80d28bedfe5d   11 months ago   683kB

It is best to first try pulling from a domestic mirror such as Alibaba Cloud; candidate sources include:

  • registry.aliyuncs.com/google_containers/kube-proxy
  • registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy

If pulling still fails, consider reaching the upstream registries through a proxy.

7.3 Pod status Init:ErrImagePull

Sometimes a pod's status stays at Init:ErrImagePull. This is also caused by an image that cannot be pulled locally; try switching to another image source.

# kubectl get pods -A
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
kube-system   kube-flannel-ds-2ntmg                           0/1     Init:ErrImagePull   0          4m51s

7.4 kubeadm init reports that yaml config files already exist

Running kubeadm init a second time on the same host produces errors that the yaml config files already exist.

$ kubeadm init
[init] Using Kubernetes version: v1.20.2
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.3. Latest validated version: 19.03
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Swap]: running with swap on is not supported. Please disable swap
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution: run kubeadm reset to reset the local state, then run kubeadm init again.
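
A minimal sketch of the reset-and-retry sequence (re-use the init arguments from section 4.2):

# Wipe the local cluster state created by the previous init
$ sudo kubeadm reset

# The old admin kubeconfig is stale after a reset; remove it and re-create
# it once the new kubeadm init succeeds
$ rm -f $HOME/.kube/config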

kubeadm can report other errors when bootstrapping the cluster, for example:

[ERROR Swap]: running with swap on is not supported. Please disable swap.

For this one, disable swap as the log suggests.

Alternatively, to skip this check, configure kubeadm and kubelet as follows:

  • pass --ignore-preflight-errors=Swap on the kubeadm command line
  • add failSwapOn: false to the kubelet configuration file

7.5 System log reports "Unable to update cni config"

After kubelet is installed and started via sudo systemctl restart kubelet, the service does not come up cleanly, and the system log shows:

5644 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
5644 kubelet.go:2163] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Before the flannel plugin is installed this error is expected; it disappears once flannel is in place.

7.6 kubectl reports that the connection to the server localhost:8080 was refused

The error looks like this:

$ kubectl describe pod
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Solution: in most cases kubectl has simply not been configured. Apply either of the two methods below, then re-run kubectl.

# Method 1: use the config file
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Method 2: use an environment variable
$ export KUBECONFIG=/etc/kubernetes/admin.conf

7.7 Others

More k8s installation problems are covered here.

8. k8s cluster component deployment diagram

9. References

  1. k8s tool installation guide (Installing kubeadm)
  2. Initializing a k8s cluster with kubeadm
  3. Kubelet/Kubernetes should work with Swap Enabled
  4. k8s container runtimes
  5. netfilter: firewalling, NAT, and packet mangling for Linux
  6. k8s high-availability topology options
  7. CNI specification
  8. flannel introduction