Some useful tools
Can be used to convert docker-compose files; very helpful for people new to Kubernetes.
Installation tools
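Advanced scheduling
Each kind of affinity comes in two modes: preferred and required. preferred expresses a soft preference, while required is mandatory.
Use affinity to make sure pods run on the target nodes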
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: elasticsearch-test-ready
            operator: Exists
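For context, here is a minimal sketch of where the nodeAffinity block sits inside a full Pod manifest, with a preferred rule added next to the required one. The pod name, image, and `disktype` label are hypothetical; only the `elasticsearch-test-ready` key comes from the snippet above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: es-test                      # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only nodes that carry this label key are eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: elasticsearch-test-ready
            operator: Exists
      # Soft rule: among eligible nodes, prefer those labelled disktype=ssd.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: disktype            # hypothetical label
            operator: In
            values:
            - ssd
  containers:
  - name: elasticsearch
    image: example/elasticsearch     # hypothetical image
```

A node becomes eligible once it carries the label, for example via `kubectl label nodes <node-name> elasticsearch-test-ready=true`.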
Use anti-affinity to make sure each node runs at most one pod of the same application
    # Hard rule: never co-locate two pods labelled app=nginx-test2 on one node.
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: 'app'
              operator: In
              values:
              - nginx-test2
          topologyKey: "kubernetes.io/hostname"
          namespaces:
          - test

    # Soft rule: avoid co-locating them when possible.
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: "kubernetes.io/hostname"
            namespaces:
            - test
            labelSelector:
              matchExpressions:
              - key: 'app'
                operator: In
                values:
                - "nginx-test2"
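The anti-affinity rule only spreads an application's own replicas if the pod template carries the label that the labelSelector matches. A minimal sketch of a Deployment wired up that way; the replica count and image are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test2
  namespace: test
spec:
  replicas: 3                        # hypothetical replica count
  selector:
    matchLabels:
      app: nginx-test2
  template:
    metadata:
      labels:
        app: nginx-test2             # must match the anti-affinity labelSelector
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx-test2
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: nginx
        image: nginx                 # hypothetical image
```

With the required form, replicas beyond the number of eligible nodes stay Pending; the preferred form lets them double up instead.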
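tolerations and taints
Tolerations and taints always come in pairs. A taint is like saying "I may be hot-tempered, a smoker, and broke by the end of every month, but I am still a good woman": a flaw (taint) like that keeps most men (pods) at a distance, yet there are always a few easygoing guys who can put up with it (tolerations).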
taint
    # Add the taint
    kubectl taint nodes xx elasticsearch-test-ready=true:NoSchedule
    # Remove the taint (note the trailing "-")
    kubectl taint nodes xx elasticsearch-test-ready:NoSchedule-
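Master nodes carry a taint out of the box, which is why the containers we deploy never run on them. Custom taints need more care, though: every DaemonSet and every kube-system component must carry the matching tolerations. Otherwise a NoExecute taint evicts every pod on the node that lacks the toleration, including the network plugin and kube-proxy, and a NoSchedule taint keeps such pods from being scheduled there at all. The consequences are severe, so take care.
Taints and tolerations must match as a pair, and the operators cannot be used carelessly either.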
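One way to keep a system DaemonSet running on tainted nodes is a catch-all toleration. The fragment below is a sketch of that idea, not a recommendation for every component; whether tolerating everything is appropriate depends on the workload.

```yaml
# Fragment of a DaemonSet's spec.template.spec
tolerations:
# An empty key with operator Exists matches every taint,
# whatever its key, value, or effect.
- operator: "Exists"

# Narrower alternative (use instead of the entry above): tolerate only
# the custom taint used in these notes.
# - key: "elasticsearch-test-ready"
#   operator: "Exists"
#   effect: "NoSchedule"
```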
tolerations
NoExecute
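    tolerations:
    - key: "elasticsearch-exclusive"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"

    kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoExecute

NoExecute immediately evicts every pod that does not tolerate the taint. This is a very dangerous operation, so first confirm that the system components have the matching tolerations configured.
Note in particular: in my experience the Exists operator did not take effect here and Equal had to be used. (The Kubernetes API does accept Exists together with NoExecute, but only when no value field is set, so a leftover value is one possible cause.)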
NoSchedule
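    tolerations:
    - key: "elasticsearch-exclusive"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "elasticsearch-exclusive"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"

    kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoSchedule

NoSchedule keeps new pods without a matching toleration from being scheduled onto the node, but pods already running there are not evicted, so you will still see pods on it.
Exists and Equal can be used interchangeably here; it makes little difference.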
It is worth noting that the same key can carry several effects at the same time:

    Taints:             elasticsearch-exclusive=true:NoExecute
                        elasticsearch-exclusive=true:NoSchedule
Container orchestration tips
wait-for-it
Kubernetes currently has no dependency-based startup mechanism comparable to docker-compose's depends_on. Consider using wait-for-it to override the image's command.
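A minimal sketch of such an override, assuming the wait-for-it.sh script (and the bash it needs) has been baked into the image. The image name, script path, and `/app/start.sh` entrypoint are hypothetical.

```yaml
# Fragment of a pod spec
containers:
- name: app
  image: example/app-with-wait-for-it   # hypothetical image containing the script
  command:
  - "/wait-for-it.sh"
  - "elasticsearch-logs:9200"           # host:port to wait for
  - "--timeout=60"                      # stop waiting after 60 seconds
  - "--strict"                          # only run the command if the port came up
  - "--"
  - "/app/start.sh"                     # hypothetical original entrypoint
```

If the wait fails under `--strict`, the container exits and Kubernetes restarts it according to the pod's restartPolicy, so the wait is retried.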
How to use double quotes inside a command

    - "/bin/sh"
    - "-ec"
    - |
      curl -X POST --connect-timeout 5 -H 'Content-Type: application/json' \
      elasticsearch-logs:9200/logs,tracing,tracing-test/_delete_by_query?conflicts=proceed \
      -d '{"query":{"range":{"@timestamp":{"lt":"now-90d","format": "epoch_millis"}}}}'
The Kubernetes master-cluster architecture
master(CONTROL PLANE)
- etcd (distributed persistent storage)
  Consistent and highly-available key value store used as Kubernetes' backing store for all cluster data.
- kube-apiserver
  Front-end for the Kubernetes control plane.
- kube-scheduler
  Component on the master that watches newly created pods that have no node assigned, and selects a node for them to run on.
- Controller Manager
  - Node Controller
    Responsible for noticing and responding when nodes go down.
  - Replication Controller
    Responsible for maintaining the correct number of pods for every replication controller object in the system.
  - Endpoints Controller
    Populates the Endpoints object (that is, joins Services & Pods).
  - Service Account & Token Controllers
    Create default accounts and API access tokens for new namespaces.
- cloud-controller-manager (alpha feature)
  - Node Controller
    For checking the cloud provider to determine if a node has been deleted in the cloud after it stops responding
  - Route Controller
    For setting up routes in the underlying cloud infrastructure
  - Service Controller
    For creating, updating and deleting cloud provider load balancers
  - Volume Controller
    For creating, attaching, and mounting volumes, and interacting with the cloud provider to orchestrate volumes
worker nodes
- Kubelet
  The kubelet is the primary "node agent" that runs on each node.
- Kubernetes Proxy
  kube-proxy enables the Kubernetes service abstraction by maintaining network rules on the host and performing connection forwarding.
- Container Runtime (Docker, rkt, or others)
  The container runtime is the software that is responsible for running containers. Kubernetes supports several runtimes: Docker, rkt, runc and any OCI runtime-spec implementation.
Kubernetes resources
- spec
The spec, which you must provide, describes your desired state for the object–the characteristics that you want the object to have.
- status
The status describes the actual state of the object, and is supplied and updated by the Kubernetes system.
pod
A pod is a group of one or more tightly related containers that will always run together on the same worker node and in the same Linux namespace(s).
Each pod is like a separate logical machine with its own IP, hostname, processes, etc., running a single application.
- liveness
The kubelet uses liveness probes to know when to restart a Container.
- readiness
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic.
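- Question: when a pod is deleted, is the pod IP removed from the endpoints first, or is the pod itself deleted first?
My take:
What Kubernetes does internally when a pod is deleted:
1. The user deletes the pod.
2. The apiserver marks the pod as 'dead'.
3. The kubelet deletes the pod; by default it waits 30s and then forcibly stops the pod if it is still running.
   3.1 The kubelet waits for the preStop hooks of the pod's containers to finish.
   3.2 It sends SIGTERM so the containers can shut down.
   3.3 Once the 30s grace period is exceeded, it sends SIGKILL to force the pod to stop.
4. The endpoint controller (part of the controller manager) removes the pod from the endpoints.
Steps 3 and 4 run concurrently. Step 4 usually finishes before step 3, but because their order is not guaranteed, in extreme cases the kubelet may already have deleted the pod while the endpoints still list it. Requests to the Service are then forwarded to the deleted pod, and calls to the Service fail.
Reference: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods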
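One commonly used mitigation for this race, offered here as a sketch rather than something these notes prescribe, is a short preStop sleep: the container keeps serving while the endpoints are updated, and SIGTERM is only sent once the hook returns. The image and the 10s/40s values are hypothetical, and the image is assumed to contain `/bin/sh` and `sleep`.

```yaml
# Fragment of a pod spec
terminationGracePeriodSeconds: 40     # must cover the preStop sleep plus shutdown time
containers:
- name: app
  image: example/app                  # hypothetical image
  lifecycle:
    preStop:
      exec:
        # Keep serving while the pod is removed from the endpoints;
        # SIGTERM is sent only after this hook finishes.
        command: ["/bin/sh", "-c", "sleep 10"]
```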
Deployment
A Deployment controller provides declarative updates for Pods and ReplicaSets.
- Rolling Update
    # Only works when the pod contains a single container
    kubectl rolling-update NAME [NEW_NAME] --image=IMAGE:TAG
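With a Deployment, the rolling update behavior can instead be declared in the manifest. A minimal sketch, with hypothetical names, image, and numbers:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                    # at most 1 pod above the desired count during an update
      maxUnavailable: 0              # never drop below the desired count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:v1        # hypothetical image; changing it triggers a rolling update
```

An update can then be started with `kubectl set image deployment/web web=IMAGE:TAG` and followed with `kubectl rollout status deployment/web`.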
Init Containers: containers used to initialize the environment
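A minimal sketch of an init container that blocks the main container until a dependency's Service name resolves in DNS. The pod name and application image are hypothetical; `elasticsearch-logs` is the Service name used elsewhere in these notes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init                # hypothetical name
spec:
  initContainers:
  - name: wait-for-elasticsearch
    image: busybox:1.28
    # Init containers run to completion, in order, before any regular container starts.
    command: ["sh", "-c", "until nslookup elasticsearch-logs; do echo waiting; sleep 2; done"]
  containers:
  - name: app
    image: example/app               # hypothetical image
```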
References:
- Assign CPU Resources to Containers and Pods
- Kubernetes deployment strategies
- Autoscaling based on CPU/Memory in Kubernetes — Part II
- Assigning Pods to Nodes
- When resources are insufficient, a Deployment cannot be updated:
0/6 nodes are available: 3 Insufficient memory, 3 node(s) had taints that the pod didn’t tolerate.
Replication Controller
A replication controller is a Kubernetes resource that ensures a pod is always up and running.
-> label
ReplicaSet
The successor to the Replication Controller.

Kubernetes component | pod selector |
---|---|
Replication Controller | label |
ReplicaSet | label, or pods that include a certain label key |
DaemonSet
A DaemonSet makes sure it creates as many pods as there are nodes and deploys each one on its own node
- Health checks
- liveness probe
- HTTP-based liveness probe
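A minimal sketch of a DaemonSet whose container declares an HTTP-based liveness probe. The names, image, `/healthz` path, port, and timing values are all hypothetical.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                   # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: example/node-agent    # hypothetical image
        livenessProbe:
          httpGet:
            path: /healthz           # hypothetical health endpoint
            port: 8080
          initialDelaySeconds: 10    # wait before the first probe
          periodSeconds: 5           # probe every 5 seconds
          failureThreshold: 3        # restart after 3 consecutive failures
```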
StatefulSet
Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
volumes
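Volumes come in two modes.
In-tree volumes are part of standard Kubernetes and are built into the Kubernetes codebase. Out-of-tree volumes are implemented through the Flexvolume interface, which lets users write their own drivers or add support for their own volume types in Kubernetes.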
- emptyDir – a simple empty directory used for storing transient data,
- hostPath – for mounting directories from the worker node’s filesystem into the pod,
- gitRepo – a volume initialized by checking out the contents of a Git repository,
- nfs – an NFS share mounted into the pod,
- gcePersistentDisk (Google Compute Engine Persistent Disk), awsElasticBlockStore (Amazon Web Services Elastic Block Store Volume), azureDisk (Microsoft Azure Disk Volume) – for mounting cloud provider specific storage,
- cinder, cephfs, iscsi, flocker, glusterfs, quobyte, rbd, flexVolume, vsphereVolume, photonPersistentDisk, scaleIO – for mounting other types of network storage,
- configMap, secret, downwardAPI – special types of volumes used to expose certain Kubernetes resources and cluster info to the pod,
- persistentVolumeClaim – a way to use a pre- or dynamically provisioned persistent storage (we’ll talk about them in the last section of this chapter).
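- Persistent Volume: stores data on reliable external storage and exposes it to Pods/containers, without first mounting that storage on the host and then handing it to the container. Its key property is that its lifecycle is independent of the Pod: it survives when the Pod dies and is re-attached automatically when the Pod recovers.
- Persistent Volume Claim: declares that it will obtain storage space of a given size from a PV or a Storage Class.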
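A minimal sketch of a claim and a Pod that mounts it. The names, size, mount path, and image are hypothetical, and the cluster is assumed to have either a matching PV or a default Storage Class.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                   # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                  # requested capacity
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-data
spec:
  containers:
  - name: app
    image: example/app               # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /var/lib/app        # hypothetical path
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim          # binds the Pod to the claim above
```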
ConfigMap
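A ConfigMap is a Kubernetes resource object for storing configuration files; all of its contents are stored in etcd.
In practice, modifying a ConfigMap does not update environment variables that have already been injected into a container.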
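The difference shows up when the same ConfigMap is consumed both ways. In this sketch (all names and values hypothetical), the environment variable is fixed when the container starts, whereas the files under the volume mount are refreshed some time after the ConfigMap changes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  app.properties: |
    cache.size=128
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-config
spec:
  containers:
  - name: app
    image: example/app               # hypothetical image
    env:
    - name: LOG_LEVEL                # read once at container start; needs a restart to change
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    volumeMounts:
    - name: config
      mountPath: /etc/app            # files here follow later ConfigMap edits
  volumes:
  - name: config
    configMap:
      name: app-config
```

Volumes mounted with subPath are an exception and do not receive updates either.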
service
A Kubernetes service is a resource you create to get a single, constant point of entry to a group of pods providing the same service.
Each service has an IP address and port that never change while the service exists.
The resources will be created in the order they appear in the file. Therefore, it’s best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the controller(s), such as Deployment.
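- ClusterIP
For access from inside the cluster; not directly reachable from outside.
This is the type you get when type is not specified.
clusterIP: None is a special case, the headless service, which has no cluster IP.
- NodePort
Every node opens the same port, hence the name NodePort. The number of available ports is limited (30000-32767 by default). Directly reachable from outside the cluster.
- LoadBalancer
A cloud-provider-specific service. On Alibaba Cloud it simply builds on NodePort and automatically registers the nodes as backend servers of a load balancer for you.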
- ExternalName
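A minimal sketch of three of these variants side by side. The names, selector, ports, and external host name are hypothetical.

```yaml
# Headless service: no cluster IP; DNS returns the pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: web-headless
spec:
  clusterIP: None
  selector:
    app: web
  ports:
  - port: 80
---
# NodePort service: reachable on <any-node-IP>:30080 from outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80            # service port inside the cluster
    targetPort: 8080    # container port
    nodePort: 30080     # must lie in the node port range (30000-32767 by default)
---
# ExternalName service: a DNS alias (CNAME) for a name outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com
```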
Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).
It makes its scaling decisions from the metrics APIs together with the resource requests set under resources.
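A minimal sketch of an HPA that scales a Deployment on CPU utilization, assuming a cluster that serves the autoscaling/v2 API and has a metrics server installed. The target name and all numbers are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                        # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60       # percent of the containers' CPU request
```

Because utilization is measured against the request, the target containers need `resources.requests.cpu` set for this metric to be computed.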
Kubernetes Downward API
It allows us to pass metadata about the pod and its environment through environment variables or files (in a so-called downwardAPI volume).
- environment variables
- downwardAPI volume
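A minimal sketch showing both mechanisms in one Pod. The pod name, label, and mount path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
  labels:
    app: demo
spec:
  containers:
  - name: app
    image: busybox
    # Print the injected values, then idle so the pod stays up.
    command: ["sh", "-c", "echo $POD_NAME $POD_IP; cat /etc/podinfo/labels; sleep 3600"]
    env:
    - name: POD_NAME                 # environment variable from pod metadata
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
  volumes:
  - name: podinfo
    downwardAPI:                     # labels exposed as the file /etc/podinfo/labels
      items:
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
```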
Resource Quotas
A way to limit pod resources on a per-namespace basis.
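A minimal sketch of a quota for the `test` namespace used earlier in these notes; the quota name and all limits are hypothetical.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: test-quota                   # hypothetical name
  namespace: test
spec:
  hard:
    pods: "20"                       # maximum number of pods in the namespace
    requests.cpu: "4"                # sum of CPU requests across all pods
    requests.memory: 8Gi
    limits.cpu: "8"                  # sum of CPU limits across all pods
    limits.memory: 16Gi
```

Once CPU or memory is under quota, pods in that namespace must declare the corresponding requests and limits (or receive defaults from a LimitRange), otherwise they are rejected at creation.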
Network model
Reference commands: