Common Kubernetes cluster inspection and troubleshooting commands (continuously updated) – 21运维
K8S 21运维 2044 views

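Apart from the web UI, all basic inspection, troubleshooting and operational tasks can be done with kubectl. Run kubectl --help for an overview, or append --help to a subcommand, for example kubectl create --help. The quickest way to become fluent is to use the commands often and read their help output.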

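The commonly used inspection and operation commands are recorded below. Some descriptions are paraphrased, but the original English help text remains the better reference.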
Basic Commands (Beginner):
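create      Create a resource from a file or stdin.
expose      Create a Service for a deployment or pod.
run         Run a particular image on the cluster. Older versions created a Deployment this way; that usage is deprecated and the official recommendation is kubectl create. It takes many options; see kubectl run --help.
set         Update a resource, for example env (environment variables), image, resources (requests and limits), selector, or subject.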

Basic Commands (Intermediate):
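get       The basic object query command, e.g. kubectl get nodes/pods/deploy/rs/ns/secret. Add -o wide for more columns, or -o yaml / -o json to print the full object in that format.
explain   Show the documentation for a resource definition, e.g. kubectl explain replicaset
edit      Edit a resource in the system editor to update the object, e.g. kubectl edit deploy/foo
delete    Delete resources by file name, resource name, or label selector.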

Deploy Commands:
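rollout     Manage the rollout of a Deployment or DaemonSet (status, history, pause, resume, undo).
scale       Change the replica count of a Deployment, ReplicaSet, ReplicationController, or Job, i.e. scale a replica set manually.
autoscale   Configure autoscaling rules for a Deployment, RS, or RC (depends on Heapster and HPA).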

Cluster Management Commands:
certificate     Modify certificate resources.
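cluster-info   Display cluster information.
top            Display resource usage (depends on Heapster).
cordon         Mark a node as unschedulable.
uncordon       Mark a node as schedulable.
drain          Evict the workloads from a node in preparation for maintenance.
taint          Update the taints on a node.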

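Troubleshooting and Debugging Commands:
describe       Show the details of a resource.
logs           Print the logs of a container in a pod.
attach         Attach to a running container in a pod.
exec           Execute a command in the specified container.
port-forward   Forward local ports to a pod.
proxy          Run a proxy to the Kubernetes API server.
cp             Copy files to and from containers.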
auth             Inspect authorization

Advanced Commands:
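apply       Create or update a resource from a file or stdin.
patch       Update selected fields of an object using strategic merge patch syntax.
replace     Replace a resource from a file or stdin.
convert     Convert object definitions between API versions.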

Settings Commands:
label        Set labels on a resource.
annotate     Set annotations on a resource.
completion   Output the shell completion script (bash and zsh are supported).

Other Commands:
api-versions      Print the supported API versions on the server, in the form of “group/version”
api-resources    Print the supported API resources on the server

config               Modify the kubectl configuration (kubeconfig file), e.g. the context
help                   Help about any command
version               Print the client and server Kubernetes versions

I. Practical kubectl tips (found online)

1. View resource short names

kubectl describe or kubectl  api-resources

2. Enable kubectl shell completion

source <(kubectl completion bash)

3. Tired of writing YAML by hand and hunting for examples?
Generate it with the run command:

kubectl run --image=nginx my-deploy -o yaml --dry-run > my-deploy.yaml

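4. Export an existing object with the get command (the --export flag only exists in older kubectl versions; it was later deprecated and removed, so drop it on newer ones)

kubectl get statefulset/foo -o=yaml --export > new.yaml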

II. Common inspection and troubleshooting commands
1. Check whether the cluster is healthy

# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok                   
controller-manager   Healthy   ok                   
etcd-1               Healthy   {"health": "true"}   
etcd-0               Healthy   {"health": "true"}   
etcd-2               Healthy   {"health": "true"} 

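If any component is unhealthy, check its logs. For example, if the etcd cluster intermittently reports unhealthy, check the logs on the etcd nodes; an overloaded system can cause the heartbeat checks to fail.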
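The health check above is easy to script. The following is a minimal sketch, not something from the original post: the helper name unhealthy_components is hypothetical, and it assumes the column layout shown above (NAME first, STATUS second). Newer Kubernetes releases mark componentstatuses as deprecated, so treat it as an illustration for clusters where kubectl get cs still works.

```shell
# Hypothetical helper: read `kubectl get cs` output on stdin and print the
# name of every component whose STATUS column is not "Healthy".
unhealthy_components() {
  # NR > 1 skips the header row; NF >= 2 skips blank or truncated lines.
  awk 'NR > 1 && NF >= 2 && $2 != "Healthy" { print $1 }'
}

# Usage on a live cluster (requires kubectl access):
#   kubectl get cs | unhealthy_components
```

An empty result means every listed component reported Healthy; any name printed is the component whose logs should be checked first.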

2. Check whether the master is running normally

# kubectl cluster-info
Kubernetes master is running at https://10.1.14.21:6443
Heapster is running at https://10.1.14.25:6443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://10.1.14.25:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
monitoring-grafana is running at https://10.1.14.21:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
monitoring-influxdb is running at https://192.168.20.134:6443/api/v1/namespaces/kube-system/services/monitoring-influxdb/proxy

 

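3. Use kubectl run to create a set of nginx pods. With the kubectl version used here, the command creates a Deployment and its ReplicaSet (newer versions create a single Pod instead and no longer accept --replicas). nginx 1.10 is used for the demonstration.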

kubectl   run  nginx  --image=nginx:1.10 --port=80 --labels="app=nginx1.10"  --replicas=2

Check the result:

# kubectl  get pods
NAME                     READY   STATUS              RESTARTS   AGE
nginx-64f9d8b667-grtql   0/1     ContainerCreating   0          50s
nginx-64f9d8b667-vbhkh   0/1     ContainerCreating   0          49s

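kubectl describe is usually used alongside this to follow pod creation: it shows which node each pod was scheduled to and how far startup has progressed, and if creation fails it records the reason.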
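To pick out the pods worth describing, the READY column can be filtered. This is a rough sketch under the assumption that the output has the columns shown above; not_ready_pods is a hypothetical helper name, not a kubectl feature.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose READY column (ready/total) shows containers not ready.
not_ready_pods() {
  awk 'NR > 1 && NF >= 2 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage on a live cluster: describe every pod that is not fully ready.
#   kubectl get pods | not_ready_pods | xargs -n1 kubectl describe pod
```

Pods that have run to completion also show fewer ready containers than total, so the list is a starting point rather than a list of failures.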


4. kubectl get
Lists resources, such as the commonly used nodes/pods/replicasets/services/endpoints/deployments/namespaces.

   Use kubectl get xxx; append -o wide for more detailed output, or -o yaml / -o json to print the full object in that format.

# kubectl   get pods  -n kube-system -o wide  
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE
kube-dns-79fbb66f55-5xvlq   3/3     Running   3          21h   172.50.70.5   10.1.14.25   <none>


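5. kubectl describe xxx
Supports most resources, such as node/pod/ns/deployment/rs/rc/svc.

Shows the details of a resource. It is typically used to inspect a resource's full parameters and to troubleshoot pods that fail to start by reading the reported errors, such as image pull failures or invalid arguments.
Here is the output for one of the pods created above:

# kubectl  describe  pods nginx-64f9d8b667-grtql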
Name:           nginx-64f9d8b667-grtql
Namespace:      default
Node:           10.1.14.26/10.1.14.26
Start Time:     Wed, 28 Nov 2017 23:20:18 -0500
Labels:         app=nginx1.10
                pod-template-hash=64f9d8b667
Annotations:    
Status:         Pending
IP:             
Controlled By:  ReplicaSet/nginx-64f9d8b667
Containers:
  nginx:
    Container ID:   
    Image:          nginx:1.10
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bzjbz (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-bzjbz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bzjbz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  
Tolerations:     
Events:
  Type    Reason     Age   From                 Message
  ----    ------     ----  ----                 -------
  Normal  Scheduled  17s   default-scheduler    Successfully assigned default/nginx-64f9d8b667-grtql to 10.1.14.26
  Normal  Pulling    11s   kubelet, 10.1.14.26  pulling image "nginx:1.10"

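The output shows that kube-scheduler assigned pod nginx-64f9d8b667-grtql to node 10.1.14.26, which is currently pulling the nginx:1.10 image. Once the pull completes, further events report that the container was created and started.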
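When only the scheduling and image pull progress matters, the Events section at the end of the describe output is usually enough. A small sketch, assuming the section starts with a line beginning with "Events:" as in the output above; describe_events is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl describe` output on stdin and print only
# the Events section (from the "Events:" line to the end of the output).
describe_events() {
  sed -n '/^Events:/,$p'
}

# Usage on a live cluster:
#   kubectl describe pod nginx-64f9d8b667-grtql | describe_events
```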

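6. Run commands in a pod or container with kubectl exec

Run a command such as date in a pod; by default it runs in the pod's first container

kubectl exec POD_NAME -- COMMAND

Run the command in a specific container of the pod

kubectl exec POD_NAME -c CONTAINER_NAME -- COMMAND

For example:

# kubectl  exec   nginx-64f9d8b667-grtql cat  /etc/hosts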
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.50.75.2     nginx-64f9d8b667-grtql

Get a TTY in a container of the pod through bash, which is equivalent to logging in to the container

kubectl exec -it POD_NAME -- bash

Example (the second prompt is inside the container):

# kubectl  exec   -ti nginx-64f9d8b667-grtql bash
# cat  /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.50.75.2     nginx-64f9d8b667-grtql


7. kubectl logs

kubectl logs prints the log output of a container in a pod, which is essential information when troubleshooting

# kubectl  logs nginx-dbddb74b8-7pwk2
10.254.143.54 - - [28/Nov/2017:03:19:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
172.50.75.1 - - [28/Nov/2017:03:23:12 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36" "-"
2017/11/28 03:23:13 [error] 7#7: *3 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: 172.50.75.1, server: localhost, request: "GET /favicon.ico HTTP/1.1", host: "10.1.14.26:30765", referrer: "http://10.1.14.26:30765/"
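Access and error lines arrive interleaved, as above, so a filter helps when hunting for failures. The sketch below assumes nginx's default error log format, in which the level appears as [error]; nginx_error_lines is a hypothetical helper name. The kubectl logs flags in the comments (--tail, -f, --previous) are standard options.

```shell
# Hypothetical helper: read container logs on stdin and keep only the lines
# that nginx tagged with the [error] level.
nginx_error_lines() {
  # -F treats the pattern as a fixed string, so the brackets are literal.
  grep -F '[error]'
}

# Usage on a live cluster:
#   kubectl logs nginx-dbddb74b8-7pwk2 --tail=200 | nginx_error_lines
# Related options:
#   kubectl logs -f nginx-dbddb74b8-7pwk2           # follow the log stream
#   kubectl logs --previous nginx-dbddb74b8-7pwk2   # previous container instance
```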


8. kubectl expose creates a Service that exposes a port so the application can be reached from outside.

kubectl  expose  deploy nginx  --port=80  --target-port=80 --type=NodePort --name=nginx-service

View the Service:

# kubectl  get svc  -o  wide
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE    SELECTOR
kubernetes      ClusterIP   10.254.0.1       <none>        443/TCP        16d    <none>
nginx-service   NodePort    10.254.218.227   <none>        80:30680/TCP   119s   app=nginx1.10
# kubectl  describe   svc   nginx-service
Name:                     nginx-service
Namespace:                default
Labels:                   app=nginx1.10
Annotations:              <none>
Selector:                 app=nginx1.10
Type:                     NodePort
IP:                       10.254.218.227
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30680/TCP
Endpoints:                172.50.70.4:80,172.50.75.2:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Access test:

# curl   -I 10.254.218.227:80 
HTTP/1.1 200 OK
Server: nginx/1.10.3
Date: Thu, 29 Nov 2017 04:40:48 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 31 Jan 2017 15:01:11 GMT
Connection: keep-alive
ETag: "5890a6b7-264"
Accept-Ranges: bytes
# curl  -I  10.1.14.25:30680
HTTP/1.1 200 OK
Server: nginx/1.10.3
Date: Thu, 29 Nov 2017 04:41:30 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 31 Jan 2017 15:01:11 GMT
Connection: keep-alive
ETag: "5890a6b7-264"
Accept-Ranges: bytes
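The second curl targets the node port, which is the number after the colon in the PORT(S) column (80:30680/TCP). A minimal sketch of pulling that number out of the column value; node_port_of is a hypothetical helper name, and the port:nodePort/protocol layout is assumed from the output above.

```shell
# Hypothetical helper: extract the node port from a PORT(S) value such as
# "80:30680/TCP" (layout: port:nodePort/protocol).
node_port_of() {
  spec=${1#*:}                  # drop everything up to the first ":"
  printf '%s\n' "${spec%%/*}"   # drop the "/protocol" suffix
}

# Usage:
#   node_port_of 80:30680/TCP
# On a live cluster the value can also be read directly from the Service:
#   kubectl get svc nginx-service -o jsonpath='{.spec.ports[0].nodePort}'
```

The jsonpath form is the more robust choice in scripts, since it does not depend on how the table is printed.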

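9. Use kubectl set or kubectl edit to update the pod template, from large changes such as a new image down to small ones such as env settings, resource requests and limits, selectors, the service account (sa), or subjects.

For example, update the image from nginx:1.10 to nginx:1.11, using kubectl set and kubectl edit in turn.
(1) kubectl set; see kubectl set --help for the full usage: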

kubectl  set  image deploy/nginx nginx=nginx:1.11

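Access the service again; the nginx response header now reports version 1.11

(2) kubectl edit: edit the deploy

 kubectl   edit deploy/nginx

Edit the image version directly in the editor, then save and exit. This also updates the image of the pods.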

10. kubectl rollout: view rollout status and revision history, and roll back
(1) kubectl rollout status shows the progress of a rollout:

# kubectl  rollout status deploy/nginx         
Waiting for deployment "nginx" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "nginx" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "nginx" rollout to finish: 1 out of 2 new replicas have been updated...
Waiting for deployment "nginx" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "nginx" rollout to finish: 1 old replicas are pending termination...
deployment "nginx" successfully rolled out

(2) kubectl rollout history shows the revision history; add --revision=x (x is the revision number) to see the details of one revision

# kubectl   rollout history  deploy/nginx
deployment.extensions/nginx 
REVISION  CHANGE-CAUSE
2         <none>
3         <none>
4         kubectl set image deploy/nginx nginx=nginx:1.13 --record=true

# kubectl   rollout history  deploy/nginx  --revision=2
deployment.extensions/nginx with revision #2
Pod Template:
  Labels:       app=nginx1.10
        pod-template-hash=d99665758
  Containers:
   nginx:
    Image:      nginx:1.11
    Port:       80/TCP
    Host Port:  0/TCP
    Environment:        
    Mounts:     
  Volumes:      

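(3) kubectl rollout undo rolls back to the previous revision; --to-revision=x rolls back to a specific revision

kubectl   rollout undo  deployment  nginx  # roll back to the previous revision
kubectl   rollout undo  deployment   nginx  --to-revision=2 # roll back to a specific revision

(4) kubectl rollout pause/resume pause and resume a rollout; they work the same way and are not demonstrated here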
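For scripted rollbacks it can help to name the target revision explicitly instead of relying on the default. The sketch below reads the history listing and prints the revision just before the newest one; previous_revision is a hypothetical helper name, and it assumes the listing format shown in (2), where each revision line starts with its number.

```shell
# Hypothetical helper: read `kubectl rollout history` output on stdin and
# print the revision number just before the newest one.
previous_revision() {
  awk '$1 ~ /^[0-9]+$/ { prev = last; last = $1 }
       END { if (prev != "") print prev }'
}

# Usage on a live cluster:
#   rev=$(kubectl rollout history deploy/nginx | previous_revision)
#   kubectl rollout undo deployment nginx --to-revision="$rev"
```

Nothing is printed when the history holds fewer than two revisions, so a script should check that the result is non-empty before calling undo.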

11. kubectl scale scales a deploy; use it when scaling manually.

# kubectl   scale  deploy  nginx  --replicas=4
deployment.extensions/nginx scaled
# kubectl   get pods  -o wide                 
NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE
nginx-d99665758-5pm47   0/1     Pending   0          1s      <none>        10.1.14.25   <none>
nginx-d99665758-qqhhx   1/1     Running   0          2m42s   172.50.51.3   10.1.14.26   <none>
nginx-d99665758-tppvb   0/1     Pending   0          2s      <none>        10.1.14.25   <none>
nginx-d99665758-z5dhz   1/1     Running   0          2m51s   172.50.51.2   10.1.14.26   <none>
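Right after scaling, a per-status count shows at a glance how many of the new replicas are still pending. A rough sketch, assuming STATUS is the third column as in the listing above; pods_by_status is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print one
# "STATUS count" line per distinct status, sorted by status name.
pods_by_status() {
  awk 'NR > 1 && NF >= 3 { count[$3]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live cluster:
#   kubectl get pods | pods_by_status
```

Running it repeatedly while the scale-up progresses should show the Pending count falling as the Running count rises.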

 

 

 
