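【TDengine environment】
Production / Test / PoC / Pre-production
【TDengine version】Encountered after upgrading from 3.3.6.13 to 3.3.8.8
【OS and version】Tencent Cloud TencentOS Server 2.4 (tkernel4)
【Deployment】Containers, deployed on k8s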
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - tdengine
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - tdengine
              topologyKey: kubernetes.io/hostname
      imagePullSecrets:
        - name: txregistry
      containers:
        - name: "tdengine"
          image: "bjccr.tencentcloudcr.com/middleware/tdengine:3.3.8.8"
          imagePullPolicy: "IfNotPresent"
          ports:
            - name: tcp6030
              protocol: "TCP"
              containerPort: 6030
            - name: tcp6041
              protocol: "TCP"
              containerPort: 6041
          env:
            # POD_NAME for FQDN config
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and NAMESPACE for fqdn resolve
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ for timezone settings, we recommend to always set it.
            - name: TZ
              value: "Asia/Shanghai"
            # Environment variables with prefix TAOS_ will be parsed and converted into corresponding parameter in taos.cfg. For example, serverPort in taos.cfg should be configured by TAOS_SERVER_PORT when using K8S to deploy
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must set if you want a cluster.
            - name: TAOS_FIRST_EP
              value: "tdengine-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
            # TAOS_FQDN should always be set in k8s env.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
            # Config files
            - name: td-config
              mountPath: /etc/taos/explorer.toml
              subPath: explorer.toml
            - name: td-config
              mountPath: /etc/taos/taosadapter.toml
              subPath: taosadapter.toml
            - name: td-config
              mountPath: /etc/taos/taos.cfg
              subPath: taos.cfg
            - name: td-config
              mountPath: /etc/taos/taoskeeper.toml
              subPath: taoskeeper.toml
            # Log files
            - name: tdengine-log
              mountPath: /var/log/taos
          startupProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - timeout 5s taos -s 'SELECT 1;' >/dev/null 2>&1
            failureThreshold: 360
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - timeout 5s taos -s 'SELECT 1;' >/dev/null 2>&1
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - timeout 5s taos -s 'SELECT 1;' >/dev/null 2>&1
            initialDelaySeconds: 15
            periodSeconds: 20
      volumes:
        - name: tdengine-log
          hostPath:
            path: /opt/platform/services/middleware/tdengine/logs
        - name: td-config
          configMap:
            name: tdengine-config
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "cbs"
        resources:
          requests:
            storage: "300Gi"
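【Number of cluster nodes】3
【Number of replicas】3
【Business impact】The nodes can no longer start; the only option is to delete all data and rebuild the cluster.
【Reproduction path】Pod eviction happens once disk usage of the container directory reaches 85%.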
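Since eviction kicks in at 85%, a periodic check that warns earlier could buy time to investigate before the pod is evicted. A minimal sketch, assuming GNU coreutils `df` (for `--output=pcent`) is available on the node or in the container; the 75% warning level and the function names `usage_pct` and `check_disk` are hypothetical choices, not TDengine or Kubernetes settings:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used percentage (integer, no "%")
# of the filesystem that holds directory $1. Requires GNU df.
usage_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Hypothetical check: warn when usage reaches $2 percent (default 75,
# an assumed margin below the 85% eviction point reported in this post).
# Returns 0 if below the threshold, 1 if at/above it, 2 if df failed.
check_disk() {
  local path=$1 warn=${2:-75} used
  used=$(usage_pct "$path")
  # Empty output means df failed (bad path, non-GNU df, pseudo filesystem).
  [ -n "$used" ] || return 2
  if [ "$used" -ge "$warn" ]; then
    echo "WARN: $path is ${used}% full (threshold ${warn}%; eviction was observed at 85%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}
```

For example, `check_disk /var/lib/taos 75` could run from cron or a sidecar loop; how the warning is delivered is left open here.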
【Problem: symptoms and impact】
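Previously we ran into a problem: when a query touched a large amount of data, disk usage in the directory spiked, and we could only reclaim the space by forcibly truncating the files still held open by the process. This is the cleanup command we used:
for pid_fd in $(lsof +L1 /opt 2>/dev/null | awk 'NR>1 && /deleted/ {print $2":"$4}' | sed 's/[^0-9:]//g'); do
  pid=${pid_fd%%:*}
  fd=${pid_fd##*:}
  # Skip entries whose FD is not numeric (e.g. "mem", "txt"), which sed reduces to an empty string
  [ -n "$fd" ] || continue
  echo "Truncating file descriptor $fd of process $pid"
  : > "/proc/$pid/fd/$fd"
done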
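Before truncating anything, it may help to see which processes are pinning the space. The sketch below is a read-only variant: it parses `lsof +L1` output, assuming the same default column layout the cleanup loop relies on (PID in column 2, FD in column 4, SIZE/OFF in column 7), and sums the bytes held by deleted-but-open files per process. The function name `summarize_deleted` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsof +L1` output on stdin and report, per
# process, how much space deleted-but-still-open files are holding.
# Assumed column layout:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
summarize_deleted() {
  awk 'NR > 1 && /\(deleted\)/ {
         # Accumulate SIZE/OFF (column 7) per command + PID
         held[$1 " (pid " $2 ")"] += $7
       }
       END {
         for (p in held) printf "%s: %d MiB\n", p, held[p] / 1048576
       }'
}
```

Usage: `lsof +L1 /opt 2>/dev/null | summarize_deleted`. If the total for `taosd` matches the unexplained disk growth, that points at files the server deleted but has not yet closed.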
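Recently, after upgrading the cluster from 3.3.6.13 to 3.3.8.8, container disk usage spikes late at night. When it reaches 85%, a hard eviction is triggered, which corrupts the cluster outright so that it can no longer start. The only way we have found to recover is to completely delete the data directory on the PVC.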
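To narrow down what grows overnight, one option is to log per-directory sizes on a schedule and compare consecutive snapshots. A sketch only; the function name `snapshot_sizes` and the output file in the usage line are hypothetical. Note that `du` only counts files that still have a directory entry, so space held by deleted-but-open files (the case the `lsof +L1` cleanup targets) shows up in `df` but not here; a widening gap between the two points back at such files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one tab-separated line per direct
# subdirectory of $1: timestamp, size in KiB, path.
snapshot_sizes() {
  local root=$1 ts d
  ts=$(date '+%Y-%m-%dT%H:%M:%S')
  for d in "$root"/*/; do
    # If the glob matched nothing, $d is the literal pattern; skip it
    [ -d "$d" ] || continue
    printf '%s\t%s\t%s\n' "$ts" "$(du -sk "$d" | cut -f1)" "${d%/}"
  done
}
```

For example, `snapshot_sizes /var/lib/taos >> /var/log/taos/du-snapshots.tsv` run every few minutes from cron would show which subdirectory grew during the night.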
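【Resources】
16 cores, 32 GB RAM (16C32G)
【Full error screenshot】(please don't paste large blocks of error output; raw error text is hard to read on the forum)
Would anyone in the community who sees this please reply? If you have any other questions, feel free to contact me directly. Many thanks!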



