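TDengine 3.3.6.1 deployed on k8s: after starting taosd from a terminal, ports 6030, 6041, 6043 and 6044 shut down by themselves. Starting taosd again reports insufficient memory or a file lock that cannot be acquired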

【TDengine environment】
Production / Test / PoC / Pre-production

Currently a trial deployment on a test k8s cluster

【TDengine version】

3.3.6.1

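【Operating system and version】

k8s (containerized deployment)

【Deployment method】Container / non-container

Deployed on k8s, either created manually or from a YAML file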

【Problem encountered: symptoms and impact】

YAML file:

kind: Pod
apiVersion: v1
metadata:
  name: vt-tdengine-7b97484667-zff8v
  generateName: vt-tdengine-7b97484667-
  namespace: vt-gwm
  labels:
    app: vt-tdengine
    pod-template-hash: 7b97484667
  annotations:
    kubesphere.io/creator: ld.zhang
    kubesphere.io/imagepullsecrets: '{}'
    kubesphere.io/restartedAt: '2025-11-19T09:13:09.237Z'
    logging.kubesphere.io/logsidecar-config: '{}'
    tke.cloud.tencent.com/networks-status: |-
      [{
          "name": "tke-route-eni",
          "interface": "eth0",
          "ips": [
              "172.21.32.82"
          ],
          "mac": "92:80:bb:32:f0:a7",
          "default": true,
          "dns": {}
      }]
spec:
  volumes:
    - name: tdengine-data
      hostPath:
        path: /data/tdengine
        type: ''
    - name: kube-api-access-mlzcd
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: tdengine
      image: 'harbor.bluesphere.cloud/vt-gwm-tdengine/tdengine:latest'
      ports:
        - name: taosd
          containerPort: 6030
          protocol: TCP
        - name: rest
          containerPort: 6041
          protocol: TCP
        - name: https
          containerPort: 6043
          protocol: TCP
        - name: http
          containerPort: 6044
          protocol: TCP
      env:
        - name: TAOS_CFG_VNODE_NUM
          value: '1'
        - name: TAOS_CFG_MAX_VNODE_NUM
          value: '2'
        - name: TAOS_CFG_MAX_CONNECTIONS
          value: '500'
        - name: TAOS_CFG_HTTP_ENABLE
          value: '1'
        - name: TAOS_CFG_CACHE
          value: '128'
        - name: TAOS_CFG_BUFFER
          value: '256'
        - name: TAOS_CFG_PAGES
          value: '128'
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
          tke.cloud.tencent.com/eni-ip: '1'
        requests:
          cpu: '1'
          memory: 4Gi
          tke.cloud.tencent.com/eni-ip: '1'
      volumeMounts:
        - name: tdengine-data
          mountPath: /var/lib/taos
        - name: kube-api-access-mlzcd
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: 172.21.64.4
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

Symptoms:

Output printed on the command line after starting taosd:

11/20 09:29:06.343575 00005200 C UTL INFO check global fqdn:localhost and port:6030
11/20 09:29:06.343724 00005200 C UTL INFO total memory size: 16125724KB
11/20 09:29:06.343774 00005200 C UTL INFO memory pool disabled since no enough system available memory after reservied, size: 3845865472
11/20 09:29:06.343786 00005200 C DND INFO start to init dnode env
11/20 09:29:06.344171 00005200 C DND INFO succceed to read dnode file /var/lib/taos/dnode/dnode.json
11/20 09:29:06.344227 00005200 C DND INFO succceed to read mnode file /var/lib/taos/mnode/mnode.json
11/20 09:29:06.344232 00005200 C DND INFO deploy mnode required. option deploy:1
11/20 09:29:06.344237 00005200 C DND INFO file:/var/lib/taos/qnode/qnode.json not exist
11/20 09:29:06.344242 00005200 C DND INFO file:/var/lib/taos/snode/snode.json not exist
11/20 09:29:07.344464 00005200 C DND ERROR failed to lock file:/var/lib/taos/.running since Resource temporarily unavailable, retryTimes:1
11/20 09:29:08.344578 00005200 C DND ERROR failed to lock file:/var/lib/taos/.running since Resource temporarily unavailable, retryTimes:2

Container log output:

0.000000, 0.167261, 0), error:Post "http://localhost:6041/rest/sql/log?req_id=3333272583926713447": dial tcp 127.0.0.1:6041: connect: connection refused

2025-11-19T20:24:45.963406778+08:00 /usr/bin/entrypoint.sh: line 126: 4354 Killed taoskeeper

2025-11-19T20:26:03.772735029+08:00 /usr/bin/entrypoint.sh: line 135: 4375 Killed taos-explorer

2025-11-19T20:36:45.340809198+08:00 + true

2025-11-19T20:36:45.340834659+08:00 + sleep 1000

2025-11-19T20:53:25.342410662+08:00 + true

2025-11-19T20:53:25.342443905+08:00 + sleep 1000
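The `Killed taoskeeper` and `Killed taos-explorer` lines mean those processes received SIGKILL. Inside a memory-limited container that is commonly the kernel OOM killer, although the log alone does not prove it. A minimal sketch to check from inside the pod, assuming the container sees its own cgroup v2 hierarchy at `/sys/fs/cgroup` (where `memory.events` exposes an `oom_kill` counter); `oom_kill_count` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read the oom_kill counter of this container's memory cgroup.
# Assumption: cgroup v2 mounted at /sys/fs/cgroup. On cgroup v1 the file does
# not exist and the function prints "unavailable".

# oom_kill_count [FILE]: print the oom_kill value from a cgroup v2
# memory.events file, or 0 if the file has no oom_kill line.
oom_kill_count() {
  events_file=${1:-/sys/fs/cgroup/memory.events}
  if [ ! -r "$events_file" ]; then
    echo "unavailable"
    return 1
  fi
  # Match the exact key so "oom" and "oom_group_kill" are not counted.
  awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) print 0 }' "$events_file"
}

echo "oom_kill events in this container: $(oom_kill_count)"
```

A non-zero value after taoskeeper or taos-explorer is killed points at the pod's memory limit. From outside the pod, `kubectl -n vt-gwm describe pod -l app=vt-tdengine` shows the container's last termination reason (for example `OOMKilled`) when the main container process itself was killed; kills of individual child processes show up in the node's kernel log (`dmesg`).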

I asked an AI about these problems myself and, following its suggested fix, ran these commands:

ps -ef | grep taosd

pkill -9 taosd

ls -l /var/lib/taos/.running

rm -f /var/lib/taos/.running

After starting taosd again, it runs normally for a while, and then ports 6030, 6041 and 6044 stop on their own again.
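The `failed to lock file:/var/lib/taos/.running since Resource temporarily unavailable` error means another process still holds that lock. The container log shows `/usr/bin/entrypoint.sh` launching taoskeeper and taos-explorer, so the entrypoint presumably starts taosd as well; if so, a manual `taosd` in a terminal competes with that instance, and `pkill -9 taosd` plus `rm -f /var/lib/taos/.running` only removes the symptom (and kills the entrypoint's own taosd). A minimal sketch that checks for a running taosd before starting another one, assuming a Linux `/proc`; `is_running` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: avoid starting a second taosd that would fight over
# /var/lib/taos/.running. Assumption: Linux /proc is available.

# is_running NAME: succeed if any process's command name (/proc/<pid>/comm)
# equals NAME. comm is truncated to 15 characters, which is fine for "taosd".
is_running() {
  for comm_file in /proc/[0-9]*/comm; do
    # A process may exit between the glob and the read; skip it.
    [ -r "$comm_file" ] || continue
    if [ "$(cat "$comm_file" 2>/dev/null)" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

if is_running taosd; then
  echo "taosd is already running; not starting a second instance"
else
  echo "no taosd process found"
  # Only in this case does starting taosd manually make sense, e.g.: taosd
fi
```

The check reads `/proc` directly rather than calling `ps` or `pgrep`, so it does not depend on procps being present in the image.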

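The TDengine image is 3.3.6.13. When I pull this image locally and run it with docker, it runs and can be accessed normally; it only fails when deployed on k8s.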

The line below gives a hint. First try increasing the memory and the number of VNODEs, then test how it runs:

11/20 09:29:06.343774 00005200 C UTL INFO memory pool disabled since no enough system available memory after reservied, size: 3845865472
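One detail in that log output is worth comparing with the pod spec: taosd reports `total memory size: 16125724KB`, which is roughly the node's 16 GB rather than the container's `limits.memory: 8Gi` (8388608 KiB). The sketch below converts the logged figures and reads the container's own cgroup memory limit for comparison. The cgroup paths are the standard v2 (`memory.max`) and v1 (`memory.limit_in_bytes`) locations; the idea that taosd sizes itself from node memory instead of the pod limit is an assumption drawn from these numbers, not something the log states:

```shell
#!/bin/sh
# Sketch: compare what taosd logged with the container's memory limit.
# Figures copied from the taosd startup log in this thread.
total_kb=16125724          # "total memory size: 16125724KB"
reserved_bytes=3845865472  # "... after reservied, size: 3845865472"

# POSIX shell arithmetic is integer-only, so these values are truncated.
total_mib=$((total_kb / 1024))
reserved_mib=$((reserved_bytes / 1024 / 1024))
limit_8gi_mib=$((8 * 1024))

echo "taosd saw total memory: ${total_mib} MiB"
echo "logged size value:      ${reserved_mib} MiB"
echo "pod limits.memory 8Gi:  ${limit_8gi_mib} MiB"

# Read this container's own memory limit (cgroup v2 first, then v1).
if [ -r /sys/fs/cgroup/memory.max ]; then
  echo "cgroup v2 memory.max: $(cat /sys/fs/cgroup/memory.max)"
elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
  echo "cgroup v1 memory.limit_in_bytes: $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)"
else
  echo "no cgroup memory limit file found"
fi
```

The first three lines print 15747 MiB, 3667 MiB and 8192 MiB: the total taosd saw is almost twice the pod's limit.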

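How much memory is appropriate for TDengine? I have already increased it to the following:

resources:
  limits:
    cpu: '2'
    memory: 8Gi
    tke.cloud.tencent.com/eni-ip: '1'
  requests:
    cpu: '1'
    memory: 4Gi

Even with this configuration, it still doesn't work.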

Is this the VNODE count:

- name: TAOS_CFG_VNODE_NUM
  value: '1'
- name: TAOS_CFG_MAX_VNODE_NUM
  value: '2'
- name: TAOS_CFG_MAX_CONNECTIONS
  value: '500'
- name: TAOS_CFG_HTTP_ENABLE
  value: '1'

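For taosd alone, 8 GB is clearly enough. If it still reports insufficient memory, then taosd is presumably either running a fairly large query or loading WAL data at startup. Try starting a clean pod to see how it behaves, then stress-test larger queries to measure memory usage, and use that to pin down the problem or the actual resource consumption.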
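To watch memory while running larger queries in a clean pod, one option is to sample the container's cgroup memory usage from a second terminal. A minimal sketch, assuming cgroup v2 (`memory.current`) with a fallback to the v1 `memory.usage_in_bytes` file; `bytes_to_mib`, `usage_file` and `sample_memory` are hypothetical helper names, and the sample count and interval are arbitrary:

```shell
#!/bin/sh
# Sketch: periodically print the container's memory usage in MiB.

# bytes_to_mib N: integer MiB (truncated) for a byte count.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# usage_file: print the cgroup v2 usage file if present, else the v1 one.
usage_file() {
  if [ -r /sys/fs/cgroup/memory.current ]; then
    echo /sys/fs/cgroup/memory.current
  elif [ -r /sys/fs/cgroup/memory/memory.usage_in_bytes ]; then
    echo /sys/fs/cgroup/memory/memory.usage_in_bytes
  fi
}

# sample_memory COUNT INTERVAL_SECONDS: print COUNT timestamped samples.
sample_memory() {
  file=$(usage_file)
  if [ -z "$file" ]; then
    echo "no cgroup memory usage file found" >&2
    return 1
  fi
  i=0
  while [ "$i" -lt "$1" ]; do
    echo "$(date '+%H:%M:%S') $(bytes_to_mib "$(cat "$file")") MiB"
    i=$((i + 1))
    sleep "$2"
  done
}
```

For example, `sample_memory 60 5` prints one line every 5 seconds for 5 minutes; running it while a large query executes shows how close usage gets to the 8Gi limit.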

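Right now it won't start at all. Could you provide a complete k8s YAML configuration? I'll configure it following yours. I'm using local storage on k8s; I previously configured CBS storage, and TDengine didn't support it.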

Please refer to TDengine-Operator/helm/tdengine at 3.0 · taosdata/TDengine-Operator · GitHub