TDengine memory capacity limits the number of databases that can be created

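[TDengine environment]
Production

[TDengine version]

3.2.3.0

[Operating system and version]: linux 6.14.0-36-generic #36~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 15 15:45:17 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

[Deployment method] Container

[Number of cluster nodes] None

[Number of cluster replicas] None

[Business impact]
The memory capacity limit given in the official TDengine documentation does not match my actual tests, so far too few databases can be created.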


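The screenshot above shows my own measurements: with BUFFER set anywhere from 4 MB to 24 MB, the number of databases that can be created is exactly the same.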

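Some background on how we use TDengine: we create one database per airport for our SaaS product, so the number of databases we can create directly limits the number of SaaS customers we can serve. In production we found that database creation fails even though plenty of memory is still available. The CREATE DATABASE statement reports: "No enough memory in". I then studied the official documentation and ran real measurements based on its capacity estimation formula. The documented figures are badly inconsistent with what I measured, and the number of databases that can be created is far too low: only 560 databases on 32 GB, which seriously hampers our business.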

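It is not that TDengine has some hidden mechanism. You have only created the databases without actually writing any data, so the memory does not show up as used when you check with the free command.
The screenshot below shows a test on my VM: with 8 GB of memory, 1 vgroup per database, and a 1 GB buffer, creating the 7th database reports insufficient memory, while free still shows more than 6 GB available.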
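
The arithmetic behind that VM test can be sketched in shell. The helper name `reserved_buffer_mb` is hypothetical, and the sketch assumes the write buffer claimed at creation time is simply databases × vgroups × BUFFER, which is a simplification and not TDengine's actual accounting.

```bash
#!/bin/bash
# Hypothetical helper: estimate the write-buffer memory claimed by empty databases.
# Assumption: claimed = databases * vgroups_per_db * buffer_mb (a simplification).
reserved_buffer_mb() {
    local dbs=$1 vgroups=$2 buffer_mb=$3
    echo $(( dbs * vgroups * buffer_mb ))
}

total_mb=8192   # the 8 GB VM from the test above
for dbs in 6 7; do
    echo "${dbs} databases -> $(reserved_buffer_mb "$dbs" 1 1024) MB of ${total_mb} MB"
done
```

Under this assumption the 7th database would push the claimed buffer to most of the VM's memory, even though free, which only reports pages actually touched, still shows several GB available.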

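The database is a core component of your business. If you run high-value commercial workloads, we recommend purchasing the Enterprise edition to keep the business running smoothly.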

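But I set the buffer to 8 MB, so in theory 32 GB should allow at least 1,600-odd databases, yet creation stops working at 560. In my own tests, every buffer size from 24 MB downward gives the same result: at most 560 databases. Surely something is wrong here?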

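Check the supportVnodes parameter setting, as well as the parameters of the databases already created (especially cachemodel).

Most importantly: provide the error message.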
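
The checks suggested above could be run roughly as follows. This is a sketch that assumes the `taos` CLI is available where the script runs (for example inside the container) and can connect with its default settings; the helper name `diag_queries` and the choice of columns are illustrative.

```bash
#!/bin/bash
# Hypothetical helper: print the diagnostic queries, one per line.
diag_queries() {
    # Per-dnode view: compare the vnodes column with support_vnodes.
    echo "SHOW DNODES;"
    # Per-database parameters, including cachemodel.
    echo "SELECT name, vgroups, buffer, cachemodel FROM information_schema.ins_databases;"
}

# Run each query with the taos CLI if it is installed; otherwise just print it.
diag_queries | while IFS= read -r sql; do
    if command -v taos >/dev/null 2>&1; then
        # Redirect stdin so taos cannot consume the remaining queries from the pipe.
        taos -s "$sql" </dev/null
    else
        echo "taos CLI not found, query to run manually: $sql"
    fi
done
```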

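I set supportVnodes to the maximum value of 4096 as described in the documentation. I did not specify CACHEMODEL when creating the databases, and I checked that the default value, none, is in use.

The error log is as follows: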
01/06 15:14:08.459117 00000089 VND ERROR vgId:1122, failed to open vnode query since Out of Memory
01/06 15:14:08.463555 00000089 DND ERROR vgId:1122, failed to open vnode since Out of Memory
01/06 15:14:08.463570 00000089 VND path:vnode/vnode1122 is removed while destroy vnode
01/06 15:14:08.463575 00000089 UTL tfs remove dir:/var/lib/taos aname:/var/lib/taos/vnode/vnode1122 rname:[vnode/vnode1122]
01/06 15:14:08.464149 00000089 DND ERROR msg:0x76ad981db278, failed to process since Out of Memory, type:create-vnode, gtid:0x0:0x71ed39227ea0ef04
01/06 15:14:08.464222 00000089 DND vgId:1127, vnode management handle msgType:create-vnode, start to create vnode, page:256 pageSize:4 buffer:8 szPage:4096 szBuf:8388608, cacheLast:0 cacheLastSize:1 sstTrigger:1 tsdbPageSize:4 4096 dbname:1.ccc dbId:5560384582142159504, days:14400 keep0:51840 keep1:51840 keep2:51840 keepTimeOffset0 tsma:0 precision:0 compression:2 minRows:100 maxRows:4096, wal fsync:3000 level:1 retentionPeriod:3600 retentionSize:0 rollPeriod:0 segSize:0, hash method:1 begin:2147483647 end:4294967295 prefix:0 surfix:0 replica:1 selfIndex:0 learnerReplica:0 learnerSelfIndex:-1 strict:1 changeVersion:1
01/06 15:14:08.464231 00000089 DND vgId:1127, replica:0 ep:td1:6030 dnode:1
01/06 15:14:08.464983 00000086 MND trans:567, redoAction:0 response is received, code:0x80000102, accept:0x80000521 retry:0x0
01/06 15:14:08.465016 00000086 MND trans:567, continue to execute, stage:redoAction createTime:1767670197752 topHalf:1
01/06 15:14:08.465033 00000086 MND ERROR trans:567, all 2 actions executed, code:0x102
01/06 15:14:08.465037 00000086 MND trans:567, redoAction:0 execute status is reset
01/06 15:14:08.465040 00000086 MND ERROR failed to execute redoActions since:Out of Memory, code:0x80000102
01/06 15:14:08.465048 00000086 MND ERROR trans:567, stage keep on redoAction since Out of Memory, failedTimes:250
01/06 15:14:08.465050 00000086 MND trans:567, send rsp, stage:redoAction failedTimes:250 code:0x80000102
01/06 15:14:08.465544 00000089 DND vgId:1127, alloc disk:0 of level 0. ndisk:1, vnodes: 1121
01/06 15:14:08.465629 00000089 VND vgId:1127, save config while create
01/06 15:14:08.472161 00000089 VND vgId:1127, vnode info is saved, fname:/var/lib/taos/vnode/vnode1127/vnode_tmp.json replica:1 selfIndex:0 changeVersion:1
01/06 15:14:08.472245 00000089 VND vnode info is committed, dir:/var/lib/taos/vnode/vnode1127
01/06 15:14:08.472253 00000089 VND vgId:1127, vnode is created
01/06 15:14:08.541347 00000089 MTA vgId:1127, ttl mgr open end, hash size: 0, time consumed: 5676847 ns
01/06 15:14:08.573922 00000089 TSD vgId:1127 open_fs success
01/06 15:14:08.573952 00000089 TSD vgId:1127 tsdbOpenFS success
01/06 15:14:08.608486 00000089 WAL vgId:1127, reset commitVer to -1

Below is the shell script I use to create databases in bulk. Feel free to try it yourselves.

#!/bin/bash

# TDengine batch database creation script
# Usage: ./create_td_dbs.sh <number of databases>

# Configuration
SERVER_IP="192.168.10.26"
PORT="46041"
BASE_URL="http://${SERVER_IP}:${PORT}/rest/sql"
USERNAME="root"
PASSWORD="taosdata"
PREFIX="c"

# Check arguments
if [ $# -eq 0 ]; then
    echo "Error: please specify the number of databases to create"
    echo "Usage: $0 <number of databases>"
    echo "Example: $0 10"
    exit 1
fi

COUNT=$1

# Validate that the input is a number
if ! [[ "$COUNT" =~ ^[0-9]+$ ]]; then
    echo "Error: the number of databases must be an integer"
    exit 1
fi

if [ "$COUNT" -le 0 ]; then
    echo "Error: the number of databases must be greater than 0"
    exit 1
fi

echo "Starting to create TDengine databases..."
echo "Server: ${SERVER_IP}:${PORT}"
echo "Number of databases: ${COUNT}"
echo ""

# Build the Basic Auth header
AUTH_HEADER="Authorization: Basic $(echo -n "${USERNAME}:${PASSWORD}" | base64)"

# Test the connection
echo "Testing the database connection..."
TEST_RESPONSE=$(curl -s -X POST "${BASE_URL}" \
    -H "${AUTH_HEADER}" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "SELECT 1" 2>/dev/null)

# Check the code field in the response (new API format)
if echo "$TEST_RESPONSE" | grep -q '"code":0'; then
    echo "Connection test succeeded!"
else
    echo "Error: cannot connect to the TDengine server!"
    echo "Response: $TEST_RESPONSE"
    exit 1
fi

echo ""

# Create the databases
SUCCESS_COUNT=0
FAILED_COUNT=0

for ((i=1; i<=COUNT; i++)); do
    DB_NAME="${PREFIX}${i}"
    SQL="CREATE DATABASE IF NOT EXISTS ${DB_NAME} KEEP 36 VGROUPS 2 BUFFER 16"

    echo -n "Creating database: ${DB_NAME} ... "

    # Send the HTTP request and append the status code to the body
    RESPONSE=$(curl -s -X POST "${BASE_URL}" \
        -H "${AUTH_HEADER}" \
        -H "Content-Type: application/x-www-form-urlencoded" \
        -w " HTTP_STATUS:%{http_code}" \
        -d "${SQL}" 2>/dev/null)

    # Split the HTTP status code from the response body
    HTTP_STATUS=$(echo "$RESPONSE" | tr -d '\n' | sed -e 's/.*HTTP_STATUS://')
    RESPONSE_BODY=$(echo "$RESPONSE" | sed -e 's/HTTP_STATUS:.*//')

    # Check the response
    if [ "$HTTP_STATUS" = "200" ] && echo "$RESPONSE_BODY" | grep -q '"code":0'; then
        echo "✓ success"
        SUCCESS_COUNT=$((SUCCESS_COUNT + 1))
    else
        # Try to extract an error message from the response
        ERROR_MSG=$(echo "$RESPONSE_BODY" | grep -o '"message":"[^"]*"' | sed 's/"message":"//' | sed 's/"$//')
        if [ -z "$ERROR_MSG" ]; then
            ERROR_MSG=$(echo "$RESPONSE_BODY" | grep -o '"desc":"[^"]*"' | sed 's/"desc":"//' | sed 's/"$//')
        fi
        if [ -z "$ERROR_MSG" ]; then
            ERROR_MSG="HTTP status code: $HTTP_STATUS"
        fi
        echo "✗ failed: ${ERROR_MSG}"
        FAILED_COUNT=$((FAILED_COUNT + 1))
    fi

    # Optional: short delay to avoid sending requests too quickly
    sleep 0.1
done

# Print the summary
echo ""
echo "=================================================="
echo "Done!"
echo "Succeeded: ${SUCCESS_COUNT}"
echo "Failed: ${FAILED_COUNT}"
echo "=================================================="

if [ "$FAILED_COUNT" -eq 0 ]; then
    exit 0
else
    exit 1
fi

I have been investigating this for a long time and still cannot figure it out.

There is an internal limit on timer controllers, which caps the number of vnodes on a single dnode at 1120.

Your 560 databases × 2 vgroups each reach exactly that cap.
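
As a quick check of that arithmetic, here is a minimal sketch. `max_databases` is a hypothetical helper, 1120 is the per-dnode vnode cap quoted above, and replica 1 is assumed so that each vgroup occupies exactly one vnode.

```bash
#!/bin/bash
# Hypothetical helper: databases that fit under a per-dnode vnode cap.
# Assumption: replica 1, so one vgroup = one vnode.
max_databases() {
    local vnode_cap=$1 vgroups_per_db=$2
    echo $(( vnode_cap / vgroups_per_db ))
}

for vg in 1 2 4; do
    echo "vgroups=${vg}: at most $(max_databases 1120 "$vg") databases"
done
```

Because the cap counts vnodes and not memory, this would also be consistent with the count staying at 560 for every BUFFER size tested.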

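So that is the maximum for a single node, and to scale beyond it we would need a cluster? And is a cluster's theoretical limit on the number of databases then the supportVnodes maximum divided by the vgroup count, i.e. 4096 / Vgroups(n)?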

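supportVnodes is the maximum number of vnodes on a single dnode. Although its maximum value is 4096, in practice it is capped by the timer controller limit, which acts as a hidden throttle.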

Is there any workaround, then?

Why not create multiple super tables or subtables under a single database instead?

You can use permission management to control each user's access.
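
One way this layout might look is sketched below. The script only prints SQL and does not execute it. Every name in it (the `airports` database, the `st_` and `u_` prefixes, the column and tag definitions, the placeholder password) is hypothetical, and the table-level GRANT assumes the installed edition supports table-level privileges; if it does not, only `dbname.*` grants would apply.

```bash
#!/bin/bash
# Hypothetical names throughout: database "airports", prefixes "st_" and "u_".
DB="airports"

# Print the SQL for one tenant (airport): a super table plus a user limited to it.
tenant_sql() {
    local airport=$1
    echo "CREATE STABLE IF NOT EXISTS ${DB}.st_${airport} (ts TIMESTAMP, val DOUBLE) TAGS (device_id NCHAR(64));"
    # Placeholder password; replace it before use.
    echo "CREATE USER u_${airport} PASS 'change_me_123';"
    # Assumption: table-level privileges are available in the installed edition.
    echo "GRANT READ ON ${DB}.st_${airport} TO u_${airport};"
}

# One shared database, so the vnode count no longer grows with the number of tenants.
echo "CREATE DATABASE IF NOT EXISTS ${DB} VGROUPS 2 BUFFER 16;"
for airport in pek sha; do
    tenant_sql "$airport"
done
```

The printed statements could then be reviewed and sent through the same REST endpoint the batch script above uses.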