【TDengine 使用环境】
生产环境
【TDengine 版本】
3.3.6.13
【TDengine 使用环境】
生产环境
【TDengine 版本】
3.3.6.13
打开日志的debug,当出现问题时,观察日志的输出。
02/07 09:36:07.117971 02584612 C UTL WARN worker:mnode-status,message has been queued for too long, cost: 89s
02/07 09:36:06.578160 02584613 C DND WARN worker:1,message has been processed for too long, type:trans-tmr, cost: 89s, QID:0x0:0x0
02/07 09:36:07.264600 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-hb-tmr, cost: 9s, QID:0x0:0x0
02/07 09:36:07.978877 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:36:06.696353 02584699 C MND INFO vgId:2, state changed by offline check, old state:leader restored:1 canRead:1 new state:error restored:0 canRead:0
02/07 09:36:10.458505 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 92s
02/07 09:36:10.458498 02584616 C UTL WARN worker:mnode-arb,message has been queued for too long, cost: 9s
02/07 09:36:10.484791 02584699 C MND INFO db:1.eblock, stateTs changed by offline check, old newTs:1770428136537 newTs:1770428163853
02/07 09:36:12.899773 02584612 C MND INFO vgId:2, state changed by status msg, old state:offline restored:0 canRead:0 new state:leader restored:1 canRead:1, dnode:1
02/07 09:36:15.218411 02584699 C MND INFO vgId:3, state changed by offline check, old state:leader restored:1 canRead:1 new state:error restored:0 canRead:0
02/07 09:36:17.094129 02584613 C DND WARN worker:1,message has been processed for too long, type:tmq-tmr, cost: 98s, QID:0x0:0x0
02/07 09:36:18.388296 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:36:17.363450 02584612 C MND INFO db:1.eblock, stateTs changed by status msg, old stateTs:1770428163853 new stateTs:1770428172136
02/07 09:36:18.638791 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-check-sync-tmr, cost: 17s, QID:0x0:0x0
02/07 09:36:19.709389 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 101s
02/07 09:36:22.433509 02584613 C DND WARN worker:1,message has been processed for too long, type:grant-hb-tmr, cost: 103s, QID:0x0:0x0
02/07 09:36:21.365744 02584612 C MND INFO vgId:3, state changed by status msg, old state:offline restored:0 canRead:0 new state:leader restored:1 canRead:1, dnode:1
02/07 09:36:27.950142 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 87s
02/07 09:36:29.451272 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:36:30.312826 02584612 C MND INFO dnode:1, from offline to online, memory avail:1835900520 total:2058530816 cores:4.00
02/07 09:36:35.400528 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-hb-tmr, cost: 2s, QID:0x0:0x0
02/07 09:36:36.406309 02584612 C DND WARN worker:1,message has been processed for too long, type:status, cost: 118s, QID:0x0:0x3062035bc6c3052e
02/07 09:36:34.993443 02584578 C RPC WARN DND-S conn:0xffff54001710, vnode-fetch-ttl-expired-tbs received from 127.0.0.1:43978, local info:127.0.0.1:6030, len:24, cost:1493191us, recv exception, seqNum:56436, sid:0, QID:0x0:0x3062035be5830539
02/07 09:36:40.572549 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:36:39.134103 02584613 C DND WARN worker:1,message has been processed for too long, type:ttl-tmr, cost: 98s, QID:0x0:0x0
02/07 09:36:40.415300 02584616 C UTL WARN worker:mnode-arb,message has been queued for too long, cost: 7s
02/07 09:36:41.283324 02584612 C UTL WARN worker:mnode-status,message has been queued for too long, cost: 112s
02/07 09:36:43.186745 02584578 C RPC WARN DND-S conn:0xffff54001710, vnode-fetch-ttl-expired-tbs received from 127.0.0.1:43978, local info:127.0.0.1:6030, len:24, cost:5404041us, recv exception, seqNum:56437, sid:0, QID:0x0:0x3062035be653053b
02/07 09:36:42.715290 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 101s
02/07 09:36:44.457859 02584579 C RPC WARN DND-S conn:0xffff84007ca0, statis received from 127.0.0.1:38252, local info:127.0.0.1:6030, len:1251, cost:3761924us, recv exception, seqNum:741672, sid:0, QID:0x749b035be666b2d9:0x749b035be6c6b2da
02/07 09:36:44.737469 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-check-sync-tmr, cost: 11s, QID:0x0:0x0
02/07 09:36:46.328561 02584613 C DND WARN worker:1,message has been processed for too long, type:trans-tmr, cost: 105s, QID:0x0:0x0
02/07 09:36:45.036051 02584699 C MND INFO dnode:1, in offline state
02/07 09:36:46.332817 02584578 C RPC WARN DND-S conn:0xffff540332a0, status received from 127.0.0.1:44512, local info:127.0.0.1:6030, len:617, cost:11522869us, recv exception, seqNum:140805, sid:0, QID:0x0:0x3062035be5e3053a
02/07 09:36:47.089767 02584612 C MND INFO dnode:1, from offline to online, memory avail:1835900520 total:2058530816 cores:4.00
02/07 09:36:50.002027 02584699 C MND INFO vgId:2, state changed by offline check, old state:leader restored:1 canRead:1 new state:error restored:0 canRead:0
02/07 09:36:49.830076 02584616 C UTL WARN worker:mnode-arb,message has been queued for too long, cost: 8s
02/07 09:36:50.220923 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 108s
02/07 09:36:50.332628 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:36:52.283564 02584699 C MND INFO db:1.eblock, stateTs changed by offline check, old newTs:1770428172136 newTs:1770428204585
02/07 09:36:54.258264 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-hb-tmr, cost: 12s, QID:0x0:0x0
02/07 09:36:54.258264 02584613 C DND WARN worker:1,message has been processed for too long, type:compact-tmr, cost: 112s, QID:0x0:0x0
02/07 09:36:53.859878 02584612 C DND WARN worker:1,message has been processed for too long, type:status, cost: 124s, QID:0x0:0x3062035bc963052f
02/07 09:36:54.722294 02584699 C MND INFO vgId:3, state changed by offline check, old state:leader restored:1 canRead:1 new state:error restored:0 canRead:0
02/07 09:36:56.316626 02584578 C RPC WARN DND-S conn:0xffff540332a0, status received from 127.0.0.1:44512, local info:127.0.0.1:6030, len:617, cost:11278697us, recv exception, seqNum:140806, sid:0, QID:0x0:0x3062035be853053c
02/07 09:36:58.291381 02584612 C UTL WARN worker:mnode-status,message has been queued for too long, cost: 122s
02/07 09:36:58.104783 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 115s
02/07 09:36:58.571178 02584611 C DND WARN worker:1,message has been processed for too long, type:heartbeat, cost: 55s, QID:0x30ee5bbd91cf6e92:0x749b035bd9b6b2d8
02/07 09:37:01.664866 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:37:01.953662 02584613 C DND WARN worker:1,message has been processed for too long, type:tmq-tmr, cost: 119s, QID:0x0:0x0
02/07 09:37:02.970421 02584611 C UTL WARN worker:mnode-read,message has been queued for too long, cost: 14s
02/07 09:37:03.706053 02584612 C MND INFO vgId:2, state changed by status msg, old state:offline restored:0 canRead:0 new state:leader restored:1 canRead:1, dnode:1
02/07 09:37:02.789327 02584578 C RPC WARN DND-S conn:0xffff540332a0, status received from 127.0.0.1:44512, local info:127.0.0.1:6030, len:617, cost:6372327us, recv exception, seqNum:140807, sid:0, QID:0x0:0x3062035beb23053d
02/07 09:37:04.995946 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 122s
02/07 09:37:05.371863 02584612 C MND INFO db:1.eblock, stateTs changed by status msg, old stateTs:1770428204585 new stateTs:1770428223194
02/07 09:37:05.249862 02584611 C DND WARN worker:1,message has been processed for too long, type:statis, cost: 16s, QID:0x749b035be666b2d9:0x749b035be6c6b2da
02/07 09:37:08.227628 02584612 C MND INFO vgId:3, state changed by status msg, old state:offline restored:0 canRead:0 new state:leader restored:1 canRead:1, dnode:1
02/07 09:37:10.990154 02584613 C DND WARN worker:1,message has been processed for too long, type:stream-begin-checkpoint, cost: 128s, QID:0x0:0x0
02/07 09:37:10.688643 02584578 C RPC WARN DND-S conn:0xffff540332a0, status received from 127.0.0.1:44512, local info:127.0.0.1:6030, len:617, cost:2552975us, recv exception, seqNum:140808, sid:0, QID:0x0:0x3062035bede3053e
02/07 09:37:13.388199 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:37:13.388199 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 130s
02/07 09:37:13.917611 02584612 C MND INFO dnode:1, from offline to online, memory avail:1835900520 total:2058530816 cores:4.00
02/07 09:37:15.632490 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-hb-tmr, cost: 2s, QID:0x0:0x0
02/07 09:37:18.071473 02584616 C UTL WARN worker:mnode-arb,message has been queued for too long, cost: 4s
02/07 09:37:18.277863 02584613 C DND WARN worker:1,message has been processed for too long, type:stream-consen-tmr, cost: 134s, QID:0x0:0x0
02/07 09:37:19.897986 02584699 C MND INFO dnode:1, in offline state
02/07 09:37:20.794418 02584612 C DND WARN worker:1,message has been processed for too long, type:status, cost: 144s, QID:0x0:0x3062035bcc130530
02/07 09:37:21.904949 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 138s
02/07 09:37:23.437652 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-check-sync-tmr, cost: 9s, QID:0x0:0x0
02/07 09:37:23.899396 02584699 C MND INFO vgId:2, state changed by offline check, old state:leader restored:1 canRead:1 new state:error restored:0 canRead:0
02/07 09:37:21.969456 02584578 C RPC WARN DND-S conn:0xffff540332a0, status received from 127.0.0.1:44512, local info:127.0.0.1:6030, len:617, cost:1872669us, recv exception, seqNum:140809, sid:0, QID:0x0:0x3062035bf0c3053f
02/07 09:37:26.653830 02584699 C MND INFO db:1.eblock, stateTs changed by offline check, old newTs:1770428223194 newTs:1770428239472
02/07 09:37:25.139665 02584612 C UTL WARN worker:mnode-status,message has been queued for too long, cost: 138s
02/07 09:37:25.678246 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:37:27.007706 02584613 C DND WARN worker:1,message has been processed for too long, type:grant-hb-tmr, cost: 142s, QID:0x0:0x0
02/07 09:37:30.081928 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 138s
02/07 09:37:29.151220 02584699 C MND INFO vgId:3, state changed by offline check, old state:leader restored:1 canRead:1 new state:error restored:0 canRead:0
02/07 09:37:33.852401 02584613 C DND WARN worker:1,message has been processed for too long, type:trans-tmr, cost: 141s, QID:0x0:0x0
02/07 09:37:31.802398 02584612 C MND INFO vgId:2, state changed by status msg, old state:offline restored:0 canRead:0 new state:leader restored:1 canRead:1, dnode:1
02/07 09:37:37.290554 02584613 C UTL WARN worker:mnode-write,message has been queued for too long, cost: 144s
02/07 09:37:36.093034 02584578 C RPC WARN DND-S conn:0xffff540332a0, status received from 127.0.0.1:44512, local info:127.0.0.1:6030, len:617, cost:3333611us, recv exception, seqNum:140810, sid:0, QID:0x0:0x3062035bf3f30540
02/07 09:37:38.030103 02584612 C MND INFO db:1.eblock, stateTs changed by status msg, old stateTs:1770428239472 new stateTs:1770428251394
02/07 09:37:38.660583 02584693 C DND ERROR failed to SendRecv status req with timeout 5000 since Operation timeout
02/07 09:37:41.175351 02584616 C DND WARN worker:1,message has been processed for too long, type:mnd-arb-hb-tmr, cost: 2s, QID:0x0:0x0
02/07 09:37:41.654903 02584613 C DND WARN worker:1,message has been processed for too long, type:tmq-tmr, cost: 148s, QID:0x0:0x0
02/07 09:37:42.307084 02584578 C RPC WARN DND-S conn:0xffff5402e810, heartbeat received from 127.0.0.1:38254, local info:127.0.0.1:6030, len:2880, cost:6906558us, recv exception, seqNum:741361, sid:0, QID:0x30ee5bbf3d356e94:0x749b035bf436b2db
您好,错误日志已经贴在下面,我们放开全部日志再复现一次问题
从上面日志信息看,有底层消息的收发异常,这个应该是导致写入阻塞的直接原因。
这个环境是 多节点的,还是 单节点?
复现后,重点观察一下各个机器之间的网络情况、系统的负载情况等。
您好 我们是单节点单机运行,不存在网络异常问题,问题期间内存占用稳定(50%)左右,这个问题在2.6就出现了,新版本出现概率小很多但仍然有