TDengine stream processing stops writing data after running for a while

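[TDengine environment]
Production / Test / PoC / Pre-production

[TDengine version] 3.3.7~

[Operating system and version] Ubuntu

[Deployment method] Container

[Number of cluster nodes] Single node

[Number of cluster replicas] 0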

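[Business impact]

We need stream processing to classify the source data and store it by time period. The problem we have found: after the stream is created, it runs normally for a while (1-3 days), and data is written normally during that time. After that, the stream still shows a running status, but no data is written, and we found no errors in the logs.

Could you provide a troubleshooting procedure and the log configuration for stream processing?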

create stream if not exists stream_dwd_agg_m
    interval(1m) sliding (1m)
from
    zhsc_dev.dwd_measure_point partition by device_id, point_id
    stream_options ( watermark (1m) | fill_history (1) )
into
    zhsc_dev.dwd_agg_m tags ( device_id nchar (64) as device_id, point_id nchar (128) as point_id )
as
select
    _wstart as ts,
    _wend as wend,
    first (v) as first_v,
    last (v) as last_v,
    last (v) - first (v) as diff_v,
    avg(v) as avg_v,
    sum(v) as sum_v,
    max(v) as max_v,
    min(v) as min_v,
    count(*) as row_count
from
    %%trows
partition by device_id, point_id
interval(1m) sliding (1m);

[Resource configuration] 4 GB of memory allocated

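  1. Troubleshooting: set stDebugFlag to 135. After the stream has run for some time, filter the log files for entries related to the stream_id, and use the Trigger and Runner logs to determine whether the stream was never triggered or was triggered but never computed. The stream_id value can be looked up in information_schema.ins_streams.
  2. This stream looks like it may have a performance problem. Here is a way to write it that should perform somewhat better: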
create stream if not exists stream_dwd_agg_m
    interval(1m) sliding (1m)
from
    zhsc_dev.dwd_measure_point partition by device_id, point_id
    stream_options ( watermark (1m) | fill_history (1) )
into
    zhsc_dev.dwd_agg_m tags ( device_id nchar (64) as device_id, point_id nchar (128) as point_id )
as
select
    _twstart as ts,
    _twend as wend,
    first (v) as first_v,
    last (v) as last_v,
    last (v) - first (v) as diff_v,
    avg(v) as avg_v,
    sum(v) as sum_v,
    max(v) as max_v,
    min(v) as min_v,
    count(*) as row_count
from
    zhsc_dev.dwd_measure_point
where
    device_id = %%1 and point_id = %%2 and _c0 >= _twstart and _c0 < _twend;
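
The log filtering described in step 1 could be sketched roughly as follows. This is only an illustration, not an official TDengine tool: the helper names `summarize_stream_log` and `diagnose` are made up here, and it assumes that relevant log lines contain the stream id as a plain substring and mention "trigger" or "runner" somewhere in the line. Check how the id is actually printed in your log files and pass that exact string.

```python
from collections import Counter
from typing import Iterable

# Assumption: the component name appears somewhere in each relevant log line.
COMPONENTS = ("trigger", "runner")


def summarize_stream_log(lines: Iterable[str], stream_id: str) -> Counter:
    """Count log lines that mention stream_id, grouped by component keyword."""
    counts: Counter = Counter()
    for line in lines:
        # Skip lines that do not belong to the stream under investigation.
        if stream_id not in line:
            continue
        lowered = line.lower()
        for component in COMPONENTS:
            if component in lowered:
                counts[component] += 1
    return counts


def diagnose(counts: Counter) -> str:
    """Turn the counts into a rough hint about where the stream stalled."""
    if counts["trigger"] == 0:
        return "no trigger activity: the stream was probably not triggered"
    if counts["runner"] == 0:
        return "triggered but no runner activity: the calculation probably did not run"
    return "trigger and runner both active: check the write path and the results"
```

To use it, open the log file collected after the stream stalls, pass the file object and the stream id string to `summarize_stream_log`, and print the result of `diagnose`. Running it separately on logs from the healthy period and from the stalled period makes the difference easier to spot.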