TiCDC replication task abnormality

[TiDB environment]
Using TiCDC to replicate data changes from TiDB to a downstream MySQL.
One TiCDC (v5.1.0) node is deployed, replicating from an upstream TiDB (v5.0.0-rc) to a downstream MySQL 5.7.

[Symptoms] Application and database symptoms
TiCDC keeps flushing a large volume of logs, and replication makes no progress at all.
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=98536] [span="[7480000000000002ff4b5f728000000027ff609a530000000000fa, 7480000000000002ff4b5f728000000027ffd9d4360000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m25.252325637s] [resolvedTs=429935770469924931]
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=144444] [span="[7480000000000005ffa35f728000000000ff00e0fc0000000000fa, 7480000000000005ffa35f728000000000ff00ec980000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m26.366818195s] [resolvedTs=429935770469924931]
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=159064] [span="[7480000000000006ff0f5f728000000000ff0239340000000000fa, 7480000000000006ff0f5f728000000000ff0246510000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m25.652929911s] [resolvedTs=429935770469924931]
[2021/12/23 11:38:25.046 +08:00] [INFO] [lock_resolver.go:123] ["resolve lock successfully"] [regionID=89656] [maxVersion=429976635167473664]
[2021/12/23 11:38:25.046 +08:00] [INFO] [lock_resolver.go:123] ["resolve lock successfully"] [regionID=93704] [maxVersion=429976635167473664]
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=89628] [span="[7480000000000001ff6d5f72800000000dff709b530000000000fa, 7480000000000001ff6d5f72800000000dffe9a3da0000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m25.254448819s] [resolvedTs=429935770469924931]
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=94012] [span="[7480000000000001ff6f5f728000000014ff60686d0000000000fa, 7480000000000001ff6f5f728000000014ff9d214b0000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m25.659106981s] [resolvedTs=429935770469924931]
[2021/12/23 11:38:25.046 +08:00] [INFO] [lock_resolver.go:123] ["resolve lock successfully"] [regionID=74172] [maxVersion=429976635167473664]
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=74412] [span="[7480000000000001ffa75f728000000011ff43086f0000000000fa, 7480000000000001ffa75f728000000011ffbc057e0000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m25.853513234s] [resolvedTs=429935770469924931]
[2021/12/23 11:38:25.046 +08:00] [INFO] [lock_resolver.go:123] ["resolve lock successfully"] [regionID=193244] [maxVersion=429976635167473664]
[2021/12/23 11:38:25.046 +08:00] [INFO] [lock_resolver.go:123] ["resolve lock successfully"] [regionID=191996] [maxVersion=429976635167473664]
[2021/12/23 11:38:25.046 +08:00] [WARN] [region_worker.go:307] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=133692] [span="[7480000000000004ff795f728000000000ff0058b70000000000fa, 7480000000000004ff795f728000000000ff0064520000000000fa)"] [duration=43h18m16.45s] [lastEvent=31m24.546299747s] [resolvedTs=429935770469924931]

[Business impact] None; this is a test setup.

[TiDB version]
5.0.0-rc

[Attachments]
ticdc_1_log.zip (16.9 MB)

cdc cli changefeed list --pd="http://127.0.0.1:2379"

cdc cli changefeed query -c task_name --pd="http://127.0.0.1:2379"

Check the status of the replication task first.

The CDC task status:

./cdc cli changefeed list --pd=http://9.44.5.66:15072
[
  {
    "id": "sync-task-1",
    "summary": {
      "state": "normal",
      "tso": 429935770469924931,
      "checkpoint": "2021-12-21 16:20:08.006",
      "error": null
    }
  }
]

The query output is very long, and most of it looks like this:

      "79": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "81": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "83": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "85": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "87": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "89": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "91": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "93": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "95": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      },
      "99": {
        "start-ts": 429935770469924931,
        "mark-table-id": 0
      }
    },
    "operation": {},
    "admin-job-type": 0

The task has clearly stopped: "checkpoint": "2021-12-21 16:20:08.006"
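Side note: the lag can be read straight off the TSO. The high 46 bits of a TiDB TSO are a physical Unix timestamp in milliseconds, and the low 18 bits are a logical counter, so the `tso` value from `changefeed list` can be decoded by hand. A minimal sketch (`date -d` assumes GNU coreutils):

```shell
# Decode a TiDB TSO: high 46 bits = Unix time in ms, low 18 bits = logical counter.
tso=429935770469924931            # the "tso" value from the changefeed list output
physical_ms=$(( tso >> 18 ))
echo "$physical_ms"               # 1640074808006
# 1640074808006 ms = 2021-12-21 08:20:08 UTC = 16:20:08.006 +08:00,
# i.e. exactly the "checkpoint" shown in the changefeed list output.
date -u -d "@$(( physical_ms / 1000 ))"
```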

Look carefully for error logs. After being stopped for that long, the data has probably been GC'd already.

Check the GC configuration:
SELECT * FROM mysql.tidb WHERE VARIABLE_NAME LIKE '%gc%';

I killed the cdc process, but GC did not happen; before that, the GC lifetime had already been raised to the maximum.

CPU usage on the cdc node is very high, over 2000%.

Please paste the full query output, with the account and password removed.

Hold on a moment, let me restart the task first.

query.json (90.6 KB)

Hi, you can take a look at the query output now; I was away in a meeting just now.

If you need monitoring data, I can export that as well.

worker-count=1
max-txn-row=1

:rofl: How did both of these parameters end up set to 1? Writing to the downstream with a single concurrent worker is very slow.

Recreate the replication task with larger values.

OK, I'll try it. The default is 16, right?
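For reference, `worker-count` and `max-txn-row` are query parameters on the MySQL sink URI, so raising them means removing the changefeed and creating it again with a new `--sink-uri`. A sketch with placeholder PD address and credentials; per the TiCDC docs, the defaults for the MySQL sink are `worker-count=16` and `max-txn-row=256`:

```shell
# Placeholder PD address and MySQL credentials; adjust to the real environment.
cdc cli changefeed remove --pd=http://127.0.0.1:2379 --changefeed-id=sync-task-1
cdc cli changefeed create --pd=http://127.0.0.1:2379 \
    --changefeed-id=sync-task-1 \
    --start-ts=429935770469924931 \
    --sink-uri="mysql://user:password@127.0.0.1:3306/?worker-count=16&max-txn-row=5000"
```

`--start-ts` is set to the old checkpoint TSO so the new task resumes where the old one stopped; this only works while that TSO is still inside the GC window.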

Still no improvement; CDC is still flushing logs in large volumes:
[2021/12/23 19:52:51.624 +08:00] [DEBUG] [kv.go:452] ["update resolveTS failed"] [error="rpc error: code = Unimplemented desc = "] [errorVerbose="rpc error: code = Unimplemented desc = \ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ngithub.com/pingcap/tidb/store/tikv/tikvrpc.CallRPC\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20210508083641-8ed1d9d4a798/store/tikv/tikvrpc/tikvrpc.go:931\ngithub.com/pingcap/tidb/store/tikv.(*RPCClient).SendRequest\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20210508083641-8ed1d9d4a798/store/tikv/client.go:409\ngithub.com/pingcap/tidb/store/tikv.reqCollapse.SendRequest\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20210508083641-8ed1d9d4a798/store/tikv/client_collapse.go:49\ngithub.com/pingcap/tidb/store/tikv.(*KVStore).updateResolveTS.func1\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20210508083641-8ed1d9d4a798/store/tikv/kv.go:447\nruntime.goexit\n\truntime/asm_amd64.s:1371"] [store-id=269009]

Check whether replication speed has picked up, and adjust the log level. DEBUG mode produces far more log output than INFO.
Adjust the cdc log level:
curl -X POST -d '"debug"' http://172.16.11.12:8300/admin/log

https://docs.pingcap.com/zh/tidb/stable/ticdc-open-api/#动态调整-ticdc-server-日志级别

OK, I'll try it.
Is that address the PD address or the cdc address?
Is the port always 8300?

It is the CDC component's address; the default port is 8300.

nohup ./cdc server --pd=http://9.44.5.66:$1 --log-file=ticdc_1.log --log-level=debug --sorter-max-memory-percentage=15 --sorter-max-memory-consumption=4294967296 --sorter-num-concurrent-worker=2 --data-dir=/data1/ticdc$1/cdctmp --addr=11.185.162.13:$port --advertise-addr=11.185.162.13:$port > start.log 2>&1 &

So with this, cdc was already running at debug level from the moment it was deployed?

Yes. You can change it to error.

Set it to error, right? I'll change it.
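The runtime log-level API from the doc linked above should work here as well; a sketch using the cdc address mentioned earlier in the thread. Note the dynamic setting does not survive a restart, so for a permanent change the `--log-level=debug` flag in the start command also needs to be changed:

```shell
# Dynamically switch this TiCDC server's log level to error (address from this thread).
curl -X POST -d '"error"' http://172.16.11.12:8300/admin/log
```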

So far no error-level logs have appeared.