Error when listing TiCDC captures


  • [TiDB version]: v4.0.5
  • [Problem description]:
    After starting the cdc server, listing the captures fails:
    cdc cli capture list --pd=http://10.177.97.149:2379

The error output is as follows:
[WARN] [base_client.go:180] ["[pd] failed to get cluster id"] [url=http://10.177.97.149:2379] [error=“context canceled”] [errorVerbose=“context canceled\ngithub.com/pingcap/pd/v4/pkg/grpcutil.GetClientConn\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/pkg/grpcutil/grpcutil.go:100\ github.com/pingcap/pd/v4/client.(*baseClient).getOrCreateGRPCConn\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:280\ github.com/pingcap/pd/v4/client.(*baseClient).getMembers\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:212\ github.com/pingcap/pd/v4/client.(*baseClient).initClusterID\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:177\ github.com/pingcap/pd/v4/client.(*baseClient).initRetry\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:113\ github.com/pingcap/pd/v4/client.newBaseClient\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:94\ github.com/pingcap/pd/v4/client.NewClientWithContext\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/client.go:148\ github.com/pingcap/ticdc/cmd.newCliCommand.func1\ \tgithub.com/pingcap/ticdc@/cmd/client.go:168\ github.com/spf13/cobra.(*Command).execute\ \tgithub.com/spf13/cobra@v1.0.0/command.go:821\ github.com/spf13/cobra.(*Command).ExecuteC\ \tgithub.com/spf13/cobra@v1.0.0/command.go:950\ github.com/spf13/cobra.(*Command).Execute\ \tgithub.com/spf13/cobra@v1.0.0/command.go:887\ github.com/pingcap/ticdc/cmd.Execute\ \tgithub.com/pingcap/ticdc@/cmd/root.go:32\ main.main\ \tgithub.com/pingcap/ticdc@/main.go:22\ runtime.main\ \truntime/proc.go:203\ runtime.goexit\ \truntime/asm_amd64.s:1357\ github.com/pingcap/pd/v4/client.(*baseClient).getOrCreateGRPCConn\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:282\ github.com/pingcap/pd/v4/client.(*baseClient).getMembers\ 
\tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:212\ github.com/pingcap/pd/v4/client.(*baseClient).initClusterID\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:177\ github.com/pingcap/pd/v4/client.(*baseClient).initRetry\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:113\ github.com/pingcap/pd/v4/client.newBaseClient\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/base_client.go:94\ github.com/pingcap/pd/v4/client.NewClientWithContext\ \tgithub.com/pingcap/pd/v4@v4.0.5-0.20200817114353-e465cafe8a91/client/client.go:148\ github.com/pingcap/ticdc/cmd.newCliCommand.func1\ \tgithub.com/pingcap/ticdc@/cmd/client.go:168\ github.com/spf13/cobra.(*Command).execute\ \tgithub.com/spf13/cobra@v1.0.0/command.go:821\ github.com/spf13/cobra.(*Command).ExecuteC\ \tgithub.com/spf13/cobra@v1.0.0/command.go:950\ github.com/spf13/cobra.(*Command).Execute\ \tgithub.com/spf13/cobra@v1.0.0/command.go:887\ github.com/pingcap/ticdc/cmd.Execute\ \tgithub.com/pingcap/ticdc@/cmd/root.go:32\ main.main\ \tgithub.com/pingcap/ticdc@/main.go:22\ runtime.main\ \truntime/proc.go:203\ runtime.goexit\ \truntime/asm_amd64.s:1357”]
Error: fail to open PD client: [pd] failed to get cluster id


Check whether the PD node services are healthy; pd-ctl member will show you.

pd-ctl member shows the following:

{
  "header": {
    "cluster_id": 6852855214720459155
  },
  "members": [
    {
      "name": "pd_tikv3",
      "member_id": 6765253526090305133,
      "peer_urls": [
        "http://10.177.97.150:2380"
      ],
      "client_urls": [
        "http://10.177.97.150:2379"
      ],
      "deploy_path": "/data/deploy/bin",
      "binary_version": "v4.0.5",
      "git_hash": "57f6db193f70d539112f713ef3948c8232fa2507"
    },
    {
      "name": "pd_tikv1",
      "member_id": 12680333574309607274,
      "peer_urls": [
        "http://10.177.97.148:2380"
      ],
      "client_urls": [
        "http://10.177.97.148:2379"
      ],
      "deploy_path": "/data/deploy/bin",
      "binary_version": "v4.0.5",
      "git_hash": "57f6db193f70d539112f713ef3948c8232fa2507"
    },
    {
      "name": "pd_tikv2",
      "member_id": 16890238619233473388,
      "peer_urls": [
        "http://10.177.97.149:2380"
      ],
      "client_urls": [
        "http://10.177.97.149:2379"
      ],
      "deploy_path": "/data/deploy/bin",
      "binary_version": "v4.0.5",
      "git_hash": "57f6db193f70d539112f713ef3948c8232fa2507"
    }
  ],
  "leader": {
    "name": "pd_tikv3",
    "member_id": 6765253526090305133,
    "peer_urls": [
      "http://10.177.97.150:2380"
    ],
    "client_urls": [
      "http://10.177.97.150:2379"
    ]
  },
  "etcd_leader": {
    "name": "pd_tikv3",
    "member_id": 6765253526090305133,
    "peer_urls": [
      "http://10.177.97.150:2380"
    ],
    "client_urls": [
      "http://10.177.97.150:2379"
    ],
    "deploy_path": "/data/deploy/bin",
    "binary_version": "v4.0.5",
    "git_hash": "57f6db193f70d539112f713ef3948c8232fa2507"
  }
}
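
Since the --pd flag in the failing command points at a single PD address, it may help to try the same command against every member's client URL. Below is a minimal Python sketch, assuming the raw pd-ctl member output is valid JSON with plain double quotes. The function name pd_client_urls and the trimmed SAMPLE string are illustrative only and not part of any TiDB tool.

```python
import json

# Trimmed sample shaped like the pd-ctl member output in this thread (assumption:
# the real output is valid JSON and carries at least these fields).
SAMPLE = """
{
  "members": [
    {"name": "pd_tikv3", "client_urls": ["http://10.177.97.150:2379"]},
    {"name": "pd_tikv1", "client_urls": ["http://10.177.97.148:2379"]},
    {"name": "pd_tikv2", "client_urls": ["http://10.177.97.149:2379"]}
  ],
  "leader": {"name": "pd_tikv3", "client_urls": ["http://10.177.97.150:2379"]}
}
"""


def pd_client_urls(members_json):
    """Return ({member name: [client URLs]}, leader name or None)."""
    data = json.loads(members_json)
    urls = {m["name"]: list(m["client_urls"]) for m in data.get("members", [])}
    leader = data.get("leader", {}).get("name")
    return urls, leader


urls, leader = pd_client_urls(SAMPLE)

# Build one capture-list command per PD client URL so each can be tried in turn.
commands = [
    f"cdc cli capture list --pd={url}"
    for member_urls in urls.values()
    for url in member_urls
]

for command in commands:
    print(command)
```

If the command succeeds against one member but not another, that narrows the problem to a specific PD node or network path rather than to cdc itself.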

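  1. Thanks for the feedback. Could you run pd-ctl health and share the result?
  2. Please run the cdc command several times to see whether the problem reproduces consistently.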
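
To put a number on "reproduces consistently", the command can be run in a loop and the failures counted. This is a rough Python sketch, assuming the cdc binary is on PATH and that a failed run either exits with a non-zero status or prints the "failed to get cluster id" message from the log above (the exit-status behaviour is an assumption, not something confirmed in this thread). count_failures is a hypothetical helper, and the self-check at the end uses the Python interpreter itself as a stand-in command.

```python
import subprocess
import sys


def count_failures(cmd, attempts=10, timeout=30, marker=None):
    """Run cmd `attempts` times and count the runs that look like failures.

    A run counts as failed if it times out, exits non-zero, or (when `marker`
    is given) prints `marker` on stdout or stderr.
    """
    failures = 0
    for _ in range(attempts):
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            failures += 1
            continue
        output = proc.stdout + proc.stderr
        if proc.returncode != 0 or (marker is not None and marker in output):
            failures += 1
    return failures


# Intended use against the cluster in this thread (not executed here):
#   count_failures(
#       ["cdc", "cli", "capture", "list", "--pd=http://10.177.97.149:2379"],
#       attempts=20,
#       marker="failed to get cluster id",
#   )

# Self-check with a stand-in command that always exits with status 1.
demo_failures = count_failures(
    [sys.executable, "-c", "import sys; sys.exit(1)"], attempts=3
)
print(f"{demo_failures} of 3 stand-in runs failed")
```

Recording the count before and after a cluster restart would show whether the restart actually changes the failure rate.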

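Strange... health shows everything is OK, and when I listed the captures again it just worked... I didn't change anything.

Let's keep observing for now. You could also post the network environment of the servers hosting PD and cdc.

OK, thanks~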

:point_left:

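This error shows up whenever I restart the TiDB cluster.

The two are not necessarily related. Is the current cluster on 10-gigabit NICs?

Yes, see below: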
Supported ports: [ TP ]
Supported link modes: 1000baseT/Full
                      10000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown

Does the problem reproduce consistently when you run the cdc command after restarting the cluster?

cdc cli capture list --pd=http://10.177.97.149:2379
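Most of the time it reports the error above.

My question: does this error simply mean that a connection to PD could not be established?

Hello. So the problem occurs very frequently but not every time, is that right? Does it only occur after restarting TiDB? Does it always occur after restarting TiDB?
Also, could you upload the PD logs?
And can you confirm whether the network between CDC and PD is stable, and whether this PD node has only the one address 10.177.97.149:2379? Is there any network address translation (NAT) between CDC and PD?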
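
One way to get a first impression of network stability between the cdc host and PD is to open plain TCP connections to the PD client port several times and record how long each takes. The sketch below is a minimal Python illustration under that assumption. probe_tcp is a hypothetical helper, and it only tests TCP reachability, not the gRPC handshake that the PD client performs.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=2.0):
    """Try to connect `attempts` times.

    Returns a list with one entry per attempt: the connect latency in seconds,
    or None if the attempt failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # Covers refused connections, unreachable hosts and timeouts.
            results.append(None)
    return results


def summarize(results):
    """Return (number of failed attempts, worst successful latency or None)."""
    ok = [r for r in results if r is not None]
    return len(results) - len(ok), (max(ok) if ok else None)


# Intended use from the cdc host (address taken from this thread, not run here):
#   failed, worst = summarize(probe_tcp("10.177.97.149", 2379, attempts=20))
```

Occasional None entries or large latency spikes from the cdc host would point toward the network rather than PD itself.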

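@BinLi1988 hello, could you reply to the questions above?

OK. I was testing in a test environment before. Over the next couple of days I'll install and use it in a new environment, and I'll let you know if the problem is still there.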

:+1:
