Fixing anomalies after deploying Pump

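To help us respond efficiently, please provide the following information when asking a question. Clearly described problems are answered first.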

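  • [TiDB version]: 3.0.4
  • [Problem description]: Three TiDB and three TiKV instances share the same SSD machines. Pump was deployed on one machine with 8 cores, 32 GB of memory, and a 500 GB disk. The deployment itself reported no errors, but after a rolling restart of tidb-server, TiKV CPU usage rose sharply, memory on one machine dropped sharply, and out-of-memory errors appeared in between.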
    We now want to remove the Pump component, but both offline-pump and pause-pump fail with:
    [2019/11/25 18:14:45.068 +08:00] [FATAL] [main.go:68] ["fail to execute command"] [command=update-pump] [error="key /tidb-binlog/v1/pumps/10.9.48.230:8250 in etcd not found"] [errorVerbose="key /tidb-binlog/v1/pumps/10.9.48.230:8250 in etcd not found
    github.com/pingcap/errors.NotFoundf
    /home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:117
    github.com/pingcap/tidb-binlog/pkg/etcd.(*Client).Get
    /home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/etcd/etcd.go:100
    github.com/pingcap/tidb-binlog/pkg/node.(*EtcdRegistry).Node
    /home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/pkg/node/registry.go:58
    github.com/pingcap/tidb-binlog/binlogctl.UpdateNodeState
    /home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/binlogctl/nodes.go:66
    main.main
    /home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/cmd/binlogctl/main.go:52
    runtime.main
    /usr/local/go/src/runtime/proc.go:200
    runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337"] [stack="github.com/pingcap/log.Fatal
    /home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/log@v0.0.0-20190307075452-bd41d9273596/global.go:59
    main.main
    /home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb-binlog/cmd/binlogctl/main.go:68
    runtime.main
    /usr/local/go/src/runtime/proc.go:200"]

I would like to know where to start diagnosing the service anomaly, and how to remove the Pump component.

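One addition: about an hour later, after switching write traffic back, everything was normal, so this looks like a problem that occurred during the rolling restart of TiDB.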
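
For reference, the FATAL line quoted above already names the exact etcd key that binlogctl failed to find. Below is a minimal Python sketch for pulling the command, the key, and the node ID out of such a line. The regular expressions and the function name `parse_missing_pump` are illustrative assumptions, not part of any TiDB tooling, and the sample line is shortened to the fields of interest.

```python
import re

# Sample FATAL line from the post, shortened to the fields of interest.
LOG_LINE = (
    '[2019/11/25 18:14:45.068 +08:00] [FATAL] [main.go:68] '
    '["fail to execute command"] [command=update-pump] '
    '[error="key /tidb-binlog/v1/pumps/10.9.48.230:8250 in etcd not found"]'
)

# Hypothetical patterns, written only against the message format seen above.
PUMP_KEY_RE = re.compile(r'key (/tidb-binlog/v1/pumps/(\S+)) in etcd not found')
COMMAND_RE = re.compile(r'\[command=([^\]]+)\]')


def parse_missing_pump(line):
    """Return (command, etcd_key, node_id), or None if the line does not match."""
    key_match = PUMP_KEY_RE.search(line)
    cmd_match = COMMAND_RE.search(line)
    if not key_match or not cmd_match:
        return None
    return cmd_match.group(1), key_match.group(1), key_match.group(2)


if __name__ == "__main__":
    print(parse_missing_pump(LOG_LINE))
```

The extracted node ID is the string that was passed to binlogctl. If Pump registered itself under a different ID (an assumption, not something the log confirms), a lookup with this ID would fail in exactly this way.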

Steps to stop Pump:

1) Disable binlog on tidb-server.

2) Stop Pump.

3) Stop Drainer.

The order of steps 2 and 3 depends on your business requirements.
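
As a rough sketch of step 2, the Python snippet below only assembles binlogctl command lines and does not execute anything. It assumes the `-pd-urls`, `-cmd`, and `-node-id` flags and the `pumps` and `offline-pump` commands as described in the TiDB Binlog documentation; please verify them against your binlogctl version. The PD address, the node ID, the `bin/binlogctl` path, and the helper names are placeholders. Step 1 (disabling binlog on tidb-server) is not covered here.

```python
import shlex


def binlogctl_cmd(pd_urls, cmd, node_id=None, binary="bin/binlogctl"):
    """Build an argv list for one binlogctl invocation (nothing is executed)."""
    argv = [binary, "-pd-urls=" + pd_urls, "-cmd", cmd]
    if node_id is not None:
        argv += ["-node-id", node_id]
    return argv


def stop_pump_plan(pd_urls, pump_node_id):
    """List the registered Pump nodes first, then take one Pump offline."""
    return [
        # Shows the node IDs actually registered, so the right one can be used.
        binlogctl_cmd(pd_urls, "pumps"),
        # Uses the node ID exactly as printed by the previous command.
        binlogctl_cmd(pd_urls, "offline-pump", pump_node_id),
    ]


if __name__ == "__main__":
    # Placeholder values; replace with the real PD address and Pump node ID.
    for argv in stop_pump_plan("http://127.0.0.1:2379", "example-host:8250"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Listing the Pump nodes first may help with the "in etcd not found" error from the question, since it shows which node IDs are actually registered before one is passed to `-node-id`.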