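From the logs and the goroutine dump, in both runs some files did get stuck partway through, specifically at the point where the MySQL client creates a new connection. The goroutine in question: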
goroutine 4051 [IO wait, 19 minutes]:
internal/poll.runtime_pollWait(0x7fdc79fdfa30, 0x72, 0xffffffffffffffff)
runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc11bb43a98, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc11bb43a80, 0xc16a821000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
internal/poll/fd_unix.go:169 +0x1cf
net.(*netFD).Read(0xc11bb43a80, 0xc16a821000, 0x1000, 0x1000, 0xc043f52b00, 0x1415ad2, 0xc11bb43a80)
net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc03cc1a468, 0xc16a821000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
net/net.go:184 +0x68
github.com/go-sql-driver/mysql.(*buffer).fill(0xc127eda120, 0x4, 0x300000002, 0xc001020c00)
github.com/go-sql-driver/mysql@v1.5.0/buffer.go:90 +0x13c
github.com/go-sql-driver/mysql.(*buffer).readNext(0xc127eda120, 0x4, 0xc001020c00, 0xc000069400, 0x203133, 0x203133, 0x203133)
github.com/go-sql-driver/mysql@v1.5.0/buffer.go:119 +0x9c
github.com/go-sql-driver/mysql.(*mysqlConn).readPacket(0xc127eda120, 0xc4cfea2658, 0xc766a22f20, 0xc043f52d08, 0x1344fd6, 0xc000480700)
github.com/go-sql-driver/mysql@v1.5.0/packets.go:31 +0x91
github.com/go-sql-driver/mysql.(*mysqlConn).readHandshakePacket(0xc127eda120, 0x1000, 0x1000, 0xc16a821000, 0x0, 0xc000a002ce, 0x13, 0x4083bc0)
github.com/go-sql-driver/mysql@v1.5.0/packets.go:187 +0x40
github.com/go-sql-driver/mysql.(*connector).Connect(0xc000010c18, 0x4056720, 0xc0000efc40, 0x0, 0x0, 0x0, 0x0)
github.com/go-sql-driver/mysql@v1.5.0/connector.go:81 +0x45f
database/sql.(*DB).conn(0xc000b4c9c0, 0x4056720, 0xc0000efc40, 0xc043f53201, 0x1345131, 0xc000480700, 0xc50051a000)
database/sql/sql.go:1228 +0x201
database/sql.(*DB).exec(0xc000b4c9c0, 0x4056720, 0xc0000efc40, 0xc50051a000, 0x10019c, 0x0, 0x0, 0x0, 0xc5000f8001, 0xebe27, ...)
database/sql/sql.go:1495 +0x66
database/sql.(*DB).ExecContext(0xc000b4c9c0, 0x4056720, 0xc0000efc40, 0xc50051a000, 0x10019c, 0x0, 0x0, 0x0, 0x4020c20, 0xc16a418680, ...)
database/sql/sql.go:1477 +0xde
github.com/pingcap/br/pkg/lightning/backend/tidb.(*tidbBackend).WriteRowsToDB(0xc0002f3800, 0x4056720, 0xc0000efc40, 0xc000b57440, 0x29, 0x0, 0x0, 0x0, 0x4021220, 0xc090b2a4a0, ...)
github.com/pingcap/br@/pkg/lightning/backend/tidb/tidb.go:421 +0x577
github.com/pingcap/br/pkg/lightning/backend/tidb.(*tidbBackend).WriteRows(0xc0002f3800, 0x4056720, 0xc0000efc40, 0xc95db5245ddc45a0, 0x9612377a5a82139b, 0xc000b57440, 0x29, 0x0, 0x0, 0x0, ...)
github.com/pingcap/br@/pkg/lightning/backend/tidb/tidb.go:366 +0x13d
github.com/pingcap/br/pkg/lightning/backend/tidb.(*Writer).AppendRows(0xc220e7ecc0, 0x4056720, 0xc0000efc40, 0xc000b57440, 0x29, 0x0, 0x0, 0x0, 0x0, 0x4021220, ...)
github.com/pingcap/br@/pkg/lightning/backend/tidb/tidb.go:599 +0xd8
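The number of stuck files differed between the two runs, and so did the files themselves, and each time a restart brought things back to normal. My guess is that TiDB has a logic problem when handling new connection requests that makes it hang. While the hang is in progress, it would be worth looking at the stack of the corresponding TiDB instance to check whether the part that handles MySQL connection requests looks normal.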