tidb-server的CPU居高不下的问题

10月29号晚上,开发的同学,在TIDB的线上运行了几次有关于sum的操作,导致CPU直线飙升。首先需要承认的是,他们的这种做法欠妥当。在第一次没运行完SQL的前提下,又执行了几次。
后来接到了zabbix告警后(CPU超出设定的阈值),用kill tidb XXX进程 的方式将SQL的进程给KILL掉。但是一直到现在,已经过去很久,但是CPU利用率真还是一直没有下降的趋势。如图:

请问大佬,这个现象该如何排查以及解决。在此谢谢大佬。

我的TIDB-server服务器的 CPU:24核,内存64GB。TIDB的版本是2.1.8
请大佬们能够指教。谢谢。

麻烦执行下:go tool pprof http://{TiDBIP}:10080/debug/pprof/profile?seconds=30s 拿下信息


请看信息

这个需要给下生成的文件信息,看下具体哪块在消耗 CPU 资源。

你好,请问需要提供哪方面的文件信息以及获取这些文件信息的方式。其实是这样的。原本资源使用率是不高的,但是昨天晚上跑了几条sum统计的sql,一下子就飙升啦,后来kill掉正在执行的这些慢sql后,业务没有什么慢语句在执行,原本以cpu会降下来,但是却一直居高不下。所以我是非常的奇怪。按道理说,结束啦执行慢的语句后,cpu肯定会降下来的,至少它也会慢慢降下来,但是如您所看的那张监控图,丝毫没有降下来的意思。

Duration: 30.17s, Total samples = 363.69s (1205.40%) Showing nodes accounting for 310.57s, 85.39% of 363.69s total Dropped 1464 nodes (cum <= 1.82s) flat flat% sum% cum cum% 59.34s 16.32% 16.32% 119.15s 32.76% github.com/pingcap/tidb/util/chunk.(*Chunk).AppendPartialRow 26.78s 7.36% 23.68% 26.78s 7.36% runtime.memmove 25.56s 7.03% 30.71% 25.56s 7.03% github.com/pingcap/tidb/util/chunk.(*column).appendNullBitmap (inline) 15.12s 4.16% 34.86% 50.10s 13.78% runtime.mallocgc 14.93s 4.11% 38.97% 115.07s 31.64% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).updatePartialResult 13.02s 3.58% 42.55% 13.02s 3.58% github.com/pingcap/tidb/util/chunk.(*column).isNull (inline) 9.33s 2.57% 45.12% 11.75s 3.23% runtime.lock 8.71s 2.39% 47.51% 10.06s 2.77% runtime.findObject 8.01s 2.20% 49.71% 12.80s 3.52% runtime.mapaccess2_faststr 7.50s 2.06% 51.77% 7.50s 2.06% github.com/pingcap/tidb/executor/aggfuncs.(*firstRow4String).UpdatePartialResult 7.28s 2.00% 53.78% 19.26s 5.30% runtime.scanobject 5.88s 1.62% 55.39% 5.88s 1.62% runtime.nextFreeFast (inline) 5.74s 1.58% 56.97% 6.96s 1.91% runtime.unlock 5.19s 1.43% 58.40% 10.05s 2.76% github.com/pingcap/tidb/util/chunk.Row.GetDatum 4.94s 1.36% 59.76% 6.83s 1.88% runtime.heapBitsSetType 4.27s 1.17% 60.93% 28.44s 7.82% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).getGroupKey 4.01s 1.10% 62.03% 9.96s 2.74% github.com/pingcap/tidb/executor/aggfuncs.(*sum4Decimal).UpdatePartialResult 3.83s 1.05% 63.09% 3.83s 1.05% runtime.memclrNoHeapPointers 3.79s 1.04% 64.13% 13.64s 3.75% runtime.selectgo 3.72s 1.02% 65.15% 3.90s 1.07% syscall.Syscall 3.60s 0.99% 66.14% 3.60s 0.99% memeqbody 3.51s 0.97% 67.11% 44.92s 12.35% runtime.newobject 3.20s 0.88% 67.99% 69.37s 19.07% github.com/pingcap/tidb/executor.(*innerJoiner).tryToMatch 3.09s 0.85% 68.84% 3.09s 0.85% runtime.epollwait 2.77s 0.76% 69.60% 2.77s 0.76% cmpbody 2.67s 0.73% 70.33% 2.67s 0.73% runtime.futex 2.59s 0.71% 71.04% 2.59s 0.71% runtime.markBits.isMarked (inline) 2.58s 0.71% 71.75% 38.52s 10.59% github.com/pingcap/tidb/util/chunk.(*Chunk).AppendRow 2.51s 0.69% 72.44% 2.51s 0.69% github.com/pingcap/tidb/util/chunk.(*column).reset (inline) 2.14s 0.59% 73.03% 2.16s 0.59% runtime.(*waitq).dequeue 2.10s 0.58% 73.61% 2.10s 0.58% runtime.casgstatus 2.06s 0.57% 74.18% 12.04s 3.31% github.com/pingcap/tidb/expression.(*Column).Eval 1.97s 0.54% 74.72% 4.46s 1.23% runtime.greyobject 1.91s 0.53% 75.24% 63.94s 17.58% github.com/pingcap/tidb/executor.(*baseJoiner).makeJoinRowToChunk 1.87s 0.51% 75.76% 8.51s 2.34% github.com/pingcap/tidb/util/codec.encodeBytes 1.67s 0.46% 76.22% 6.44s 1.77% runtime.sweepone 1.65s 0.45% 76.67% 2.40s 0.66% github.com/pingcap/tidb/types.doAdd 1.51s 0.42% 77.08% 10.48s 2.88% github.com/pingcap/tidb/util/codec.encode 1.50s 0.41% 77.50% 157.50s 43.31% github.com/pingcap/tidb/executor.(*HashJoinExec).join2Chunk 1.48s 0.41% 77.90% 6.53s 1.80% runtime.gcWriteBarrier 1.45s 0.4% 78.30% 1.87s 0.51% runtime.runqgrab 1.31s 0.36% 78.66% 1.92s 0.53% runtime.heapBitsForAddr (inline) 1.15s 0.32% 78.98% 5.87s 1.61% runtime.wbBufFlush1 1.06s 0.29% 79.27% 2.93s 0.81% github.com/pingcap/tidb/util/codec.EncodeVarint 1.03s 0.28% 79.55% 5.08s 1.40% github.com/pingcap/tidb/expression.(*Column).EvalString 1.02s 0.28% 79.83% 6.86s 1.89% runtime.findrunnable 0.98s 0.27% 80.10% 10.99s 3.02% github.com/pingcap/tidb/expression.VectorizedFilter 0.88s 0.24% 80.35% 13.57s 3.73% github.com/pingcap/tidb/executor.baseHashAggWorker.getPartialResult 0.82s 0.23% 80.57% 8.88s 2.44% github.com/pingcap/tidb/expression.(*builtinEQStringSig).evalInt 0.82s 0.23% 80.80% 6.64s 1.83% github.com/pingcap/tidb/util/codec.EncodeCompactBytes 0.80s 0.22% 81.02% 4.86s 1.34% runtime.sellock 0.78s 0.21% 81.23% 1.92s 0.53% github.com/pingcap/tidb/util/mvmap.(*MVMap).Get 0.75s 0.21% 81.44% 7.44s 2.05% github.com/pingcap/tidb/executor.(*HashAggExec).fetchChildData 0.75s 0.21% 81.64% 7.09s 1.95% github.com/pingcap/tidb/util/chunk.Row.IsNull (inline) 0.68s 0.19% 81.83% 60.49s 16.63% github.com/pingcap/tidb/executor.(*leftOuterJoiner).onMissMatch 0.61s 0.17% 82.00% 2.97s 0.82% runtime.(*mcentral).freeSpan 0.60s 0.16% 82.16% 3.11s 0.86% github.com/pingcap/tidb/util/chunk.(*Chunk).Reset 0.58s 0.16% 82.32% 7.94s 2.18% github.com/pingcap/tidb/expression.CompareString 0.56s 0.15% 82.48% 1.86s 0.51% runtime.ready 0.54s 0.15% 82.63% 4.89s 1.34% runtime.(*mspan).sweep 0.53s 0.15% 82.77% 79.43s 21.84% github.com/pingcap/tidb/executor.(*HashJoinExec).joinMatchedOuterRow2Chunk 0.51s 0.14% 82.91% 2.28s 0.63% runtime.chanrecv 0.48s 0.13% 83.04% 2.03s 0.56% github.com/pingcap/parser.yyParse 0.47s 0.13% 83.17% 7.65s 2.10% github.com/pingcap/tidb/executor.(*HashJoinExec).Next 0.46s 0.13% 83.30% 8.86s 2.44% runtime.schedule 0.44s 0.12% 83.42% 6.99s 1.92% github.com/pingcap/tidb/executor.(*HashJoinExec).getNewJoinResult 0.44s 0.12% 83.54% 4.07s 1.12% github.com/pingcap/tidb/types.DecimalAdd 0.44s 0.12% 83.66% 11.83s 3.25% runtime.mcall 0.42s 0.12% 83.78% 8.78s 2.41% runtime.(*mcentral).cacheSpan 0.40s 0.11% 83.89% 10.81s 2.97% github.com/pingcap/tidb/util/codec.EncodeValue 0.37s 0.1% 83.99% 9.26s 2.55% github.com/pingcap/tidb/expression.(*ScalarFunction).EvalInt 0.35s 0.096% 84.09% 3.95s 1.09% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).getChildInput 0.34s 0.093% 84.18% 6.68s 1.84% github.com/pingcap/tidb/executor.(*HashJoinExec).fetchOuterChunks 0.31s 0.085% 84.26% 163.02s 44.82% github.com/pingcap/tidb/executor.(*HashJoinExec).runJoinWorker 0.28s 0.077% 84.34% 10.59s 2.91% runtime.chansend 0.25s 0.069% 84.41% 2.29s 0.63% github.com/pingcap/tidb/types.CompareString (inline) 0.25s 0.069% 84.48% 2.08s 0.57% runtime.selunlock 0.24s 0.066% 84.54% 4.85s 1.33% runtime.(*mheap).alloc_m 0.22s 0.06% 84.61% 3.37s 0.93% runtime.netpoll 0.21s 0.058% 84.66% 119.23s 32.78% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).run 0.18s 0.049% 84.71% 15.16s 4.17% runtime.gcDrain 0.16s 0.044% 84.76% 1.88s 0.52% github.com/pingcap/tidb/expression.(*Column).EvalDecimal 0.16s 0.044% 84.80% 2.03s 0.56% runtime.runqsteal 0.16s 0.044% 84.84% 1.99s 0.55% runtime.typedmemmove 0.15s 0.041% 84.89% 9.33s 2.57% runtime.(*mcache).nextFree 0.12s 0.033% 84.92% 3.83s 1.05% google.golang.org/grpc/transport.(*loopyWriter).run 0.11s 0.03% 84.95% 3.11s 0.86% golang.org/x/net/http2.(*Framer).ReadFrame 0.11s 0.03% 84.98% 7.10s 1.95% runtime.makeslice 0.11s 0.03% 85.01% 10.39s 2.86% runtime.park_m 0.11s 0.03% 85.04% 42.25s 11.62% runtime.systemstack 0.10s 0.027% 85.07% 4.58s 1.26% google.golang.org/grpc/transport.(*http2Client).reader 0.10s 0.027% 85.09% 1.96s 0.54% runtime.goready.func1 0.09s 0.025% 85.12% 2.27s 0.62% runtime.chanrecv2 0.08s 0.022% 85.14% 2.25s 0.62% io.ReadAtLeast 0.08s 0.022% 85.16% 3.70s 1.02% runtime.send 0.07s 0.019% 85.18% 1.98s 0.54% bufio.(*Reader).Read 0.07s 0.019% 85.20% 10.30s 2.83% runtime.chansend1 0.06s 0.016% 85.22% 8.87s 2.44% runtime.(*mcache).nextFree.func1 0.05s 0.014% 85.23% 1.82s 0.5% net.(*conn).Read 0.05s 0.014% 85.25% 5.01s 1.38% runtime.(*mcentral).grow 0.05s 0.014% 85.26% 4.58s 1.26% runtime.gcDrainN 0.03s 0.0082% 85.27% 17.19s 4.73% github.com/pingcap/tidb/server.(*TiDBContext).Execute 0.03s 0.0082% 85.28% 17.16s 4.72% github.com/pingcap/tidb/session.(*session).execute 0.03s 0.0082% 85.28% 3.78s 1.04% github.com/pingcap/tidb/store/tikv/tikvrpc.CallRPC 0.03s 0.0082% 85.29% 8.81s 2.42% runtime.(*mcache).refill 0.03s 0.0082% 85.30% 6.67s 1.83% runtime.(*mheap).alloc 0.03s 0.0082% 85.31% 1.99s 0.55% runtime.goready 0.02s 0.0055% 85.31% 18.94s 5.21% github.com/pingcap/tidb/server.(*clientConn).Run 0.02s 0.0055% 85.32% 2.17s 0.6% github.com/pingcap/tidb/session.(*session).ParseSQL 0.02s 0.0055% 85.33% 4.74s 1.30% github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReq 0.02s 0.0055% 85.33% 1.99s 0.55% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).commitSingleBatch 0.02s 0.0055% 85.34% 1.99s 0.55% google.golang.org/grpc.newClientStream 0.02s 0.0055% 85.34% 2.55s 0.7% net.(*netFD).Write 0.01s 0.0027% 85.34% 6.05s 1.66% github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor 0.01s 0.0027% 85.35% 3.51s 0.97% github.com/pingcap/tidb/executor.(*UpdateExec).fetchChunkRows 0.01s 0.0027% 85.35% 1.98s 0.54% github.com/pingcap/tidb/planner/core.(*planBuilder).buildUpdate 0.01s 0.0027% 85.35% 18s 4.95% github.com/pingcap/tidb/server.(*clientConn).handleQuery 0.01s 0.0027% 85.36% 11.26s 3.10% github.com/pingcap/tidb/session.(*session).executeStatement 0.01s 0.0027% 85.36% 4.59s 1.26% github.com/pingcap/tidb/store/tikv.(*rpcClient).SendRequest 0.01s 0.0027% 85.36% 2s 0.55% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).commitSingleBatch-fm 0.01s 0.0027% 85.36% 4.21s 1.16% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).doActionOnBatches.func1 0.01s 0.0027% 85.37% 3.70s 1.02% google.golang.org/grpc.(*ClientConn).Invoke 0.01s 0.0027% 85.37% 3.71s 1.02% google.golang.org/grpc.Invoke 0.01s 0.0027% 85.37% 3.30s 0.91% google.golang.org/grpc.invoke 0.01s 0.0027% 85.37% 1.98s 0.54% google.golang.org/grpc/transport.(*bufWriter).Flush 0.01s 0.0027% 85.38% 2.67s 0.73% internal/poll.(*FD).Write 0.01s 0.0027% 85.38% 2.26s 0.62% io.ReadFull 0.01s 0.0027% 85.38% 2.56s 0.7% net.(*conn).Write 0.01s 0.0027% 85.39% 3.35s 0.92% runtime.(*mheap).reclaim 0.01s 0.0027% 85.39% 4.60s 1.26% runtime.gcAssistAlloc 0.01s 0.0027% 85.39% 4.59s 1.26% runtime.gcAssistAlloc1 0.01s 0.0027% 85.39% 2.18s 0.6% runtime.mallocgc.func1 0 0% 85.39% 3.69s 1.01% github.com/grpc-ecosystem/go-grpc-prometheus.UnaryClientInterceptor 0 0% 85.39% 2.15s 0.59% github.com/pingcap/parser.(*Parser).Parse 0 0% 85.39% 3.16s 0.87% github.com/pingcap/tidb/executor.(*Compiler).Compile 0 0% 85.39% 8.28s 2.28% github.com/pingcap/tidb/executor.(*ExecStmt).Exec 0 0% 85.39% 6.68s 1.84% github.com/pingcap/tidb/executor.(*HashJoinExec).fetchOuterAndProbeHashTable.func1 0 0% 85.39% 163.02s 44.82% github.com/pingcap/tidb/executor.(*HashJoinExec).fetchOuterAndProbeHashTable.func2 0 0% 85.39% 4.03s 1.11% github.com/pingcap/tidb/executor.(*UpdateExec).Next 0 0% 85.39% 2.03s 0.56% github.com/pingcap/tidb/executor.(*baseExecutor).newFirstChunk 0 0% 85.39% 2.67s 0.73% github.com/pingcap/tidb/planner/core.(*planBuilder).build 0 0% 85.39% 2.92s 0.8% github.com/pingcap/tidb/planner/core.Optimize 0 0% 85.39% 18.96s 5.21% github.com/pingcap/tidb/server.(*Server).onConn 0 0% 85.39% 18.20s 5.00% github.com/pingcap/tidb/server.(*clientConn).dispatch 0 0% 85.39% 2.47s 0.68% github.com/pingcap/tidb/session.(*TxnState).Commit 0 0% 85.39% 2.54s 0.7% github.com/pingcap/tidb/session.(*session).CommitTxn 0 0% 85.39% 17.16s 4.72% github.com/pingcap/tidb/session.(*session).Execute 0 0% 85.39% 2.49s 0.68% github.com/pingcap/tidb/session.(*session).doCommit 0 0% 85.39% 2.53s 0.7% github.com/pingcap/tidb/session.(*session).doCommitWithRetry 0 0% 85.39% 2.54s 0.7% github.com/pingcap/tidb/session.finishStmt 0 0% 85.39% 11.16s 3.07% github.com/pingcap/tidb/session.runStmt 0 0% 85.39% 4.60s 1.26% github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion 0 0% 85.39% 1.87s 0.51% github.com/pingcap/tidb/store/tikv.(*tikvStore).SendReq 0 0% 85.39% 2.46s 0.68% github.com/pingcap/tidb/store/tikv.(*tikvTxn).Commit 0 0% 85.39% 1.96s 0.54% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).doActionOnKeys 0 0% 85.39% 2.04s 0.56% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).execute 0 0% 85.39% 2.05s 0.56% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).executeAndWriteFinishBinlog 0 0% 85.39% 1.92s 0.53% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).prewriteSingleBatch 0 0% 85.39% 1.92s 0.53% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).prewriteSingleBatch-fm 0 0% 85.39% 169.70s 46.66% github.com/pingcap/tidb/util.WithRecovery 0 0% 85.39% 2.04s 0.56% github.com/pingcap/tidb/util/chunk.New 0 0% 85.39% 2.45s 0.67% github.com/pingcap/tidb/util/chunk.newFixedLenColumn (inline) 0 0% 85.39% 3.83s 1.05% google.golang.org/grpc/transport.newHTTP2Client.func3 0 0% 85.39% 4.85s 1.33% runtime.(*mheap).alloc.func1 0 0% 85.39% 3.13s 0.86% runtime.bgsweep 0 0% 85.39% 4.59s 1.26% runtime.gcAssistAlloc.func1 0 0% 85.39% 15.21s 4.18% runtime.gcBgMarkWorker 0 0% 85.39% 15.14s 4.16% runtime.gcBgMarkWorker.func2 0 0% 85.39% 3.13s 0.86% runtime.gosweepone 0 0% 85.39% 3.13s 0.86% runtime.gosweepone.func1 0 0% 85.39% 2.17s 0.6% runtime.largeAlloc 0 0% 85.39% 5.87s 1.61% runtime.wbBufFlush 0 0% 85.39% 5.87s 1.61% runtime.wbBufFlush.func1 0 0% 85.39% 2.56s 0.7% syscall.Write 0 0% 85.39% 2.56s 0.7% syscall.write

麻烦上传一下生成的/root/pprof/pprof.tidb-server.samples.cpu.001.pb.gz文件,文本信息不太好查看

pprof.tidb-server.samples.cpu.002.pb.gz (154.8 KB)

pprof.tidb-server.samples.cpu.001.pb.gz (166.1 KB)

一共两份,昨天和今天的两份。请大佬查看。

hi 方便帮忙再抓下 goroutine 下 thx

curl http://0.0.0.0:10080/debug/pprof/goroutine?debug=2 --output stack.txt

curl “localhost:10080/debug/zip?seconds=10” -o debug.zip

用这句来抓吧 一下子全能抓下来。

另外 tidb关于这个sql的日志也麻烦发一下。

先在tidb.log里grep "kill"的日志,可以找到类似如下的一行 [2019/10/26 23:08:18.803 +08:00] [INFO] [server.go:518] [kill] [connID=9705638] [query=false]

然后在按照connID来grep,就可以把这个connID的所有日志搜到 然后发上来吧。多谢。

debug.zip (1.9 MB)

1031.txt (659 字节)

请大神查看

日志感觉有些不对。

[2019/10/26 23:08:18.803 +08:00] [INFO] [server.go:518] [kill] [connID=9705638] [query=false]

要找这样的kill的日志, 然后看到connID=9705638(我这里是打个比方), 就去搜9705638

好的,收到,请问DEBUG日志是对的吧?因为刚才我并没有找到我自己KILL掉的日志,只找到了原始的日志,然后发给你了。您请稍等。我再找找

就是一般的tidb.log