青梅论IT
(青梅)
1
10月29号晚上,开发的同学,在TIDB的线上运行了几次有关于sum的操作,导致CPU直线飙升。首先需要承认的是,他们的这种做法欠妥当。在第一次没运行完SQL的前提下,又执行了几次。
后来接到了zabbix告警后(CPU超出设定的阈值),用kill tidb XXX进程 的方式将SQL的进程给KILL掉。但是一直到现在,已经过去很久,但是CPU利用率真还是一直没有下降的趋势。如图:
请问大佬,这个现象该如何排查以及解决。在此谢谢大佬。
我的TIDB-server服务器的 CPU:24核,内存64GB。TIDB的版本是2.1.8
请大佬们能够指教。谢谢。
不懂就问
(zhouyueyue)
2
麻烦执行下:go tool pprof http://{TiDBIP}:10080/debug/pprof/profile?seconds=30s 拿下信息
不懂就问
(zhouyueyue)
4
这个需要给下生成的文件信息,看下具体哪块在消耗 CPU 资源。
青梅论IT
(青梅)
5
你好,请问需要提供哪方面的文件信息以及获取这些文件信息的方式。其实是这样的。原本资源使用率是不高的,但是昨天晚上跑了几条sum统计的sql,一下子就飙升啦,后来kill掉正在执行的这些慢sql后,业务没有什么慢语句在执行,原本以cpu会降下来,但是却一直居高不下。所以我是非常的奇怪。按道理说,结束啦执行慢的语句后,cpu肯定会降下来的,至少它也会慢慢降下来,但是如您所看的那张监控图,丝毫没有降下来的意思。
青梅论IT
(青梅)
6
Duration: 30.17s, Total samples = 363.69s (1205.40%)
Showing nodes accounting for 310.57s, 85.39% of 363.69s total
Dropped 1464 nodes (cum <= 1.82s)
flat flat% sum% cum cum%
59.34s 16.32% 16.32% 119.15s 32.76% github.com/pingcap/tidb/util/chunk.(*Chunk).AppendPartialRow
26.78s 7.36% 23.68% 26.78s 7.36% runtime.memmove
25.56s 7.03% 30.71% 25.56s 7.03% github.com/pingcap/tidb/util/chunk.(*column).appendNullBitmap (inline)
15.12s 4.16% 34.86% 50.10s 13.78% runtime.mallocgc
14.93s 4.11% 38.97% 115.07s 31.64% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).updatePartialResult
13.02s 3.58% 42.55% 13.02s 3.58% github.com/pingcap/tidb/util/chunk.(*column).isNull (inline)
9.33s 2.57% 45.12% 11.75s 3.23% runtime.lock
8.71s 2.39% 47.51% 10.06s 2.77% runtime.findObject
8.01s 2.20% 49.71% 12.80s 3.52% runtime.mapaccess2_faststr
7.50s 2.06% 51.77% 7.50s 2.06% github.com/pingcap/tidb/executor/aggfuncs.(*firstRow4String).UpdatePartialResult
7.28s 2.00% 53.78% 19.26s 5.30% runtime.scanobject
5.88s 1.62% 55.39% 5.88s 1.62% runtime.nextFreeFast (inline)
5.74s 1.58% 56.97% 6.96s 1.91% runtime.unlock
5.19s 1.43% 58.40% 10.05s 2.76% github.com/pingcap/tidb/util/chunk.Row.GetDatum
4.94s 1.36% 59.76% 6.83s 1.88% runtime.heapBitsSetType
4.27s 1.17% 60.93% 28.44s 7.82% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).getGroupKey
4.01s 1.10% 62.03% 9.96s 2.74% github.com/pingcap/tidb/executor/aggfuncs.(*sum4Decimal).UpdatePartialResult
3.83s 1.05% 63.09% 3.83s 1.05% runtime.memclrNoHeapPointers
3.79s 1.04% 64.13% 13.64s 3.75% runtime.selectgo
3.72s 1.02% 65.15% 3.90s 1.07% syscall.Syscall
3.60s 0.99% 66.14% 3.60s 0.99% memeqbody
3.51s 0.97% 67.11% 44.92s 12.35% runtime.newobject
3.20s 0.88% 67.99% 69.37s 19.07% github.com/pingcap/tidb/executor.(*innerJoiner).tryToMatch
3.09s 0.85% 68.84% 3.09s 0.85% runtime.epollwait
2.77s 0.76% 69.60% 2.77s 0.76% cmpbody
2.67s 0.73% 70.33% 2.67s 0.73% runtime.futex
2.59s 0.71% 71.04% 2.59s 0.71% runtime.markBits.isMarked (inline)
2.58s 0.71% 71.75% 38.52s 10.59% github.com/pingcap/tidb/util/chunk.(*Chunk).AppendRow
2.51s 0.69% 72.44% 2.51s 0.69% github.com/pingcap/tidb/util/chunk.(*column).reset (inline)
2.14s 0.59% 73.03% 2.16s 0.59% runtime.(*waitq).dequeue
2.10s 0.58% 73.61% 2.10s 0.58% runtime.casgstatus
2.06s 0.57% 74.18% 12.04s 3.31% github.com/pingcap/tidb/expression.(*Column).Eval
1.97s 0.54% 74.72% 4.46s 1.23% runtime.greyobject
1.91s 0.53% 75.24% 63.94s 17.58% github.com/pingcap/tidb/executor.(*baseJoiner).makeJoinRowToChunk
1.87s 0.51% 75.76% 8.51s 2.34% github.com/pingcap/tidb/util/codec.encodeBytes
1.67s 0.46% 76.22% 6.44s 1.77% runtime.sweepone
1.65s 0.45% 76.67% 2.40s 0.66% github.com/pingcap/tidb/types.doAdd
1.51s 0.42% 77.08% 10.48s 2.88% github.com/pingcap/tidb/util/codec.encode
1.50s 0.41% 77.50% 157.50s 43.31% github.com/pingcap/tidb/executor.(*HashJoinExec).join2Chunk
1.48s 0.41% 77.90% 6.53s 1.80% runtime.gcWriteBarrier
1.45s 0.4% 78.30% 1.87s 0.51% runtime.runqgrab
1.31s 0.36% 78.66% 1.92s 0.53% runtime.heapBitsForAddr (inline)
1.15s 0.32% 78.98% 5.87s 1.61% runtime.wbBufFlush1
1.06s 0.29% 79.27% 2.93s 0.81% github.com/pingcap/tidb/util/codec.EncodeVarint
1.03s 0.28% 79.55% 5.08s 1.40% github.com/pingcap/tidb/expression.(*Column).EvalString
1.02s 0.28% 79.83% 6.86s 1.89% runtime.findrunnable
0.98s 0.27% 80.10% 10.99s 3.02% github.com/pingcap/tidb/expression.VectorizedFilter
0.88s 0.24% 80.35% 13.57s 3.73% github.com/pingcap/tidb/executor.baseHashAggWorker.getPartialResult
0.82s 0.23% 80.57% 8.88s 2.44% github.com/pingcap/tidb/expression.(*builtinEQStringSig).evalInt
0.82s 0.23% 80.80% 6.64s 1.83% github.com/pingcap/tidb/util/codec.EncodeCompactBytes
0.80s 0.22% 81.02% 4.86s 1.34% runtime.sellock
0.78s 0.21% 81.23% 1.92s 0.53% github.com/pingcap/tidb/util/mvmap.(*MVMap).Get
0.75s 0.21% 81.44% 7.44s 2.05% github.com/pingcap/tidb/executor.(*HashAggExec).fetchChildData
0.75s 0.21% 81.64% 7.09s 1.95% github.com/pingcap/tidb/util/chunk.Row.IsNull (inline)
0.68s 0.19% 81.83% 60.49s 16.63% github.com/pingcap/tidb/executor.(*leftOuterJoiner).onMissMatch
0.61s 0.17% 82.00% 2.97s 0.82% runtime.(*mcentral).freeSpan
0.60s 0.16% 82.16% 3.11s 0.86% github.com/pingcap/tidb/util/chunk.(*Chunk).Reset
0.58s 0.16% 82.32% 7.94s 2.18% github.com/pingcap/tidb/expression.CompareString
0.56s 0.15% 82.48% 1.86s 0.51% runtime.ready
0.54s 0.15% 82.63% 4.89s 1.34% runtime.(*mspan).sweep
0.53s 0.15% 82.77% 79.43s 21.84% github.com/pingcap/tidb/executor.(*HashJoinExec).joinMatchedOuterRow2Chunk
0.51s 0.14% 82.91% 2.28s 0.63% runtime.chanrecv
0.48s 0.13% 83.04% 2.03s 0.56% github.com/pingcap/parser.yyParse
0.47s 0.13% 83.17% 7.65s 2.10% github.com/pingcap/tidb/executor.(*HashJoinExec).Next
0.46s 0.13% 83.30% 8.86s 2.44% runtime.schedule
0.44s 0.12% 83.42% 6.99s 1.92% github.com/pingcap/tidb/executor.(*HashJoinExec).getNewJoinResult
0.44s 0.12% 83.54% 4.07s 1.12% github.com/pingcap/tidb/types.DecimalAdd
0.44s 0.12% 83.66% 11.83s 3.25% runtime.mcall
0.42s 0.12% 83.78% 8.78s 2.41% runtime.(*mcentral).cacheSpan
0.40s 0.11% 83.89% 10.81s 2.97% github.com/pingcap/tidb/util/codec.EncodeValue
0.37s 0.1% 83.99% 9.26s 2.55% github.com/pingcap/tidb/expression.(*ScalarFunction).EvalInt
0.35s 0.096% 84.09% 3.95s 1.09% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).getChildInput
0.34s 0.093% 84.18% 6.68s 1.84% github.com/pingcap/tidb/executor.(*HashJoinExec).fetchOuterChunks
0.31s 0.085% 84.26% 163.02s 44.82% github.com/pingcap/tidb/executor.(*HashJoinExec).runJoinWorker
0.28s 0.077% 84.34% 10.59s 2.91% runtime.chansend
0.25s 0.069% 84.41% 2.29s 0.63% github.com/pingcap/tidb/types.CompareString (inline)
0.25s 0.069% 84.48% 2.08s 0.57% runtime.selunlock
0.24s 0.066% 84.54% 4.85s 1.33% runtime.(*mheap).alloc_m
0.22s 0.06% 84.61% 3.37s 0.93% runtime.netpoll
0.21s 0.058% 84.66% 119.23s 32.78% github.com/pingcap/tidb/executor.(*HashAggPartialWorker).run
0.18s 0.049% 84.71% 15.16s 4.17% runtime.gcDrain
0.16s 0.044% 84.76% 1.88s 0.52% github.com/pingcap/tidb/expression.(*Column).EvalDecimal
0.16s 0.044% 84.80% 2.03s 0.56% runtime.runqsteal
0.16s 0.044% 84.84% 1.99s 0.55% runtime.typedmemmove
0.15s 0.041% 84.89% 9.33s 2.57% runtime.(*mcache).nextFree
0.12s 0.033% 84.92% 3.83s 1.05% google.golang.org/grpc/transport.(*loopyWriter).run
0.11s 0.03% 84.95% 3.11s 0.86% golang.org/x/net/http2.(*Framer).ReadFrame
0.11s 0.03% 84.98% 7.10s 1.95% runtime.makeslice
0.11s 0.03% 85.01% 10.39s 2.86% runtime.park_m
0.11s 0.03% 85.04% 42.25s 11.62% runtime.systemstack
0.10s 0.027% 85.07% 4.58s 1.26% google.golang.org/grpc/transport.(*http2Client).reader
0.10s 0.027% 85.09% 1.96s 0.54% runtime.goready.func1
0.09s 0.025% 85.12% 2.27s 0.62% runtime.chanrecv2
0.08s 0.022% 85.14% 2.25s 0.62% io.ReadAtLeast
0.08s 0.022% 85.16% 3.70s 1.02% runtime.send
0.07s 0.019% 85.18% 1.98s 0.54% bufio.(*Reader).Read
0.07s 0.019% 85.20% 10.30s 2.83% runtime.chansend1
0.06s 0.016% 85.22% 8.87s 2.44% runtime.(*mcache).nextFree.func1
0.05s 0.014% 85.23% 1.82s 0.5% net.(*conn).Read
0.05s 0.014% 85.25% 5.01s 1.38% runtime.(*mcentral).grow
0.05s 0.014% 85.26% 4.58s 1.26% runtime.gcDrainN
0.03s 0.0082% 85.27% 17.19s 4.73% github.com/pingcap/tidb/server.(*TiDBContext).Execute
0.03s 0.0082% 85.28% 17.16s 4.72% github.com/pingcap/tidb/session.(*session).execute
0.03s 0.0082% 85.28% 3.78s 1.04% github.com/pingcap/tidb/store/tikv/tikvrpc.CallRPC
0.03s 0.0082% 85.29% 8.81s 2.42% runtime.(*mcache).refill
0.03s 0.0082% 85.30% 6.67s 1.83% runtime.(*mheap).alloc
0.03s 0.0082% 85.31% 1.99s 0.55% runtime.goready
0.02s 0.0055% 85.31% 18.94s 5.21% github.com/pingcap/tidb/server.(*clientConn).Run
0.02s 0.0055% 85.32% 2.17s 0.6% github.com/pingcap/tidb/session.(*session).ParseSQL
0.02s 0.0055% 85.33% 4.74s 1.30% github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).SendReq
0.02s 0.0055% 85.33% 1.99s 0.55% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).commitSingleBatch
0.02s 0.0055% 85.34% 1.99s 0.55% google.golang.org/grpc.newClientStream
0.02s 0.0055% 85.34% 2.55s 0.7% net.(*netFD).Write
0.01s 0.0027% 85.34% 6.05s 1.66% github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor
0.01s 0.0027% 85.35% 3.51s 0.97% github.com/pingcap/tidb/executor.(*UpdateExec).fetchChunkRows
0.01s 0.0027% 85.35% 1.98s 0.54% github.com/pingcap/tidb/planner/core.(*planBuilder).buildUpdate
0.01s 0.0027% 85.35% 18s 4.95% github.com/pingcap/tidb/server.(*clientConn).handleQuery
0.01s 0.0027% 85.36% 11.26s 3.10% github.com/pingcap/tidb/session.(*session).executeStatement
0.01s 0.0027% 85.36% 4.59s 1.26% github.com/pingcap/tidb/store/tikv.(*rpcClient).SendRequest
0.01s 0.0027% 85.36% 2s 0.55% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).commitSingleBatch-fm
0.01s 0.0027% 85.36% 4.21s 1.16% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).doActionOnBatches.func1
0.01s 0.0027% 85.37% 3.70s 1.02% google.golang.org/grpc.(*ClientConn).Invoke
0.01s 0.0027% 85.37% 3.71s 1.02% google.golang.org/grpc.Invoke
0.01s 0.0027% 85.37% 3.30s 0.91% google.golang.org/grpc.invoke
0.01s 0.0027% 85.37% 1.98s 0.54% google.golang.org/grpc/transport.(*bufWriter).Flush
0.01s 0.0027% 85.38% 2.67s 0.73% internal/poll.(*FD).Write
0.01s 0.0027% 85.38% 2.26s 0.62% io.ReadFull
0.01s 0.0027% 85.38% 2.56s 0.7% net.(*conn).Write
0.01s 0.0027% 85.39% 3.35s 0.92% runtime.(*mheap).reclaim
0.01s 0.0027% 85.39% 4.60s 1.26% runtime.gcAssistAlloc
0.01s 0.0027% 85.39% 4.59s 1.26% runtime.gcAssistAlloc1
0.01s 0.0027% 85.39% 2.18s 0.6% runtime.mallocgc.func1
0 0% 85.39% 3.69s 1.01% github.com/grpc-ecosystem/go-grpc-prometheus.UnaryClientInterceptor
0 0% 85.39% 2.15s 0.59% github.com/pingcap/parser.(*Parser).Parse
0 0% 85.39% 3.16s 0.87% github.com/pingcap/tidb/executor.(*Compiler).Compile
0 0% 85.39% 8.28s 2.28% github.com/pingcap/tidb/executor.(*ExecStmt).Exec
0 0% 85.39% 6.68s 1.84% github.com/pingcap/tidb/executor.(*HashJoinExec).fetchOuterAndProbeHashTable.func1
0 0% 85.39% 163.02s 44.82% github.com/pingcap/tidb/executor.(*HashJoinExec).fetchOuterAndProbeHashTable.func2
0 0% 85.39% 4.03s 1.11% github.com/pingcap/tidb/executor.(*UpdateExec).Next
0 0% 85.39% 2.03s 0.56% github.com/pingcap/tidb/executor.(*baseExecutor).newFirstChunk
0 0% 85.39% 2.67s 0.73% github.com/pingcap/tidb/planner/core.(*planBuilder).build
0 0% 85.39% 2.92s 0.8% github.com/pingcap/tidb/planner/core.Optimize
0 0% 85.39% 18.96s 5.21% github.com/pingcap/tidb/server.(*Server).onConn
0 0% 85.39% 18.20s 5.00% github.com/pingcap/tidb/server.(*clientConn).dispatch
0 0% 85.39% 2.47s 0.68% github.com/pingcap/tidb/session.(*TxnState).Commit
0 0% 85.39% 2.54s 0.7% github.com/pingcap/tidb/session.(*session).CommitTxn
0 0% 85.39% 17.16s 4.72% github.com/pingcap/tidb/session.(*session).Execute
0 0% 85.39% 2.49s 0.68% github.com/pingcap/tidb/session.(*session).doCommit
0 0% 85.39% 2.53s 0.7% github.com/pingcap/tidb/session.(*session).doCommitWithRetry
0 0% 85.39% 2.54s 0.7% github.com/pingcap/tidb/session.finishStmt
0 0% 85.39% 11.16s 3.07% github.com/pingcap/tidb/session.runStmt
0 0% 85.39% 4.60s 1.26% github.com/pingcap/tidb/store/tikv.(*RegionRequestSender).sendReqToRegion
0 0% 85.39% 1.87s 0.51% github.com/pingcap/tidb/store/tikv.(*tikvStore).SendReq
0 0% 85.39% 2.46s 0.68% github.com/pingcap/tidb/store/tikv.(*tikvTxn).Commit
0 0% 85.39% 1.96s 0.54% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).doActionOnKeys
0 0% 85.39% 2.04s 0.56% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).execute
0 0% 85.39% 2.05s 0.56% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).executeAndWriteFinishBinlog
0 0% 85.39% 1.92s 0.53% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).prewriteSingleBatch
0 0% 85.39% 1.92s 0.53% github.com/pingcap/tidb/store/tikv.(*twoPhaseCommitter).prewriteSingleBatch-fm
0 0% 85.39% 169.70s 46.66% github.com/pingcap/tidb/util.WithRecovery
0 0% 85.39% 2.04s 0.56% github.com/pingcap/tidb/util/chunk.New
0 0% 85.39% 2.45s 0.67% github.com/pingcap/tidb/util/chunk.newFixedLenColumn (inline)
0 0% 85.39% 3.83s 1.05% google.golang.org/grpc/transport.newHTTP2Client.func3
0 0% 85.39% 4.85s 1.33% runtime.(*mheap).alloc.func1
0 0% 85.39% 3.13s 0.86% runtime.bgsweep
0 0% 85.39% 4.59s 1.26% runtime.gcAssistAlloc.func1
0 0% 85.39% 15.21s 4.18% runtime.gcBgMarkWorker
0 0% 85.39% 15.14s 4.16% runtime.gcBgMarkWorker.func2
0 0% 85.39% 3.13s 0.86% runtime.gosweepone
0 0% 85.39% 3.13s 0.86% runtime.gosweepone.func1
0 0% 85.39% 2.17s 0.6% runtime.largeAlloc
0 0% 85.39% 5.87s 1.61% runtime.wbBufFlush
0 0% 85.39% 5.87s 1.61% runtime.wbBufFlush.func1
0 0% 85.39% 2.56s 0.7% syscall.Write
0 0% 85.39% 2.56s 0.7% syscall.write
麻烦上传一下生成的/root/pprof/pprof.tidb-server.samples.cpu.001.pb.gz文件,文本信息不太好查看
青梅论IT
(青梅)
9
青梅论IT
(青梅)
10
robi
(robi-PingCAP)
12
hi 方便帮忙再抓下 goroutine 下 thx
curl http://0.0.0.0:10080/debug/pprof/goroutine?debug=2 --output stack.txt
curl “localhost:10080/debug/zip?seconds=10” -o debug.zip
用这句来抓吧 一下子全能抓下来。
另外 tidb关于这个sql的日志也麻烦发一下。
先在tidb.log里grep "kill"的日志,可以找到类似如下的一行
[2019/10/26 23:08:18.803 +08:00] [INFO] [server.go:518] [kill] [connID=9705638] [query=false]
然后在按照connID来grep,就可以把这个connID的所有日志搜到 然后发上来吧。多谢。
日志感觉有些不对。
[2019/10/26 23:08:18.803 +08:00] [INFO] [server.go:518] [kill] [connID=9705638] [query=false]
要找这样的kill的日志,
然后看到connID=9705638(我这里是打个比方), 就去搜9705638
青梅论IT
(青梅)
19
好的,收到,请问DEBUG日志是对的吧?因为刚才我并没有找到我自己KILL掉的日志,只找到了原始的日志,然后发给你了。您请稍等。我再找找