Sort算子落盘性能太慢,希望产品层面进行优化

看官方文档已经对落盘的读取进行了优化:https://docs.pingcap.com/zh/tidb/stable/release-7.1.1#:~:text=优化与落盘相关的%20chunk%20读取的性能

对应的ISSUE:Implement a fast row container reader to dump rows from disk · Issue #45125 · pingcap/tidb · GitHub

可是相同的数据量在7.1.1版本中执行发现落盘后,效率还是很低下。


mysql> set tidb_distsql_scan_concurrency=1;
Query OK, 0 rows affected (0.00 sec)

mysql> set tidb_mem_quota_query=209715200;
Query OK, 0 rows affected (0.00 sec)

mysql> explain analyze select * from customer order by C_COMMENT desc, C_ACCTBAL asc;
+-------------------------+------------+---------+-----------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------+----------+
| id                      | estRows    | actRows | task      | access object  | execution info                                                                                                                                                                                                                                                                                                                | operator info                                             | memory   | disk     |
+-------------------------+------------+---------+-----------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------+----------+
| Sort_4                  | 1500000.00 | 1500000 | root      |                | time:34.6s, loops:1466, RU:5061.368437                                                                                                                                                                                                                                                                                        | tpch10.customer.c_comment:desc, tpch10.customer.c_acctbal | 189.3 MB | 204.9 MB |
| └─TableReader_8         | 1500000.00 | 1500000 | root      |                | time:1.52s, loops:1468, cop_task: {num: 54, max: 74.9ms, min: 817µs, avg: 29.1ms, p95: 63.2ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 1.17s, tot_wait: 8.07ms, rpc_num: 54, rpc_time: 1.57s, copr_cache_hit_ratio: 0.00, build_task_duration: 13.6µs, max_distsql_concurrency: 1}                              | data:TableFullScan_7                                      | 22.5 MB  | N/A      |
|   └─TableFullScan_7     | 1500000.00 | 1500000 | cop[tikv] | table:customer | tikv_task:{proc max:39ms, min:0s, avg: 13.1ms, p80:23ms, p95:32ms, iters:1678, tasks:54}, scan_detail: {total_process_keys: 1500000, total_process_keys_size: 305225771, total_keys: 1500054, get_snapshot_time: 1.24ms, rocksdb: {delete_skipped_count: 217344, key_skipped_count: 1717344, block: {cache_hit_count: 7743}}} | keep order:false                                          | N/A      | N/A      |
+-------------------------+------------+---------+-----------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------+----------+
3 rows in set (34.58 sec)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v7.1.1 |
+--------------------+
1 row in set (0.00 sec)

看pprof像是GC的问题:

(pprof) top
Showing nodes accounting for 5780ms, 60.97% of 9480ms total
Dropped 256 nodes (cum <= 47.40ms)
Showing top 10 nodes out of 141
      flat  flat%   sum%        cum   cum%
    2400ms 25.32% 25.32%     2400ms 25.32%  runtime.memclrNoHeapPointers
    1210ms 12.76% 38.08%     1210ms 12.76%  runtime/internal/syscall.Syscall6
     490ms  5.17% 43.25%      490ms  5.17%  runtime.(*fixalloc).alloc
     430ms  4.54% 47.78%      940ms  9.92%  runtime.sweepone
     260ms  2.74% 50.53%      280ms  2.95%  runtime.findObject
     260ms  2.74% 53.27%     4880ms 51.48%  runtime.mallocgc
     210ms  2.22% 55.49%      800ms  8.44%  runtime.scanobject
     180ms  1.90% 57.38%      200ms  2.11%  runtime.writeHeapBits.flush
     170ms  1.79% 59.18%      170ms  1.79%  hash/crc32.ieeeCLMUL
     170ms  1.79% 60.97%      170ms  1.79%  runtime.futex
(pprof) tree
Showing nodes accounting for 8.33s, 87.87% of 9.48s total
Dropped 256 nodes (cum <= 0.05s)
Showing top 80 nodes out of 141
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context              
----------------------------------------------------------+-------------
                                             2.40s   100% |   runtime.mallocgc
     2.40s 25.32% 25.32%      2.40s 25.32%                | runtime.memclrNoHeapPointers
----------------------------------------------------------+-------------
                                             1.18s 97.52% |   syscall.RawSyscall6
     1.21s 12.76% 38.08%      1.21s 12.76%                | runtime/internal/syscall.Syscall6
----------------------------------------------------------+-------------
                                             0.46s 93.88% |   runtime.(*mheap).allocSpan
                                             0.03s  6.12% |   runtime.systemstack
     0.49s  5.17% 43.25%      0.49s  5.17%                | runtime.(*fixalloc).alloc
----------------------------------------------------------+-------------
                                             0.94s   100% |   runtime.bgsweep
     0.43s  4.54% 47.78%      0.94s  9.92%                | runtime.sweepone
                                             0.41s 43.62% |   runtime.(*sweepLocked).sweep
                                             0.03s  3.19% |   runtime.(*spanSet).pop
                                             0.03s  3.19% |   runtime.(*sweepLocker).tryAcquire
----------------------------------------------------------+-------------
                                             0.28s   100% |   runtime.scanobject
     0.26s  2.74% 50.53%      0.28s  2.95%                | runtime.findObject
----------------------------------------------------------+-------------
                                             4.54s 93.03% |   runtime.makeslice
                                             0.27s  5.53% |   runtime.newobject
                                             0.07s  1.43% |   github.com/pingcap/tidb/util/checksum.(*Reader).ReadAt
     0.26s  2.74% 53.27%      4.88s 51.48%                | runtime.mallocgc
                                             2.40s 49.18% |   runtime.memclrNoHeapPointers
                                             1.19s 24.39% |   runtime.(*mcache).nextFree
                                             0.43s  8.81% |   runtime.mProf_Malloc
                                             0.28s  5.74% |   runtime.(*mcache).allocLarge
                                             0.10s  2.05% |   runtime.heapBitsSetType
                                             0.10s  2.05% |   runtime.nextFreeFast (inline)
                                             0.03s  0.61% |   runtime.systemstack
----------------------------------------------------------+-------------
                                             0.79s 98.75% |   runtime.gcDrain
                                             0.01s  1.25% |   runtime.systemstack
     0.21s  2.22% 55.49%      0.80s  8.44%                | runtime.scanobject
                                             0.28s 35.00% |   runtime.findObject
                                             0.19s 23.75% |   runtime.greyobject
                                             0.08s 10.00% |   runtime.heapBitsForAddr
----------------------------------------------------------+-------------
                                             0.11s 55.00% |   runtime.(*mcentral).cacheSpan
                                             0.06s 30.00% |   runtime.(*mcache).allocLarge
                                             0.03s 15.00% |   runtime.heapBitsSetType
     0.18s  1.90% 57.38%      0.20s  2.11%                | runtime.writeHeapBits.flush
----------------------------------------------------------+-------------
                                             0.17s   100% |   hash/crc32.archUpdateIEEE
     0.17s  1.79% 59.18%      0.17s  1.79%                | hash/crc32.ieeeCLMUL
----------------------------------------------------------+-------------
                                             0.11s 64.71% |   runtime.unlock2
                                             0.04s 23.53% |   runtime.systemstack
     0.17s  1.79% 60.97%      0.17s  1.79%                | runtime.futex
----------------------------------------------------------+-------------
                                             0.12s 92.31% |   github.com/pingcap/tidb/util/chunk.appendCellByCell
                                             0.01s  7.69% |   github.com/pingcap/tidb/util/checksum.(*Reader).ReadAt
     0.13s  1.37% 62.34%      0.13s  1.37%                | runtime.memmove
----------------------------------------------------------+-------------
                                             0.05s 41.67% |   runtime.(*consistentHeapStats).acquire (inline)
                                             0.05s 41.67% |   runtime.(*consistentHeapStats).release (inline)
                                             0.02s 16.67% |   runtime.(*spanSet).pop (inline)
     0.12s  1.27% 63.61%      0.12s  1.27%                | runtime/internal/atomic.(*Uint32).Add
----------------------------------------------------------+-------------
                                             0.10s   100% |   github.com/pingcap/tidb/util/chunk.genCmpStringFunc.func1 (inline)
     0.10s  1.05% 64.66%      0.10s  1.05%                | github.com/pingcap/tidb/util/chunk.(*Column).IsNull
----------------------------------------------------------+-------------
                                             0.22s   100% |   github.com/pingcap/tidb/util/chunk.(*Chunk).AppendPartialRow
     0.10s  1.05% 65.72%      0.22s  2.32%                | github.com/pingcap/tidb/util/chunk.appendCellByCell
                                             0.12s 54.55% |   runtime.memmove
----------------------------------------------------------+-------------
                                             0.41s 97.62% |   runtime.sweepone
                                             0.01s  2.38% |   runtime.(*mcentral).cacheSpan
     0.10s  1.05% 66.77%      0.42s  4.43%                | runtime.(*sweepLocked).sweep
                                             0.24s 57.14% |   runtime.systemstack
                                             0.02s  4.76% |   runtime.unlock2
                                             0.01s  2.38% |   runtime.(*consistentHeapStats).acquire
                                             0.01s  2.38% |   runtime.(*consistentHeapStats).release
----------------------------------------------------------+-------------
                                             0.10s   100% |   runtime.mallocgc (inline)
     0.10s  1.05% 67.83%      0.10s  1.05%                | runtime.nextFreeFast
----------------------------------------------------------+-------------
                                             0.15s 57.69% |   runtime.pcdatavalue
                                             0.11s 42.31% |   runtime.gentraceback
     0.10s  1.05% 68.88%      0.26s  2.74%                | runtime.pcvalue
                                             0.12s 46.15% |   runtime.step
----------------------------------------------------------+-------------
                                             0.12s   100% |   runtime.pcvalue
     0.10s  1.05% 69.94%      0.12s  1.27%                | runtime.step
----------------------------------------------------------+-------------
                                             0.07s 87.50% |   runtime.(*mheap).freeSpanLocked
                                             0.01s 12.50% |   runtime.(*mheap).allocSpan
     0.08s  0.84% 70.78%      0.08s  0.84%                | runtime.(*pallocBits).summarize
----------------------------------------------------------+-------------
                                             0.07s   100% |   github.com/pingcap/tidb/types.CompareString (inline)
     0.07s  0.74% 71.52%      0.07s  0.74%                | github.com/pingcap/tidb/util/collate.truncateTailingSpace
----------------------------------------------------------+-------------
                                             0.09s   100% |   runtime.(*mcache).refill
     0.07s  0.74% 72.26%      0.09s  0.95%                | runtime.(*spanSet).push
----------------------------------------------------------+-------------
                                             0.04s 57.14% |   runtime.(*mcentral).cacheSpan
                                             0.03s 42.86% |   runtime.sweepone
     0.07s  0.74% 73.00%      0.07s  0.74%                | runtime.(*sweepLocker).tryAcquire
----------------------------------------------------------+-------------
                                             0.08s   100% |   runtime.scanobject
     0.07s  0.74% 73.73%      0.08s  0.84%                | runtime.heapBitsForAddr
----------------------------------------------------------+-------------
                                             0.07s 87.50% |   runtime.greyobject (inline)
                                             0.01s 12.50% |   runtime.(*mheap).freeSpanLocked (inline)
     0.07s  0.74% 74.47%      0.08s  0.84%                | runtime.pageIndexOf
----------------------------------------------------------+-------------
                                             0.06s   100% |   runtime.(*mcache).refill (inline)
     0.06s  0.63% 75.11%      0.06s  0.63%                | runtime.(*mcentral).partialSwept
----------------------------------------------------------+-------------
                                             0.06s   100% |   runtime.(*mheap).initSpan
     0.06s  0.63% 75.74%      0.06s  0.63%                | runtime.(*mspan).init
----------------------------------------------------------+-------------
                                             0.19s   100% |   runtime.scanobject
     0.06s  0.63% 76.37%      0.19s  2.00%                | runtime.greyobject
                                             0.07s 36.84% |   runtime.pageIndexOf (inline)
                                             0.05s 26.32% |   runtime.(*gcWork).putFast (inline)
----------------------------------------------------------+-------------
                                             1.92s   100% |   io.(*SectionReader).Read
     0.05s  0.53% 76.90%      1.92s 20.25%                | github.com/pingcap/tidb/util/checksum.(*Reader).ReadAt
                                             1.49s 77.60% |   os.(*File).ReadAt
                                             0.26s 13.54% |   hash/crc32.update
                                             0.07s  3.65% |   runtime.mallocgc
                                             0.01s  0.52% |   runtime.memmove
----------------------------------------------------------+-------------
                                             0.05s   100% |   hash/crc32.archUpdateIEEE (inline)
     0.05s  0.53% 77.43%      0.05s  0.53%                | hash/crc32.simpleUpdate
----------------------------------------------------------+-------------
                                             0.05s   100% |   runtime.greyobject (inline)
     0.05s  0.53% 77.95%      0.05s  0.53%                | runtime.(*gcWork).putFast
----------------------------------------------------------+-------------
                                             1.15s   100% |   runtime.(*mcache).nextFree
     0.05s  0.53% 78.48%      1.15s 12.13%                | runtime.(*mcache).refill
                                             0.87s 75.65% |   runtime.(*mcentral).cacheSpan
                                             0.09s  7.83% |   runtime.(*spanSet).push
                                             0.06s  5.22% |   runtime.(*mcentral).partialSwept (inline)
                                             0.03s  2.61% |   runtime.(*consistentHeapStats).release
                                             0.01s  0.87% |   runtime.(*consistentHeapStats).acquire
----------------------------------------------------------+-------------
                                             0.21s   100% |   runtime.systemstack
     0.05s  0.53% 79.01%      0.21s  2.22%                | runtime.(*mheap).freeSpanLocked
                                             0.07s 33.33% |   runtime.(*pallocBits).summarize
                                             0.02s  9.52% |   runtime.(*consistentHeapStats).acquire
                                             0.02s  9.52% |   runtime.(*consistentHeapStats).release
                                             0.01s  4.76% |   runtime.pageIndexOf (inline)
----------------------------------------------------------+-------------
                                             0.16s   100% |   runtime.(*mheap).allocSpan
     0.05s  0.53% 79.54%      0.16s  1.69%                | runtime.(*mheap).initSpan
                                             0.06s 37.50% |   runtime.(*mspan).init
----------------------------------------------------------+-------------
                                             1.96s   100% |   io.ReadAtLeast
     0.04s  0.42% 79.96%      1.96s 20.68%                | io.(*SectionReader).Read
                                             1.92s 97.96% |   github.com/pingcap/tidb/util/checksum.(*Reader).ReadAt
----------------------------------------------------------+-------------
                                             1.49s   100% |   github.com/pingcap/tidb/util/checksum.(*Reader).ReadAt
     0.04s  0.42% 80.38%      1.49s 15.72%                | os.(*File).ReadAt
                                             1.45s 97.32% |   os.(*File).pread (inline)
----------------------------------------------------------+-------------
                                             0.87s   100% |   runtime.(*mcache).refill
     0.04s  0.42% 80.80%      0.87s  9.18%                | runtime.(*mcentral).cacheSpan
                                             0.63s 72.41% |   runtime.(*mheap).alloc
                                             0.11s 12.64% |   runtime.writeHeapBits.flush
                                             0.04s  4.60% |   runtime.(*sweepLocker).tryAcquire
                                             0.02s  2.30% |   runtime.(*spanSet).pop
                                             0.01s  1.15% |   runtime.(*sweepLocked).sweep
----------------------------------------------------------+-------------
                                             0.80s   100% |   runtime.(*mheap).alloc.func1
     0.04s  0.42% 81.22%      0.80s  8.44%                | runtime.(*mheap).allocSpan
                                             0.46s 57.50% |   runtime.(*fixalloc).alloc
                                             0.16s 20.00% |   runtime.(*mheap).initSpan
                                             0.09s 11.25% |   runtime.unlock2
                                             0.01s  1.25% |   runtime.(*pallocBits).summarize
----------------------------------------------------------+-------------
                                             0.09s   100% |   syscall.Syscall6
     0.04s  0.42% 81.65%      0.09s  0.95%                | runtime.exitsyscall
----------------------------------------------------------+-------------
                                             0.36s   100% |   runtime.systemstack
     0.04s  0.42% 82.07%      0.36s  3.80%                | runtime.gentraceback
                                             0.17s 47.22% |   runtime.pcdatavalue
                                             0.11s 30.56% |   runtime.pcvalue
----------------------------------------------------------+-------------
                                             0.10s   100% |   runtime.mallocgc
     0.04s  0.42% 82.49%      0.10s  1.05%                | runtime.heapBitsSetType
                                             0.03s 30.00% |   runtime.writeHeapBits.flush
----------------------------------------------------------+-------------
                                             0.04s 50.00% |   runtime.systemstack
                                             0.01s 12.50% |   runtime.gcDrain
     0.04s  0.42% 82.91%      0.08s  0.84%                | runtime.lock2
----------------------------------------------------------+-------------
                                             4.70s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
     0.03s  0.32% 83.23%      4.70s 49.58%                | github.com/pingcap/tidb/util/chunk.(*diskFormatRow).toRow
                                             2.72s 57.87% |   runtime.makeslice
                                             1.80s 38.30% |   github.com/pingcap/tidb/util/chunk.newFixedLenColumn (inline)
                                             0.14s  2.98% |   runtime.newobject
----------------------------------------------------------+-------------
                                             0.25s   100% |   hash/crc32.update
     0.03s  0.32% 83.54%      0.25s  2.64%                | hash/crc32.archUpdateIEEE
                                             0.17s 68.00% |   hash/crc32.ieeeCLMUL
                                             0.05s 20.00% |   hash/crc32.simpleUpdate (inline)
----------------------------------------------------------+-------------
                                             1.19s   100% |   runtime.mallocgc
     0.03s  0.32% 83.86%      1.19s 12.55%                | runtime.(*mcache).nextFree
                                             1.15s 96.64% |   runtime.(*mcache).refill
----------------------------------------------------------+-------------
                                             2.72s 59.52% |   github.com/pingcap/tidb/util/chunk.(*diskFormatRow).toRow
                                             1.72s 37.64% |   github.com/pingcap/tidb/util/chunk.newFixedLenColumn
                                             0.13s  2.84% |   github.com/pingcap/tidb/util/chunk.(*rowInDisk).ReadFrom
     0.03s  0.32% 84.18%      4.57s 48.21%                | runtime.makeslice
                                             4.54s 99.34% |   runtime.mallocgc
----------------------------------------------------------+-------------
                                             0.25s   100% |   github.com/pingcap/tidb/executor.(*multiWayMerge).Less
     0.02s  0.21% 84.39%      0.25s  2.64%                | github.com/pingcap/tidb/executor.(*SortExec).lessRow
                                             0.23s 92.00% |   github.com/pingcap/tidb/util/chunk.genCmpStringFunc.func1
----------------------------------------------------------+-------------
                                             0.03s 60.00% |   runtime.sweepone
                                             0.02s 40.00% |   runtime.(*mcentral).cacheSpan
     0.02s  0.21% 84.60%      0.05s  0.53%                | runtime.(*spanSet).pop
                                             0.02s 40.00% |   runtime/internal/atomic.(*Uint32).Add (inline)
----------------------------------------------------------+-------------
                                             0.17s   100% |   runtime.gentraceback
     0.02s  0.21% 84.81%      0.17s  1.79%                | runtime.pcdatavalue
                                             0.15s 88.24% |   runtime.pcvalue
----------------------------------------------------------+-------------
                                             1.36s   100% |   syscall.pread
     0.02s  0.21% 85.02%      1.36s 14.35%                | syscall.Syscall6
                                             1.17s 86.03% |   syscall.RawSyscall6
                                             0.09s  6.62% |   runtime.exitsyscall
                                             0.08s  5.88% |   runtime.entersyscall
----------------------------------------------------------+-------------
                                             1.38s   100% |   internal/poll.(*FD).Pread
     0.02s  0.21% 85.23%      1.38s 14.56%                | syscall.pread
                                             1.36s 98.55% |   syscall.Syscall6
----------------------------------------------------------+-------------
                                             0.27s   100% |   container/heap.Fix
     0.01s  0.11% 85.34%      0.27s  2.85%                | container/heap.down
                                             0.26s 96.30% |   github.com/pingcap/tidb/executor.(*multiWayMerge).Less
----------------------------------------------------------+-------------
                                             0.26s   100% |   container/heap.down
     0.01s  0.11% 85.44%      0.26s  2.74%                | github.com/pingcap/tidb/executor.(*multiWayMerge).Less
                                             0.25s 96.15% |   github.com/pingcap/tidb/executor.(*SortExec).lessRow
----------------------------------------------------------+-------------
                                             0.08s   100% |   github.com/pingcap/tidb/util/chunk.genCmpStringFunc.func1
     0.01s  0.11% 85.55%      0.08s  0.84%                | github.com/pingcap/tidb/types.CompareString
                                             0.07s 87.50% |   github.com/pingcap/tidb/util/collate.truncateTailingSpace (inline)
----------------------------------------------------------+-------------
                                             0.23s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
     0.01s  0.11% 85.65%      0.23s  2.43%                | github.com/pingcap/tidb/util/chunk.(*Chunk).AppendPartialRow
                                             0.22s 95.65% |   github.com/pingcap/tidb/util/chunk.appendCellByCell
----------------------------------------------------------+-------------
                                             0.44s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
     0.01s  0.11% 85.76%      0.44s  4.64%                | github.com/pingcap/tidb/util/chunk.(*ListInDisk).getOffset
                                             0.39s 88.64% |   io.ReadAtLeast
                                             0.04s  9.09% |   runtime.newobject
----------------------------------------------------------+-------------
                                             1.72s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
     0.01s  0.11% 85.86%      1.72s 18.14%                | github.com/pingcap/tidb/util/chunk.(*rowInDisk).ReadFrom
                                             1.58s 91.86% |   io.ReadAtLeast
                                             0.13s  7.56% |   runtime.makeslice
----------------------------------------------------------+-------------
                                             0.23s   100% |   github.com/pingcap/tidb/executor.(*SortExec).lessRow
     0.01s  0.11% 85.97%      0.23s  2.43%                | github.com/pingcap/tidb/util/chunk.genCmpStringFunc.func1
                                             0.10s 43.48% |   github.com/pingcap/tidb/util/chunk.(*Column).IsNull (inline)
                                             0.08s 34.78% |   github.com/pingcap/tidb/types.CompareString
----------------------------------------------------------+-------------
                                             1.80s   100% |   github.com/pingcap/tidb/util/chunk.(*diskFormatRow).toRow (inline)
     0.01s  0.11% 86.08%      1.80s 18.99%                | github.com/pingcap/tidb/util/chunk.newFixedLenColumn
                                             1.72s 95.56% |   runtime.makeslice
                                             0.07s  3.89% |   runtime.newobject
----------------------------------------------------------+-------------
                                             0.26s   100% |   github.com/pingcap/tidb/util/checksum.(*Reader).ReadAt
     0.01s  0.11% 86.18%      0.26s  2.74%                | hash/crc32.update
                                             0.25s 96.15% |   hash/crc32.archUpdateIEEE
----------------------------------------------------------+-------------
                                             1.44s   100% |   os.(*File).pread
     0.01s  0.11% 86.29%      1.44s 15.19%                | internal/poll.(*FD).Pread
                                             1.38s 95.83% |   syscall.pread
----------------------------------------------------------+-------------
                                             1.58s 80.20% |   github.com/pingcap/tidb/util/chunk.(*rowInDisk).ReadFrom
                                             0.39s 19.80% |   github.com/pingcap/tidb/util/chunk.(*ListInDisk).getOffset
     0.01s  0.11% 86.39%      1.97s 20.78%                | io.ReadAtLeast
                                             1.96s 99.49% |   io.(*SectionReader).Read
----------------------------------------------------------+-------------
                                             1.45s   100% |   os.(*File).ReadAt (inline)
     0.01s  0.11% 86.50%      1.45s 15.30%                | os.(*File).pread
                                             1.44s 99.31% |   internal/poll.(*FD).Pread
----------------------------------------------------------+-------------
                                             0.02s 33.33% |   runtime.(*mcache).allocLarge
                                             0.02s 33.33% |   runtime.(*mheap).freeSpanLocked
                                             0.01s 16.67% |   runtime.(*mcache).refill
                                             0.01s 16.67% |   runtime.(*sweepLocked).sweep
     0.01s  0.11% 86.60%      0.06s  0.63%                | runtime.(*consistentHeapStats).acquire
                                             0.05s 83.33% |   runtime/internal/atomic.(*Uint32).Add (inline)
----------------------------------------------------------+-------------
                                             0.03s 50.00% |   runtime.(*mcache).refill
                                             0.02s 33.33% |   runtime.(*mheap).freeSpanLocked
                                             0.01s 16.67% |   runtime.(*sweepLocked).sweep
     0.01s  0.11% 86.71%      0.06s  0.63%                | runtime.(*consistentHeapStats).release
                                             0.05s 83.33% |   runtime/internal/atomic.(*Uint32).Add (inline)
----------------------------------------------------------+-------------
                                             0.28s   100% |   runtime.mallocgc
     0.01s  0.11% 86.81%      0.28s  2.95%                | runtime.(*mcache).allocLarge
                                             0.19s 67.86% |   runtime.(*mheap).alloc
                                             0.06s 21.43% |   runtime.writeHeapBits.flush
                                             0.02s  7.14% |   runtime.(*consistentHeapStats).acquire
----------------------------------------------------------+-------------
                                             0.63s 76.83% |   runtime.(*mcentral).cacheSpan
                                             0.19s 23.17% |   runtime.(*mcache).allocLarge
     0.01s  0.11% 86.92%      0.82s  8.65%                | runtime.(*mheap).alloc
                                             0.81s 98.78% |   runtime.systemstack
----------------------------------------------------------+-------------
                                             0.81s   100% |   runtime.systemstack
     0.01s  0.11% 87.03%      0.81s  8.54%                | runtime.(*mheap).alloc.func1
                                             0.80s 98.77% |   runtime.(*mheap).allocSpan
----------------------------------------------------------+-------------
     0.01s  0.11% 87.13%      0.95s 10.02%                | runtime.bgsweep
                                             0.94s 98.95% |   runtime.sweepone
----------------------------------------------------------+-------------
                                             0.08s   100% |   syscall.Syscall6
     0.01s  0.11% 87.24%      0.08s  0.84%                | runtime.entersyscall
                                             0.01s 12.50% |   runtime.systemstack
----------------------------------------------------------+-------------
                                             0.87s   100% |   runtime.systemstack
     0.01s  0.11% 87.34%      0.87s  9.18%                | runtime.gcDrain
                                             0.79s 90.80% |   runtime.scanobject
                                             0.01s  1.15% |   runtime.lock2
----------------------------------------------------------+-------------
                                             0.43s   100% |   runtime.mallocgc
     0.01s  0.11% 87.45%      0.43s  4.54%                | runtime.mProf_Malloc
                                             0.41s 95.35% |   runtime.systemstack
----------------------------------------------------------+-------------
                                             0.14s 50.00% |   github.com/pingcap/tidb/util/chunk.(*diskFormatRow).toRow
                                             0.07s 25.00% |   github.com/pingcap/tidb/util/chunk.newFixedLenColumn
                                             0.04s 14.29% |   github.com/pingcap/tidb/util/chunk.(*ListInDisk).getOffset
                                             0.02s  7.14% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
     0.01s  0.11% 87.55%      0.28s  2.95%                | runtime.newobject
                                             0.27s 96.43% |   runtime.mallocgc
----------------------------------------------------------+-------------
                                             0.81s 33.33% |   runtime.(*mheap).alloc
                                             0.41s 16.87% |   runtime.mProf_Malloc
                                             0.24s  9.88% |   runtime.(*sweepLocked).sweep
                                             0.03s  1.23% |   runtime.mallocgc
                                             0.01s  0.41% |   github.com/pingcap/tidb/executor.(*ExecStmt).next
                                             0.01s  0.41% |   runtime.entersyscall
     0.01s  0.11% 87.66%      2.43s 25.63%                | runtime.systemstack
                                             0.87s 35.80% |   runtime.gcDrain
                                             0.81s 33.33% |   runtime.(*mheap).alloc.func1
                                             0.36s 14.81% |   runtime.gentraceback
                                             0.21s  8.64% |   runtime.(*mheap).freeSpanLocked
                                             0.04s  1.65% |   runtime.futex
                                             0.04s  1.65% |   runtime.lock2
                                             0.03s  1.23% |   runtime.(*fixalloc).alloc
                                             0.01s  0.41% |   runtime.scanobject
                                             0.01s  0.41% |   runtime.unlock2
----------------------------------------------------------+-------------
                                             0.09s 75.00% |   runtime.(*mheap).allocSpan
                                             0.02s 16.67% |   runtime.(*sweepLocked).sweep
                                             0.01s  8.33% |   runtime.systemstack
     0.01s  0.11% 87.76%      0.12s  1.27%                | runtime.unlock2
                                             0.11s 91.67% |   runtime.futex
----------------------------------------------------------+-------------
                                             1.17s 98.32% |   syscall.Syscall6
     0.01s  0.11% 87.87%      1.19s 12.55%                | syscall.RawSyscall6
                                             1.18s 99.16% |   runtime/internal/syscall.Syscall6
----------------------------------------------------------+-------------
                                             0.27s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
         0     0% 87.87%      0.27s  2.85%                | container/heap.Fix
                                             0.27s   100% |   container/heap.down
----------------------------------------------------------+-------------
         0     0% 87.87%      7.45s 78.59%                | github.com/pingcap/tidb/executor.(*ExecStmt).next
                                             7.44s 99.87% |   github.com/pingcap/tidb/executor.(*ExplainExec).Next
                                             0.01s  0.13% |   runtime.systemstack
----------------------------------------------------------+-------------
                                             7.44s   100% |   github.com/pingcap/tidb/executor.(*ExecStmt).next
         0     0% 87.87%      7.44s 78.48%                | github.com/pingcap/tidb/executor.(*ExplainExec).Next
                                             7.44s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).generateExplainInfo
----------------------------------------------------------+-------------
                                             7.44s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).generateExplainInfo
         0     0% 87.87%      7.44s 78.48%                | github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec
                                             4.70s 63.17% |   github.com/pingcap/tidb/util/chunk.(*diskFormatRow).toRow
                                             1.72s 23.12% |   github.com/pingcap/tidb/util/chunk.(*rowInDisk).ReadFrom
                                             0.44s  5.91% |   github.com/pingcap/tidb/util/chunk.(*ListInDisk).getOffset
                                             0.27s  3.63% |   container/heap.Fix
                                             0.23s  3.09% |   github.com/pingcap/tidb/util/chunk.(*Chunk).AppendPartialRow
                                             0.02s  0.27% |   runtime.newobject
----------------------------------------------------------+-------------
                                             7.44s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).Next
         0     0% 87.87%      7.44s 78.48%                | github.com/pingcap/tidb/executor.(*ExplainExec).generateExplainInfo
                                             7.44s   100% |   github.com/pingcap/tidb/executor.(*ExplainExec).executeAnalyzeExec