【TiDB Environment】Production
Currently the problem shows up like this: when a business traffic spike hits and a large number of similar or identical statements execute at the same moment, the CPU on one or two TiKV nodes gets pegged, which then affects other queries. Is there any way to limit this?
Optimize the SQL first; this SQL is probably consuming a lot of TiKV disk resources.
You could try resource control.
A single run of this SQL finishes in milliseconds, but once a spike arrives it becomes a problem and this situation appears.
Do you have the execution plan? Post it so we can take a look.
| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk |
| Projection_4 | 4.84 | 82678083.28 | 695 | root | | time:1m3s, loops:2, Concurrency:OFF | bms.bms_weight_section.id, bms.bms_weight_section.weight_section_id, bms.bms_weight_section.tenant_code, bms.bms_weight_section.related_type, bms.bms_weight_section.temp_id, bms.bms_weight_section.price_type, bms.bms_weight_section.zone, bms.bms_weight_section.zone_price, bms.bms_weight_section.start_weight, bms.bms_weight_section.end_weight, bms.bms_weight_section.continue_weight, bms.bms_weight_section.start_volume_weight, bms.bms_weight_section.end_volume_weight | 290.5 KB | N/A |
| └─IndexLookUp_11 | 4.84 | 82678081.31 | 695 | root | | time:1m3s, loops:2, index_task: {total_time: 55.3s, fetch_handle: 16.8s, build: 108µs, wait: 38.5s}, table_task: {total_time: 12m39.9s, num: 84, concurrency: 16}, next: {wait_index: 3.04ms, wait_table_lookup_build: 441.3µs, wait_table_lookup_resp: 1m3s} | | 56.5 MB | N/A |
| ├─IndexRangeScan_8(Build) | 130101.37 | 30199221.00 | 1638645 | cop[tikv] | table:ws, index:idx_temp_id_zone(tenant_code, temp_id, zone) | time:16.7s, loops:1615, cop_task: {num: 70, max: 12.4s, min: 248.5µs, avg: 406.6ms, p95: 94.9ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 389.8ms, tot_wait: 106.7ms, copr_cache_hit_ratio: 0.59, build_task_duration: 25.2µs, max_distsql_concurrency: 5}, rpc_info:{Cop:{num_rpc:70, total_time:28.5s}}, tikv_task:{proc max:12.3s, min:0s, avg: 513.2ms, p80:218ms, p95:2.42s, iters:1877, tasks:70}, scan_detail: {total_process_keys: 627428, total_process_keys_size: 70270773, total_keys: 627458, get_snapshot_time: 62.7ms, rocksdb: {key_skipped_count: 627428, block: {cache_hit_count: 1699}}}, time_detail: {total_process_time: 389.8ms, total_suspend_time: 27.8s, total_wait_time: 106.7ms, total_kv_read_wall_time: 28.2s, tikv_wall_time: 28.4s} | range:["\x002\x002\x009\x009" 1929749903089766400,"\x002\x002\x009\x009" 1929749903089766400], ["\x002\x002\x009\x009" 1939240964738912256,"\x002\x002\x009\x009" 1939240964738912256], ["\x002\x002\x009\x009" 1939247024707088384,"\x002\x002\x009\x009" 1939247024707088384], ["\x002\x002\x009\x009" 1965080543445061632,"\x002\x002\x009\x009" 1965080543445061632], ["\x002\x002\x009\x009" 1967613809181945856,"\x002\x002\x009\x009" 1967613809181945856], keep order:false | N/A | N/A |
| └─Selection_10(Probe) | 4.84 | 63996187.94 | 695 | cop[tikv] | | time:12m39.3s, loops:87, cop_task: {num: 845, max: 12.5s, min: 631.4µs, avg: 2.11s, p95: 10.1s, max_proc_keys: 4527, p95_proc_keys: 3468, tot_proc: 20s, tot_wait: 7.92s, copr_cache_hit_ratio: 0.01, build_task_duration: 30.7ms, max_distsql_concurrency: 15, max_extra_concurrency: 1}, rpc_info:{Cop:{num_rpc:845, total_time:29m40.5s}}, tikv_task:{proc max:12.5s, min:0s, avg: 2.1s, p80:4.11s, p95:10.1s, iters:5199, tasks:845}, scan_detail: {total_process_keys: 1633698, total_process_keys_size: 274582176, total_keys: 2244987, get_snapshot_time: 6.62s, rocksdb: {key_skipped_count: 1324579, block: {cache_hit_count: 7120936}}}, time_detail: {total_process_time: 20s, total_suspend_time: 29m10.2s, total_wait_time: 7.92s, total_kv_read_wall_time: 29m29.5s, tikv_wall_time: 29m38.9s} | eq(bms.bms_weight_section.deleted, 0), eq(bms.bms_weight_section.related_type, 2), or(or(and(eq(bms.bms_weight_section.temp_id, 1965080543445061632), eq(bms.bms_weight_section.zone, "SYD")), and(eq(bms.bms_weight_section.temp_id, 1929749903089766400), eq(bms.bms_weight_section.zone, "SYD"))), or(and(eq(bms.bms_weight_section.temp_id, 1967613809181945856), eq(bms.bms_weight_section.zone, "N01")), or(and(eq(bms.bms_weight_section.temp_id, 1939247024707088384), eq(bms.bms_weight_section.zone, "N0")), and(eq(bms.bms_weight_section.temp_id, 1939240964738912256), eq(bms.bms_weight_section.zone, "N0"))))) | N/A | N/A |
| └─TableRowIDScan_9 | 130101.37 | 44520013.46 | 1638645 | cop[tikv] | table:ws | tikv_task:{proc max:12.5s, min:0s, avg: 2.1s, p80:4.11s, p95:10.1s, iters:5199, tasks:845} | keep order:false | N/A | N/A |
| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk |
| Projection_4 | 0.28 | 1700056.32 | 835 | root | | time:34.3ms, loops:2, Concurrency:OFF | bms.bms_weight_section.id, bms.bms_weight_section.weight_section_id, bms.bms_weight_section.tenant_code, bms.bms_weight_section.related_type, bms.bms_weight_section.temp_id, bms.bms_weight_section.price_type, bms.bms_weight_section.zone, bms.bms_weight_section.zone_price, bms.bms_weight_section.start_weight, bms.bms_weight_section.end_weight, bms.bms_weight_section.continue_weight, bms.bms_weight_section.start_volume_weight, bms.bms_weight_section.end_volume_weight | 315.1 KB | N/A |
| └─IndexLookUp_8 | 0.28 | 1700056.21 | 835 | root | | time:34.3ms, loops:2, index_task: {total_time: 27.6ms, fetch_handle: 27.6ms, build: 5.12µs, wait: 10.4µs}, table_task: {total_time: 27.3ms, num: 4, concurrency: 16}, next: {wait_index: 11.6ms, wait_table_lookup_build: 2.89ms, wait_table_lookup_resp: 19.4ms} | | 973.6 KB | N/A |
| ├─IndexRangeScan_5(Build) | 2675.19 | 620967.16 | 17308 | cop[tikv] | table:ws, index:idx_temp_id_zone(tenant_code, temp_id, zone) | time:26.8ms, loops:19, cop_task: {num: 7, max: 7.75ms, min: 1.34ms, avg: 3.9ms, p95: 7.75ms, max_proc_keys: 5468, p95_proc_keys: 5468, tot_proc: 14.6ms, tot_wait: 8.28ms, copr_cache_hit_ratio: 0.00, build_task_duration: 12.6µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:7, total_time:27.2ms}}, tikv_task:{proc max:6ms, min:0s, avg: 2.14ms, p80:3ms, p95:6ms, iters:44, tasks:7}, scan_detail: {total_process_keys: 17308, total_process_keys_size: 1903880, total_keys: 17326, get_snapshot_time: 8.11ms, rocksdb: {key_skipped_count: 17308, block: {cache_hit_count: 153, read_count: 41, read_byte: 103.7 KB, read_time: 589.1µs}}}, time_detail: {total_process_time: 14.6ms, total_suspend_time: 56.9µs, total_wait_time: 8.28ms, total_kv_read_wall_time: 15ms, tikv_wall_time: 23.7ms} | range:["\x001\x003\x009\x008" 1843955671548571648,"\x001\x003\x009\x008" 1843955671548571648], ["\x001\x003\x009\x008" 1853356682233004032,"\x001\x003\x009\x008" 1853356682233004032], ["\x001\x003\x009\x008" 1853357247646035968,"\x001\x003\x009\x008" 1853357247646035968], ["\x001\x003\x009\x008" 1856596390297485312,"\x001\x003\x009\x008" 1856596390297485312], ["\x001\x003\x009\x008" 1932720982034931712,"\x001\x003\x009\x008" 1932720982034931712], ["\x001\x003\x009\x008" 1934903458821529600,"\x001\x003\x009\x008" 1934903458821529600], ["\x001\x003\x009\x008" 1934903668763156480,"\x001\x003\x009\x008" 1934903668763156480], ["\x001\x003\x009\x008" 1937037112872611840,"\x001\x003\x009\x008" 1937037112872611840], ["\x001\x003\x009\x008" 1942036188701634560,"\x001\x003\x009\x008" 1942036188701634560], ["\x001\x003\x009\x008" 1942036412329357312,"\x001\x003\x009\x008" 1942036412329357312], ["\x001\x003\x009\x008" 1944647162248523776,"\x001\x003\x009\x008" 1944647162248523776], ["\x001\x003\x009\x008" 1968612566058254336,"\x001\x003\x009\x008" 1968612566058254336], keep order:false | N/A | N/A |
| └─Selection_7(Probe) | 0.28 | 1315912.45 | 835 | cop[tikv] | | time:22.8ms, loops:6, cop_task: {num: 30, max: 6.47ms, min: 830.6µs, avg: 3.24ms, p95: 6.2ms, max_proc_keys: 2100, p95_proc_keys: 1388, tot_proc: 65.8ms, tot_wait: 10.2ms, copr_cache_hit_ratio: 0.00, build_task_duration: 281.3µs, max_distsql_concurrency: 10, max_extra_concurrency: 1}, rpc_info:{Cop:{num_rpc:30, total_time:96.8ms}}, tikv_task:{proc max:5ms, min:0s, avg: 2.3ms, p80:4ms, p95:5ms, iters:128, tasks:30}, scan_detail: {total_process_keys: 17308, total_process_keys_size: 3066996, total_keys: 19160, get_snapshot_time: 9.17ms, rocksdb: {key_skipped_count: 17404, block: {cache_hit_count: 13212, read_count: 125, read_byte: 315.0 KB, read_time: 4.19ms}}}, time_detail: {total_process_time: 65.8ms, total_suspend_time: 1.01ms, total_wait_time: 10.2ms, total_kv_read_wall_time: 49ms, tikv_wall_time: 85.2ms} | eq(bms.bms_weight_section.deleted, 0), eq(bms.bms_weight_section.related_type, 2), or(or(or(and(eq(bms.bms_weight_section.temp_id, 1934903668763156480), eq(bms.bms_weight_section.zone, "8")), or(and(eq(bms.bms_weight_section.temp_id, 1856596390297485312), eq(bms.bms_weight_section.zone, "2")), and(eq(bms.bms_weight_section.temp_id, 1942036412329357312), eq(bms.bms_weight_section.zone, "8")))), or(and(eq(bms.bms_weight_section.temp_id, 1843955671548571648), eq(bms.bms_weight_section.zone, "8")), or(and(eq(bms.bms_weight_section.temp_id, 1968612566058254336), eq(bms.bms_weight_section.zone, "8")), and(eq(bms.bms_weight_section.temp_id, 1932720982034931712), eq(bms.bms_weight_section.zone, "2"))))), or(or(and(eq(bms.bms_weight_section.temp_id, 1942036188701634560), eq(bms.bms_weight_section.zone, "8")), or(and(eq(bms.bms_weight_section.temp_id, 1934903458821529600), eq(bms.bms_weight_section.zone, "8")), and(eq(bms.bms_weight_section.temp_id, 1853356682233004032), eq(bms.bms_weight_section.zone, "2")))), or(and(eq(bms.bms_weight_section.temp_id, 1853357247646035968), eq(bms.bms_weight_section.zone, "2")), or(and(eq(bms.bms_weight_section.temp_id, 1937037112872611840), eq(bms.bms_weight_section.zone, "2")), and(eq(bms.bms_weight_section.temp_id, 1944647162248523776), eq(bms.bms_weight_section.zone, "8")))))) | N/A | N/A |
| └─TableRowIDScan_6 | 2675.19 | 915436.40 | 17308 | cop[tikv] | table:ws | tikv_task:{proc max:5ms, min:0s, avg: 1.63ms, p80:3ms, p95:5ms, iters:128, tasks:30} | keep order:false | N/A | N/A |
I suggest refreshing the statistics on this table; the number of scanned rows looks abnormal.
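For reference, a minimal sketch of refreshing the statistics on the table that appears in the plan above (the table name bms.bms_weight_section is taken from the execution plan; the health check afterwards is optional):

```sql
-- Refresh statistics for the table scanned in the plan above
ANALYZE TABLE bms.bms_weight_section;

-- Optionally check how healthy the statistics are afterwards
SHOW STATS_HEALTHY WHERE Db_name = 'bms' AND Table_name = 'bms_weight_section';
```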
I'll do that now. Also, 风哥, for this kind of spike traffic, is there any configuration that can keep it from exceeding a certain value?
Consider whether there is a hotspot issue and whether the hotspot needs to be scattered.
Is there a way to scatter it? Is there a specific command?
Try adjusting these two parameters:
split.qps-threshold
split.byte-threshold
// When either threshold is reached, TiKV automatically splits the Region to scatter the hotspot.
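A minimal sketch of adjusting these Load Base Split thresholds online through SQL; the values shown are illustrative examples only, not recommendations, and should be tuned against your actual workload:

```sql
-- Lower the Load Base Split thresholds so hot Regions split sooner
-- (values below are examples only)
SET CONFIG tikv `split.qps-threshold` = 1500;
SET CONFIG tikv `split.byte-threshold` = 15728640;  -- value is in bytes (15 MiB here)
```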
Read/write splitting and hotspot scheduling optimization can both solve your problem.
I ran a full ANALYZE right afterwards, but it seems execution is still slow once the data volume gets large.
Take a look at resource control.
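A minimal sketch of what resource control could look like here; the group name, RU value, timeout, and user account are all hypothetical, and QUERY_LIMIT additionally lets TiDB act on individual runaway statements:

```sql
-- Hypothetical resource group capping the app's consumption in Request Units,
-- and killing any single statement that runs longer than 5 seconds
CREATE RESOURCE GROUP IF NOT EXISTS rg_bms
    RU_PER_SEC = 2000
    QUERY_LIMIT = (EXEC_ELAPSED = '5s', ACTION = KILL);

-- Bind the application account to the group (user name is an example)
ALTER USER 'bms_app' RESOURCE GROUP rg_bms;
```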
Check the execution plan again: is the deviation between estRows and actRows still that large?
Also, this single SQL does an index scan over a million-plus rows, so resource usage is bound to be high. See whether there is room for optimization on the business side.
Resource control would affect normal business traffic. Can it be applied at the SQL level?
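On the SQL-level question: as I understand it, a resource group can also be applied to a single statement with an optimizer hint, leaving the rest of the workload on the default group. A sketch, reusing the hypothetical group from above and with illustrative predicates (the real query is not shown in this thread):

```sql
-- Apply the resource group only to this statement via a hint;
-- other sessions stay on their default group
SELECT /*+ RESOURCE_GROUP(rg_bms) */ *
FROM bms.bms_weight_section ws
WHERE ws.tenant_code = '2299'            -- illustrative predicates
  AND ws.temp_id = 1965080543445061632
  AND ws.zone = 'SYD'
  AND ws.deleted = 0;
```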
Similar/identical SQL: that is exactly the ideal use case for Redis.
Put a Redis cache layer in front of TiDB.
If these are write requests, you can put an MQ in front of TiDB for peak shaving.