Problem background:
This instance has one special setting: raft-engine is disabled. Its other parameters are about the same as our other instances.
We have upgraded many instances across many versions before and never hit anything like this.
The symptom this time:
After upgrading to 7.5.6, a lot of requests got stuck. Checking the SQL, every query was doing a full table scan.
As follows:
The TiKV logs report a large number of CDC-related errors; we do run a CDC service.
Throughout the process, upgrading TiKV went fine; the problem appears to start after upgrading tidb-server.
We suspect the tidb-server did not load the statistics.
From the upgrade logs, tidb-server successfully loaded the schema,
but the stats load looks like it failed.
There is also this error:
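For reference (not from the original logs), one way to confirm that stats were not loaded is to check whether plans fall back to pseudo statistics, and when each table's stats were last updated; `db.tbl` below is a placeholder:

```sql
-- Plans built without loaded statistics are tagged "stats:pseudo"
-- in the operator info column of EXPLAIN output.
EXPLAIN SELECT * FROM db.tbl WHERE id = 1;

-- Check the last stats update time and row counts for a table
-- (db / tbl are placeholder names):
SHOW STATS_META WHERE Db_name = 'db' AND Table_name = 'tbl';
```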
[handle_hist.go:125] [“SyncWaitStatsLoad meets error”] [errors="["sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load took too long to return","sync load stats channel is full and timeout sending task to channel","sync load stats channel is full and timeout sending task to channel","sync load stats channel is full and timeout sending task to channel","sync load took too long to return","sync load stats channel is full and timeout
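Both messages in that log ("sync load took too long to return" and "sync load stats channel is full") come from the synchronous stats-loading path: each query waits up to `tidb_stats_load_sync_wait` milliseconds for column stats to load. A sketch of how one might inspect and loosen these settings (the value 2000 is an arbitrary example, not a recommendation from the thread):

```sql
-- Give queries a larger budget to wait for synchronously loaded stats
-- (milliseconds; the default is 100).
SET GLOBAL tidb_stats_load_sync_wait = 2000;

-- With tidb_stats_load_pseudo_timeout = ON, a query that still times out
-- falls back to pseudo stats, which can produce full-table-scan plans.
SHOW VARIABLES LIKE 'tidb_stats_load%';
```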
The instance has several partitioned tables; partition counts are as follows:
The relevant stats parameters are:
| information_schema_stats_expiry | 86400 |
| innodb_stats_auto_recalc | 1 |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | 0 |
| innodb_stats_persistent | ON |
| innodb_stats_persistent_sample_pages | 20 |
| innodb_stats_sample_pages | 8 |
| innodb_stats_transient_sample_pages | 8 |
| myisam_stats_method | nulls_unequal |
| tidb_auto_build_stats_concurrency | 1 |
| tidb_build_sampling_stats_concurrency | 2 |
| tidb_build_stats_concurrency | 4 |
| tidb_enable_async_merge_global_stats | OFF |
| tidb_enable_extended_stats | OFF |
| tidb_enable_historical_stats | ON |
| tidb_enable_historical_stats_for_capture | OFF |
| tidb_enable_pseudo_for_outdated_stats | OFF |
| tidb_historical_stats_duration | 168h0m0s |
| tidb_merge_partition_stats_concurrency | 1 |
| tidb_plan_cache_invalidation_on_fresh_stats | OFF |
| tidb_skip_missing_partition_stats | ON |
| tidb_stats_cache_mem_quota | 0 |
| tidb_stats_load_pseudo_timeout | ON |
| tidb_stats_load_sync_wait | 100 |
It was caused by the timeouts that occurred during the upgrade.
Was it fixed after running ANALYZE?
Yes, everything returned to normal after ANALYZE.
Then it is most likely related to that timeout error; you can manually ANALYZE the other tables as well.
All tables were analyzed at the time. It's just that ANALYZE is a fairly long process, and it did impact the workload while it ran.
Right, it does impact the workload. We always run it late at night, prioritizing large tables with low stats health.
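To find which tables to prioritize for off-peak ANALYZE, one option is to sort by stats health; the threshold 90 below is an arbitrary example and `db.tbl` is a placeholder:

```sql
-- List tables whose statistics health has dropped below 90:
SHOW STATS_HEALTHY WHERE Healthy < 90;

-- Then analyze a specific table during a low-traffic window:
ANALYZE TABLE db.tbl;
```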
Since v5.3, the default value of tidb_analyze_version
has been 2 — were any statistics-related parameters modified here?
It was probably upgraded from a lower version; in that case this parameter keeps its pre-upgrade value rather than switching to the new default.
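A quick way to check which value the cluster actually ended up with after the upgrade:

```sql
-- Clusters upgraded from versions before v5.3 may keep tidb_analyze_version = 1
-- instead of adopting the new default of 2.
SELECT @@global.tidb_analyze_version;
```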