Recurring "analyze worker panicked" errors

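The logs on our TiDB nodes keep reporting the error below. After two of the nodes were restarted, the error stopped on those nodes, but it keeps recurring on another one. How should I handle this? This is a production environment, so I can't experiment on it. Thanks.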

[2021/06/29 12:40:26.054 +08:00] [ERROR] [analyze.go:172] [“analyze worker panicked”] [stack=“goroutine 4804628 [running]:\ngithub.com/pingcap/tidb/executor.(*AnalyzeExec).analyzeWorker.func1(0xc00e4eac60, 0xc0019cded8, 0xc00fcb5b00, 0x1)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/analyze.go:170 +0xff\ panic(0x3169660, 0xc0115fc480)\ \t/usr/local/go/src/runtime/panic.go:679 +0x1b2\ngithub.com/pingcap/tidb/util/collate.decodeRune(0xc00c737752, 0x8, 0x7, 0x6ce, 0x7)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/util/collate/unicode_ci.go:53 +0x111\ngithub.com/pingcap/tidb/util/collate.(*generalCICollator).Compare(0x54fb878, 0xc00c737752, 0x8, 0xc00c737672, 0x8, 0xffffffffffffffff)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/util/collate/general_ci.go:39 +0xdb\ngithub.com/pingcap/tidb/types.CompareString(0xc00c737752, 0x8, 0xc00c737672, 0x8, 0xc001c7cc80, 0x12, 0xffffffffffffffff)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/types/compare.go:118 +0x71\ngithub.com/pingcap/tidb/types.(*Datum).compareString(0xc009c099e0, 0xc0116ea900, 0xc00c737672, 0x8, 0xc001c7cc80, 0x12, 0xffffffffffffffff, 0x0, 0x0)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/types/datum.go:665 +0x112\ngithub.com/pingcap/tidb/types.(*Datum).CompareDatum(0xc009c099e0, 0xc0116ea900, 0xc009c090e0, 0xffffffffffffffff, 0x0, 0x0)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/types/datum.go:565 +0x28c\ngithub.com/pingcap/tidb/statistics.(*sampleItemSorter).Less(0xc00911a150, 0x16, 0x15, 0x1)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/statistics/sample.go:67 +0x68\ sort.insertionSort(0x38b0f00, 0xc00911a150, 0x14, 0x28)\ 
\t/usr/local/go/src/sort/sort.go:27 +0xc4\ sort.stable(0x38b0f00, 0xc00911a150, 0x3e8)\ \t/usr/local/go/src/sort/sort.go:364 +0x51\ sort.Stable(0x38b0f00, 0xc00911a150)\ \t/usr/local/go/src/sort/sort.go:357 +0x53\ngithub.com/pingcap/tidb/statistics.SortSampleItems(...)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/statistics/sample.go:51\ github.com/pingcap/tidb/statistics.BuildColumnHist(0x39090e0, 0xc0021907e0, 0x100, 0x3, 0xc00e4eb140, 0xc000e4a6e8, 0x3f9, 0x182, 0x0, 0x4, …)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/statistics/builder.go:113 +0x1b8\ngithub.com/pingcap/tidb/statistics.BuildColumn(0x39090e0, 0xc0021907e0, 0x100, 0x3, 0xc00e4eb140, 0xc000e4a6e8, 0xc001c7cc80, 0x12, 0x0)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/statistics/builder.go:185 +0xa3\ngithub.com/pingcap/tidb/executor.(*AnalyzeColumnsExec).buildStats(0xc0057780e0, 0xc00ac1e010, 0x1, 0x1, 0xc007ffa1f0, 0x2, 0x2, 0xc007ffa200, 0x2, 0x2, …)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/analyze.go:527 +0xa5f\ngithub.com/pingcap/tidb/executor.analyzeColumnsPushdown(0xc0057780e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, …)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/analyze.go:379 +0xab\ngithub.com/pingcap/tidb/executor.(*AnalyzeExec).analyzeWorker(0xc00fcb5b00, 0xc00e4eac00, 0xc00e4eac60, 0x1)\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/analyze.go:195 +0x1ce\ created by github.com/pingcap/tidb/executor.(*AnalyzeExec).Next\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/executor/analyze.go:87 +0x147\ ”]
[2021/06/29 12:40:26.054 +08:00] [INFO] [tidb.go:219] ["rollbackTxn for ddl/autocommit failed"]
[2021/06/29 12:40:26.054 +08:00] [WARN] [session.go:1383] ["run statement failed"] [schemaVersion=6409] [error="analyze worker panic"] [session="{\ "currDBName": "",\ "id": 0,\ "status": 2,\ "strictMode": true,\ "user": null\ }"]
[2021/06/29 12:40:26.054 +08:00] [ERROR] [update.go:796] ["[stats] auto analyze failed"] [sql="analyze table %n.%n"] [cost_time=16.335944ms] [error="analyze worker panic"]
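The stack field in the first log entry is one long escaped string, which makes it hard to see where the panic actually happened. Below is a minimal Python sketch for unfolding it, assuming the raw log file escapes line breaks and tabs as literal `\n` and `\t` (the pasted copy above has lost some of them). The names `unfold_stack` and `panicking_frame` are made up for illustration, and `SAMPLE` is a shortened excerpt of the entry above.

```python
import re

# Shortened excerpt of the log entry above, with straight quotes and
# literal \n / \t escapes as assumed for the raw log file.
SAMPLE = (
    r'[2021/06/29 12:40:26.054 +08:00] [ERROR] [analyze.go:172] '
    r'["analyze worker panicked"] [stack="goroutine 4804628 [running]:'
    r'\npanic(0x3169660, 0xc0115fc480)'
    r'\n\t/usr/local/go/src/runtime/panic.go:679 +0x1b2'
    r'\ngithub.com/pingcap/tidb/util/collate.decodeRune(0xc00c737752, 0x8, 0x7, 0x6ce, 0x7)'
    r'\n\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/'
    r'github.com/pingcap/tidb/util/collate/unicode_ci.go:53 +0x111\n"]'
)


def unfold_stack(log_line: str) -> str:
    """Return the stack field with escaped newlines and tabs expanded."""
    match = re.search(r'\[stack="(.*)"\]', log_line, re.S)
    if match is None:
        return ""
    return match.group(1).replace("\\n", "\n").replace("\\t", "\t")


def panicking_frame(stack_text: str):
    """Return the first function frame listed after the runtime panic() frame."""
    lines = stack_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("panic("):
            for following in lines[i + 1:]:
                # Lines starting with a tab are file locations, not function frames.
                if following and not following.startswith("\t"):
                    return following
    return None


if __name__ == "__main__":
    print(panicking_frame(unfold_stack(SAMPLE)))
```

On the excerpt, the frame right after `panic(...)` is `collate.decodeRune`, called from `generalCICollator.Compare` in the full trace, so the panic is raised in the collation code while the analyze worker sorts sampled values.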

Which TiDB version is this? Also, please confirm the value of the tidb_enable_fast_analyze variable.
https://github.com/pingcap/tidb/issues/15751

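In TiDB, by default one TiDB server acts as the analyze owner and automatically collects table statistics, which are the base data for the CBO model. That would explain why the error went away on a TiDB server after it was restarted while another TiDB server still reports it. Judging from the log above, it is auto analyze that is failing.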

1. Please provide the TiDB version you are running: use tiup cluster display {cluster_name}, or check with select version()
2. Check the current time window for automatic statistics collection: show variables like '%analyze%'

Cluster version: v4.0.12

tidb_auto_analyze_end_time 23:59 +0000
tidb_auto_analyze_ratio 0.5
tidb_auto_analyze_start_time 00:00 +0000
tidb_enable_fast_analyze 0
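For context on what these values mean: as described in the TiDB statistics documentation, auto analyze runs on a table when the ratio of modified rows to total rows exceeds tidb_auto_analyze_ratio and the current time falls between tidb_auto_analyze_start_time and tidb_auto_analyze_end_time. Here is a simplified Python sketch of that rule, not TiDB's actual implementation; the function names and the row counts in the usage lines are hypothetical.

```python
from datetime import datetime, time, timezone


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if the UTC time of day of `now` lies in [start, end].

    `now` must be timezone-aware. A window whose start is later than its
    end is treated as wrapping past midnight.
    """
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end


def due_for_auto_analyze(modified_rows: int, total_rows: int, ratio: float,
                         now: datetime, start: time, end: time) -> bool:
    """Simplified trigger rule: inside the window and modified ratio above threshold."""
    if total_rows <= 0:
        return False
    return in_window(now, start, end) and modified_rows / total_rows > ratio


if __name__ == "__main__":
    # Settings from this thread: ratio 0.5, window 00:00 to 23:59 +0000.
    # 12:40:26 +08:00 (the log timestamp) is 04:40:26 UTC.
    now = datetime(2021, 6, 29, 4, 40, 26, tzinfo=timezone.utc)
    print(due_for_auto_analyze(600, 1000, 0.5, now, time(0, 0), time(23, 59)))
```

With a window of 00:00 to 23:59, auto analyze can be triggered at practically any time of day, which fits the error recurring around the clock on whichever node is running auto analyze.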

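Please confirm whether the new collation framework is in use. The new_collations_enabled_on_first_bootstrap setting decides, when the cluster is first initialized, whether the new collation framework is enabled. For a cluster initialized with this switch turned on, you can confirm whether the framework is enabled through the new_collation_enabled variable in the mysql.tidb table: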

SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled';

SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled'; returns: True

v4.0.13 fixed a bug where ANALYZE fails when enable_new_collation is turned on. Details:

https://github.com/pingcap/tidb/issues/20874

So you are suggesting that we upgrade production? Is there no other way?

You can manually analyze a small table on your side to confirm the problem again.

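There is currently no better way to work around it. If you turn off auto analyze (automatic statistics collection), the statistics in the cluster become "static", which may have a negative effect on the execution plans chosen for SQL statements.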

You can upgrade a test environment to v4.0.13 and verify whether that resolves the analyze worker panic, for example by manually analyzing a table and watching the result.
