大表索引创建失败

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】: Release Version: v3.0.4 Git Commit Hash: 694e086e7914a8fc0eb601327edb6bcc31d2c7f2 Git Branch: HEAD UTC Build Time: 2019-10-08 08:13:34 GoVersion: go version go1.12.10 linux/amd64 Race Enabled: false TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306 Check Table Before Drop: false
  • 【问题描述】: 一张含有多个分区的表,数量10亿+,对于一个varchar类型的字段创建索引时,会在执行1000秒-10000秒之间异常中断,查看tidb日志中有GC life time is shorter than transaction duration,根据文档增加了GC life time 至30m和40m均还是会失败。(表同时有查询业务和日均5000w的插入业务,是否是创建索引与插入同时进行引起的GC异常) 创建索引语句如下:ALTER TABLE TBL_KFK_VEHICLE ADD INDEX fdId (fdId); 使用admin show ddl job返回结果如下: |“1354”|“DCFlowMount”|“TBL_KFK_VEHICLE”|“add index”|“none”|“45”|“820”|“137366619”|“2019-12-27 14:57:03.612 +0800 CST”|“rollback done”| |—|---|—|---|—|---|—|---|—|---| |“1353”|“DCFlowMount”|“TBL_KFK_VEHICLE”|“add index”|“none”|“45”|“820”|“75497120”|“2019-12-27 14:12:31.212 +0800 CST”|“rollback done”| |“1352”|“DCFlowMount”|“TBL_KFK_VEHICLE”|“add index”|“none”|“45”|“820”|“80689702”|“2019-12-27 13:28:31.012 +0800 CST”|“rollback done”| |“1351”|“DCFlowMount”|“TBL_KFK_VEHICLE”|“add index”|“none”|“45”|“820”|“88952567”|“2019-12-27 11:28:29.012 +0800 CST”|“rollback done”| |“1350”|“DCFlowMount”|""|“truncate table”|“public”|“45”|“1120”|“0”|“2019-12-26 15:39:01.762 +0800 CST”|“synced”| |“1348”|“DCFlowMount”|“GlobalStringMap”|“create table”|“public”|“45”|“1347”|“0”|“2019-12-26 10:25:43.662 +0800 CST”|“synced”| |“1346”|“DCFlowMount”|“VZRK”|“create table”|“public”|“45”|“1345”|“0”|“2019-12-26 10:16:57.462 +0800 CST”|“synced”| |“1344”|“DCFlowMount”|“TTTY”|“create table”|“public”|“45”|“1343”|“0”|“2019-12-26 10:15:50.012 +0800 CST”|“synced”| |“1342”|“DCFlowMount”|“FGKA”|“create table”|“public”|“45”|“1341”|“0”|“2019-12-26 10:15:09.762 +0800 CST”|“synced”| |“1340”|“DCFlowMount”|“ZTZD”|“create table”|“public”|“45”|“1339”|“0”|“2019-12-26 10:14:40.762 +0800 CST”|“synced”|

执行Sql的TIDB节点日志截取: [2019/12/27 15:56:03.075 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:03.076 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=8] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:03.076 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=0] [2019/12/27 15:56:03.076 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=3] [2019/12/27 15:56:03.076 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=1] [2019/12/27 15:56:03.076 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=2] [2019/12/27 15:56:03.668 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=1] [task=“physicalTableID824_[8646911284672180910,8646911284672234710)”] [addedCount=1780] [scanCount=1780] [nextHandle=8646911284672234710] [takeTime=591.983052ms] [2019/12/27 15:56:03.719 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=2] [task=“physicalTableID824_[8646911284672234710,8646911284672267212)”] [addedCount=2412] [scanCount=2412] [nextHandle=8646911284672267212] [takeTime=643.604195ms] [2019/12/27 15:56:03.898 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=3] [task=“physicalTableID824_[8646911284672267212,8646911284672302482)”] [addedCount=2630] [scanCount=2630] [nextHandle=8646911284672302482] [takeTime=822.483855ms] [2019/12/27 15:56:03.943 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=0] [task=“physicalTableID824_[8646911284672127958,8646911284672180910)”] [addedCount=2972] [scanCount=2972] [nextHandle=8646911284672180910] [takeTime=867.370836ms] [2019/12/27 15:56:03.943 +08:00] [INFO] [index.go:1104] ["[ddl] add index worker handle batch tasks successful"] [totalAddedCount=137305161] [startHandle=8646911284672127958] [nextHandle=8646911284672302482] [batchAddedCount=9794] [takeTime=867.435409ms] [2019/12/27 15:56:03.943 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672302482] [endHandle=8646911284672454320] [2019/12/27 15:56:03.944 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=4] [startHandle=8646911284672302482] [endHandle=8646911284672454320] [2019/12/27 15:56:04.308 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=2] [task=“physicalTableID824_[8646911284672411422,8646911284672432709)”] [addedCount=1397] [scanCount=1397] [nextHandle=8646911284672432709] [takeTime=364.080951ms] [2019/12/27 15:56:04.309 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=0] [task=“physicalTableID824_[8646911284672302482,8646911284672322169)”] [addedCount=1327] [scanCount=1327] [nextHandle=8646911284672322169] [takeTime=364.406309ms] [2019/12/27 15:56:04.665 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=1] [task=“physicalTableID824_[8646911284672322169,8646911284672411422)”] [addedCount=2553] [scanCount=2553] [nextHandle=8646911284672411422] [takeTime=720.583338ms] [2019/12/27 15:56:04.856 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=3] [task=“physicalTableID824_[8646911284672432709,8646911284672454320]”] [addedCount=2742] [scanCount=2742] [nextHandle=8646911284672454321] [takeTime=912.082483ms] [2019/12/27 15:56:04.856 +08:00] [INFO] [index.go:1104] ["[ddl] add index worker handle batch tasks successful"] [totalAddedCount=137313180] [startHandle=8646911284672302482] [nextHandle=8646911284672454321] [batchAddedCount=8019] [takeTime=912.164363ms] [2019/12/27 15:56:04.856 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=1] [2019/12/27 15:56:04.856 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=0] [2019/12/27 15:56:04.856 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=2] [2019/12/27 15:56:04.856 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=3] [2019/12/27 15:56:04.859 +08:00] [INFO] [reorg.go:140] ["[ddl] run reorg job done"] [“handled rows”=137313180] [2019/12/27 15:56:04.859 +08:00] [ERROR] [ddl_worker.go:570] ["[ddl] run DDL job error"] [worker=“worker 4, tp add index”] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST"] [errorVerbose="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 ] [2019/12/27 15:56:04.862 +08:00] [INFO] [ddl_worker.go:430] ["[ddl] run DDL job error, sleeps a while then retries it."] [worker=“worker 4, tp add index”] [waitTime=1s] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST"] [errorVerbose="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 [2019/12/27 15:56:05.862 +08:00] [INFO] [ddl_worker.go:624] ["[ddl] schema version doesn’t change"] [worker=“worker 4, tp add index”] [2019/12/27 15:56:05.863 +08:00] [INFO] [ddl_worker.go:486] ["[ddl] run DDL job"] [worker=“worker 4, tp add index”] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137313180, ArgLen:0, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:510, SnapshotVersion:413513763720265735”] [2019/12/27 15:56:05.864 +08:00] [WARN] [partition.go:177] [“partition pruning not applicable”] [expression=unix_timestamp(_dc_time)] [2019/12/27 15:56:05.866 +08:00] [INFO] [index.go:1187] ["[ddl] start to add table index"] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137313180, ArgLen:4, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:510, SnapshotVersion:413513763720265735”] [reorgInfo=StartHandle:8646911284672127958,EndHandle:8646911284672454320,first:false,PhysicalTableID:824] [2019/12/27 15:56:05.866 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:05.867 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=8] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:05.867 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=3] [2019/12/27 15:56:05.867 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=0] [2019/12/27 15:56:05.867 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=2] [2019/12/27 15:56:05.867 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=1] [2019/12/27 15:56:06.459 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=1] [task=“physicalTableID824_[8646911284672180910,8646911284672234710)”] [addedCount=1780] [scanCount=1780] [nextHandle=8646911284672234710] [takeTime=591.757283ms] [2019/12/27 15:56:06.507 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=2] [task=“physicalTableID824_[8646911284672234710,8646911284672267212)”] [addedCount=2412] [scanCount=2412] [nextHandle=8646911284672267212] [takeTime=639.263509ms] [2019/12/27 15:56:06.743 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=0] [task=“physicalTableID824_[8646911284672127958,8646911284672180910)”] [addedCount=2972] [scanCount=2972] [nextHandle=8646911284672180910] [takeTime=875.922861ms] [2019/12/27 15:56:06.773 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=3] [task=“physicalTableID824_[8646911284672267212,8646911284672302482)”] [addedCount=2630] [scanCount=2630] [nextHandle=8646911284672302482] [takeTime=905.199977ms] [2019/12/27 15:56:06.773 +08:00] [INFO] [index.go:1104] ["[ddl] add index worker handle batch tasks successful"] [totalAddedCount=137322974] [startHandle=8646911284672127958] [nextHandle=8646911284672302482] [batchAddedCount=9794] [takeTime=905.253225ms] [2019/12/27 15:56:06.773 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672302482] [endHandle=8646911284672454320] [2019/12/27 15:56:06.774 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=4] [startHandle=8646911284672302482] [endHandle=8646911284672454320] [2019/12/27 15:56:07.145 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=0] [task=“physicalTableID824_[8646911284672302482,8646911284672322169)”] [addedCount=1327] [scanCount=1327] [nextHandle=8646911284672322169] [takeTime=370.847143ms] [2019/12/27 15:56:07.156 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=2] [task=“physicalTableID824_[8646911284672411422,8646911284672432709)”] [addedCount=1397] [scanCount=1397] [nextHandle=8646911284672432709] [takeTime=382.462522ms] [2019/12/27 15:56:07.499 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=1] [task=“physicalTableID824_[8646911284672322169,8646911284672411422)”] [addedCount=2553] [scanCount=2553] [nextHandle=8646911284672411422] [takeTime=725.624938ms] [2019/12/27 15:56:07.687 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=3] [task=“physicalTableID824_[8646911284672432709,8646911284672454320]”] [addedCount=2742] [scanCount=2742] [nextHandle=8646911284672454321] [takeTime=912.941271ms] [2019/12/27 15:56:07.687 +08:00] [INFO] [index.go:1104] ["[ddl] add index worker handle batch tasks successful"] [totalAddedCount=137330993] [startHandle=8646911284672302482] [nextHandle=8646911284672454321] [batchAddedCount=8019] [takeTime=913.049531ms] [2019/12/27 15:56:07.687 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=1] [2019/12/27 15:56:07.687 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=0] [2019/12/27 15:56:07.687 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=2] [2019/12/27 15:56:07.687 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=3] [2019/12/27 15:56:07.688 +08:00] [INFO] [reorg.go:140] ["[ddl] run reorg job done"] [“handled rows”=137330993] [2019/12/27 15:56:07.688 +08:00] [ERROR] [ddl_worker.go:570] ["[ddl] run DDL job error"] [worker=“worker 4, tp add index”] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST"] [errorVerbose="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 ] [2019/12/27 15:56:07.692 +08:00] [INFO] [ddl_worker.go:430] ["[ddl] run DDL job error, sleeps a while then retries it."] [worker=“worker 4, tp add index”] [waitTime=1s] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST"] [errorVerbose="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST\ngithub.com/pingcap/errors.AddStack\n\t/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174\ngithub.com/pingcap/parser/terror.(*Error).GenWithStackByArgs\n\t/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20190910041007-2a177b291004/terror/terror.go:238\ngithub.com/pingcap/tidb/store/tikv.(*tikvStore).CheckVisibility\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/store/tikv/kv.go:171\ngithub.com/pingcap/tidb/store/tikv.(*Scanner).getData\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/store/tikv/scan.go:229\ngithub.com/pingcap/tidb/store/tikv.(*Scanner).Next\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/store/tikv/scan.go:100\ngithub.com/pingcap/tidb/store/tikv.newScanner\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/store/tikv/scan.go:58\ngithub.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Iter\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:337\ngithub.com/pingcap/tidb/ddl.iterateSnapshotRows\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/ddl/index.go:1371\ngithub.com/pingcap/tidb/ddl.getTableRange\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/ddl/reorg.go:321\ngithub.com/pingcap/tidb/ddl.(*worker).updateReorgInfo\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/ddl/index.go:1312\ngithub.com/pingcap/tidb/ddl.(*worker).addTableIndex\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/ddl/index.go:1281\ngithub.com/pingcap/tidb/ddl.(*worker).onCreateIndex.func1\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/ddl/index.go:361\ngithub.com/pingcap/tidb/ddl.(*worker).runReorgJob.func1\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/tidb/ddl/reorg.go:122\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"] [2019/12/27 15:56:08.692 +08:00] [INFO] [ddl_worker.go:624] ["[ddl] schema version doesn’t change"] [worker=“worker 4, tp add index”] [2019/12/27 15:56:08.693 +08:00] [INFO] [ddl_worker.go:486] ["[ddl] run DDL job"] [worker=“worker 4, tp add index”] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137330993, ArgLen:0, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:511, SnapshotVersion:413513763720265735”] [2019/12/27 15:56:08.695 +08:00] [WARN] [partition.go:177] [“partition pruning not applicable”] [expression=unix_timestamp(_dc_time)] [2019/12/27 15:56:08.697 +08:00] [INFO] [index.go:1187] ["[ddl] start to add table index"] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137330993, ArgLen:4, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:511, SnapshotVersion:413513763720265735”] [reorgInfo=StartHandle:8646911284672127958,EndHandle:8646911284672454320,first:false,PhysicalTableID:824] [2019/12/27 15:56:08.697 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:08.698 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=8] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:08.698 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=1] [2019/12/27 15:56:08.698 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=0] [2019/12/27 15:56:08.698 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=2] [2019/12/27 15:56:08.698 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=3] [2019/12/27 15:56:09.280 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=1] [task=“physicalTableID824_[8646911284672180910,8646911284672234710)”] [addedCount=1780] [scanCount=1780] [nextHandle=8646911284672234710] [takeTime=582.257015ms] [2019/12/27 15:56:09.349 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=2] [task=“physicalTableID824_[8646911284672234710,8646911284672267212)”] [addedCount=2412] [scanCount=2412] [nextHandle=8646911284672267212] [takeTime=651.071638ms] [2019/12/27 15:56:09.556 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=3] [task=“physicalTableID824_[8646911284672267212,8646911284672302482)”] [addedCount=2630] [scanCount=2630] [nextHandle=8646911284672302482] [takeTime=857.79201ms] [2019/12/27 15:56:09.565 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=0] [task=“physicalTableID824_[8646911284672127958,8646911284672180910)”] [addedCount=2972] [scanCount=2972] [nextHandle=8646911284672180910] [takeTime=867.474141ms] [2019/12/27 15:56:09.565 +08:00] [INFO] [index.go:1104] ["[ddl] add index worker handle batch tasks successful"] [totalAddedCount=137340787] [startHandle=8646911284672127958] [nextHandle=8646911284672302482] [batchAddedCount=9794] [takeTime=867.539242ms] [2019/12/27 15:56:09.565 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672302482] [endHandle=8646911284672454320] [2019/12/27 15:56:09.566 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=4] [startHandle=8646911284672302482] [endHandle=8646911284672454320] [2019/12/27 15:56:09.917 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=0] [task=“physicalTableID824_[8646911284672302482,8646911284672322169)”] [addedCount=1327] [scanCount=1327] [nextHandle=8646911284672322169] [takeTime=350.358655ms] [2019/12/27 15:56:09.942 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=2] [task=“physicalTableID824_[8646911284672411422,8646911284672432709)”] [addedCount=1397] [scanCount=1397] [nextHandle=8646911284672432709] [takeTime=375.156246ms] [2019/12/27 15:56:10.304 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=1] [task=“physicalTableID824_[8646911284672322169,8646911284672411422)”] [addedCount=2553] [scanCount=2553] [nextHandle=8646911284672411422] [takeTime=737.857937ms] [2019/12/27 15:56:10.470 +08:00] [INFO] [index.go:937] ["[ddl] add index worker finish task"] [workerID=3] [task=“physicalTableID824_[8646911284672432709,8646911284672454320]”] [addedCount=2742] [scanCount=2742] [nextHandle=8646911284672454321] [takeTime=903.510656ms] [2019/12/27 15:56:10.470 +08:00] [INFO] [index.go:1104] ["[ddl] add index worker handle batch tasks successful"] [totalAddedCount=137348806] [startHandle=8646911284672302482] [nextHandle=8646911284672454321] [batchAddedCount=8019] [takeTime=903.562817ms] [2019/12/27 15:56:10.470 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=1] [2019/12/27 15:56:10.470 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=0] [2019/12/27 15:56:10.470 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=3] [2019/12/27 15:56:10.470 +08:00] [INFO] [index.go:973] ["[ddl] add index worker exit"] [workerID=2] [2019/12/27 15:56:10.472 +08:00] [INFO] [reorg.go:140] ["[ddl] run reorg job done"] [“handled rows”=137348806] [2019/12/27 15:56:10.472 +08:00] [ERROR] [ddl_worker.go:570] ["[ddl] run DDL job error"] [worker=“worker 4, tp add index”] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST"] [errorVerbose="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 ] [2019/12/27 15:56:11.475 +08:00] [INFO] [ddl_worker.go:624] ["[ddl] schema version doesn’t change"] [worker=“worker 4, tp add index”] [2019/12/27 15:56:11.476 +08:00] [INFO] [ddl_worker.go:486] ["[ddl] run DDL job"] [worker=“worker 4, tp add index”] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137348806, ArgLen:0, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:512, SnapshotVersion:413513763720265735”] [2019/12/27 15:56:11.478 +08:00] [WARN] [partition.go:177] [“partition pruning not applicable”] [expression=unix_timestamp(_dc_time)] [2019/12/27 15:56:11.480 +08:00] [INFO] [index.go:1187] ["[ddl] start to add table index"] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137348806, ArgLen:4, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:512, SnapshotVersion:413513763720265735”] [reorgInfo=StartHandle:8646911284672127958,EndHandle:8646911284672454320,first:false,PhysicalTableID:824] [2019/12/27 15:56:11.480 +08:00] [INFO] [index.go:1006] ["[ddl] split table range from PD"] [physicalTableID=824] [startHandle=8646911284672127958] [endHandle=8646911284672454320] [2019/12/27 15:56:11.481 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=0] [2019/12/27 15:56:11.481 +08:00] [INFO] [index.go:943] ["[ddl] add index worker start"] [workerID=3] [2019/12/27 15:56:11.481 +08:00] [INFO] [index.go:1250] ["[ddl] start add index workers to reorg index"] [workerCnt=4] [regionCnt=8] [startHandle=8646911284672127958] [endHandle=8646911284672454320]

[2019/12/27 15:56:11.480 +08:00] [INFO] [index.go:1187] ["[ddl] start to add table index"] [job=“ID:1354, Type:add index, State:running, SchemaState:write reorganization, SchemaID:45, TableID:820, RowCount:137348806, ArgLen:4, start time: 2019-12-27 14:57:03.612 +0800 CST, Err:[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2019-12-27 14:57:03.762 +0800 CST, GC safe point is 2019-12-27 15:44:51.362 +0800 CST, ErrCount:512, SnapshotVersion:413513763720265735”]

从这两个时间可以看到添加索引执行时间超过 40 分钟,还需要调长 GC 时间。

@wiboson 在 3.0.4 上,对于大的分区表我们 add index 会可能会出现这种情况。需要如 @Damon-PingCAP 所说调长 tikv_gc_life_time 值。如果 tikv_gc_life_time 对此集群影响不大,以防万一建议先将其设置为 10h 吧。添加完索引之后再改回来。 这个需要调整 tikv_gc_life_time 值解决此问题,已经在我们 master 和接下来发布的 3.0.8 上修复。