TiSpark 在写入数据后出现丢失部分数据问题

【问题描述】
使用TiSpark在做单表数据批量写入后,检查数据发现每次写入都会有几条到几十条数据的丢失,例如在一次单个任务写入4,412,698行数据时,在调用dataset的write方法并使用tidb类型的format后,实际成功写入数据库数据量为4,412,691,总计少了7条数据。
【问题自查】
起初以为是spark sql查询结果本身存在问题,经过多次验证测试,spark sql查询出的数据量是正常,没有错误,tispark在写入前的计数也是正确的,但是最终出现数据写入丢失了,我这里判断是tispark在write的时候出现问题了
【日志情况】
在排查日志过程中,发现有多处WARN日志,部分日志内容如下:

21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=61, size=16.291016KB, regionId=2352015
21/10/25 09:34:43 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=61, size=16.1875KB, regionId=2352015
21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=60, size=16.198242KB, regionId=2352015
21/10/25 09:34:43 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=61, size=16.291016KB, regionId=2352015
21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=61, size=16.1875KB, regionId=2352015
21/10/25 09:34:43 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=60, size=16.198242KB, regionId=2352015
21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=61, size=16.114258KB, regionId=2352015
21/10/25 09:34:43 WARN KVErrorHandler: Stale Epoch encountered for region [{Region[2352015] ConfVer[7773] Version[11087] Store[3] KeyRange[t\200\000\000\000\000\0004g_r\200\000\000\000\b\370\324\250\000]:[t\200\000\000\000\000\0004g_r\200\000\000\000\t\001$\304\000]}]
21/10/25 09:34:43 WARN KVErrorHandler: Stale Epoch encountered for region [{Region[2352015] ConfVer[7773] Version[11087] Store[3] KeyRange[t\200\000\000\000\000\0004g_r\200\000\000\000\b\370\324\250\000]:[t\200\000\000\000\000\0004g_r\200\000\000\000\t\001$\304\000]}]
21/10/25 09:34:43 INFO KVErrorHandler: Accumulating cache invalidation info to driver:regionId=2352015,type=REGION_STORE
21/10/25 09:34:43 INFO TwoPhaseCommitter: prewrite secondary key fail, will backoff and retry
21/10/25 09:34:43 INFO KVErrorHandler: Accumulating cache invalidation info to driver:regionId=2352015,type=REGION_STORE
21/10/25 09:34:43 INFO TwoPhaseCommitter: prewrite secondary key fail, will backoff and retry
21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=39, size=10.25KB, regionId=2352138
21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=36, size=9.473633KB, regionId=2352138
21/10/25 09:34:43 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=39, size=10.25KB, regionId=2352138
21/10/25 09:34:43 INFO TwoPhaseCommitter: start prewrite secondary key, row=22, size=5.9375KB, regionId=2352015

21/10/25 09:34:44 INFO TwoPhaseCommitter: start prewrite secondary key, row=61, size=16.206055KB, regionId=2352027
21/10/25 09:34:44 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=61, size=16.206055KB, regionId=2352027
21/10/25 09:34:44 INFO TwoPhaseCommitter: start prewrite secondary key, row=60, size=16.06543KB, regionId=2352027
21/10/25 09:34:44 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=60, size=16.06543KB, regionId=2352027
21/10/25 09:34:44 INFO TwoPhaseCommitter: start prewrite secondary key, row=60, size=16.158203KB, regionId=2352027
21/10/25 09:34:44 WARN KVErrorHandler: Stale Epoch encountered for region [{Region[2352027] ConfVer[7779] Version[11087] Store[12] KeyRange[t\200\000\000\000\000\0004g_r\200\000\000\000\t\t\241\351\000]:[t\200\000\000\000\000\0004g_r\200\000\000\000\t\022Z\240\000]}]
21/10/25 09:34:44 INFO KVErrorHandler: Accumulating cache invalidation info to driver:regionId=2352027,type=REGION_STORE
21/10/25 09:34:44 INFO TwoPhaseCommitter: prewrite secondary key fail, will backoff and retry
21/10/25 09:34:44 INFO TwoPhaseCommitter: start prewrite secondary key, row=27, size=7.2041016KB, regionId=2352027
21/10/25 09:34:44 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=27, size=7.2041016KB, regionId=2352027
21/10/25 09:34:44 INFO TwoPhaseCommitter: start prewrite secondary key, row=33, size=8.954102KB, regionId=2352156
21/10/25 09:34:44 INFO TwoPhaseCommitter: prewrite secondary key successfully, row=33, size=8.954102KB, regionId=2352156

【TiSpark信息】
TiSaprk版本:2.4.1,spark版本:2.4.8,tidb版本:5.7.25-TiDB-v5.1.1

【代码逻辑】
代码逻辑实现使用的是java语言,具体代码如下:

dataset.write()
.format(“tidb”)
.option(“tidb.addr”, config.getByGroup(“tidb.a.addr”))
.option(“tidb.port”, config.getByGroup(“tidb.a.port”))
.option(“tidb.user”, config.getByGroup(“tidb.a.user”))
.option(“tidb.password”, config.getByGroup(“tidb.a.password”))
.option(“database”, dbName)
.option(“table”, tableName)
.mode(SaveMode.Append)
.option(“isolationLevel”,“NONE”)
.save();

这是一个已知问题,还在修复中…

1赞

好吧,看了下tidb的日志,是锁冲突问题导致的吗?针对反馈的这个问题现象目前有修复时间计划吗?

@birdstorm 老师帮忙确认一下~

1赞

是不是要放弃TiSpark了啊?看最近发布TiFlink了呢


来~上开发者论坛上讨论一下。