Spark 3.2.1 + TiSpark 3.0.0 + TiDB 6.1.0 on K8s: error when reading data

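Reading data with Spark 3.2.1 + TiSpark 3.0.0 + TiDB 6.1.0 on K8s fails. After switching to Spark 3.0.3, the error goes away.
Details of the error with Spark 3.2.1 + TiSpark 3.0.0 + TiDB 6.1.0 on K8s:
Environment:
Spark 3.2.1 + TiSpark 3.0.0
TiDB deployed on K8s: 3 PD + 1 TiDB + 3 TiKV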

CREATE TABLE `sbtest_t_t` (
  `id` int(11) NOT NULL,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `k_1` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

Spark code:

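        import java.util.HashMap;
        import java.util.Map;

        import org.apache.spark.SparkConf;
        import org.apache.spark.sql.SparkSession;

        public class RdbToRdbProcess {
            public static void main(String[] args) {
                String pd_addr = "basic-pd.tidb-cluster:2379";
                String tidb_addr = "basic-tidb.tidb-cluster";
                // Credentials are not shown in the original post; these env var names are hypothetical.
                String username = System.getenv("TIDB_USER");
                String password = System.getenv("TIDB_PASSWORD");

                SparkConf conf = new SparkConf()
                        .set("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
                        .set("spark.sql.catalog.tidb_catalog", "org.apache.spark.sql.catalyst.catalog.TiCatalog")
                        .set("spark.sql.catalog.tidb_catalog.pd.addresses", pd_addr)
                        .set("spark.tispark.pd.addresses", pd_addr);
                SparkSession spark = SparkSession
                        .builder()
                        .appName("RdbToRdbProcess")
                        .config(conf)
                        .getOrCreate();

                // Options for batch-writing a DataFrame into TiDB through TiSpark
                // (not used by the failing read below)
                Map<String, String> tiOptionMap = new HashMap<String, String>();
                tiOptionMap.put("tidb.addr", tidb_addr);
                tiOptionMap.put("tidb.port", "4000");
                tiOptionMap.put("tidb.user", username);
                tiOptionMap.put("tidb.password", password);
                tiOptionMap.put("replace", "true");
                tiOptionMap.put("spark.tispark.pd.addresses", pd_addr);

                spark.sql("use tidb_catalog.sbtest2");
                // Current timestamp in milliseconds
                long ttl = System.currentTimeMillis();

                // This read is the one that fails on Spark 3.2.1
                spark.sql("select * from sbtest_t_t where id = 100").show();

                spark.stop();
            }
        }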

Error at runtime:

22/06/21 05:47:39 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (172.26.2.55 executor 1): com.pingcap.tikv.exception.TiClientInternalException: Error reading region:
	at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:190)
	at com.pingcap.tikv.operation.iterator.DAGIterator.readNextRegionChunks(DAGIterator.java:167)
	at com.pingcap.tikv.operation.iterator.DAGIterator.hasNext(DAGIterator.java:113)
	at org.apache.spark.sql.execution.ColumnarRegionTaskExec$$anon$2.proceedNextBatchTask$1(CoprocessorRDD.scala:359)
	at org.apache.spark.sql.execution.ColumnarRegionTaskExec$$anon$2.hasNext(CoprocessorRDD.scala:369)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
	at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
	at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:185)
	... 23 more
Caused by: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
	at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:233)
	at com.pingcap.tikv.operation.iterator.DAGIterator.lambda$submitTasks$1(DAGIterator.java:91)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	... 3 more
Caused by: com.pingcap.tikv.exception.GrpcException: Request range exceeds bound, request range:[7480000000000022FF015F728000000000FF08744A0000000000FA, 7480000000000022FF015F728000000000FF08744C0000000000FA), physical bound:[7480000000000022FF015F728000000000FF0513960000000000FB, 7480000000000022FF015F728000000000FF08744B0000000000FB)
	at com.pingcap.tikv.region.RegionStoreClient.handleCopResponse(RegionStoreClient.java:733)
	at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:680)
	at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:220)
	... 7 more
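The innermost `GrpcException` reports that the coprocessor request range `[start, end)` is not contained in the region's physical bound. Because TiKV compares keys as unsigned byte strings, the check can be reproduced directly on the hex keys printed in the message. A minimal Java sketch, assuming nothing beyond that ordering (the class and method names are hypothetical and not part of TiSpark or the TiKV Java client; an empty end key is treated as unbounded, as for the last region):

```java
public class RangeBoundCheck {

    // Parse a hex string such as "7480...FA" into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Compare two keys as unsigned byte strings (lexicographic, shorter prefix first).
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // True if [reqStart, reqEnd) lies inside [boundStart, boundEnd); an empty boundEnd means unbounded.
    static boolean contains(byte[] boundStart, byte[] boundEnd, byte[] reqStart, byte[] reqEnd) {
        boolean startOk = compareKeys(reqStart, boundStart) >= 0;
        boolean endOk = boundEnd.length == 0 || compareKeys(reqEnd, boundEnd) <= 0;
        return startOk && endOk;
    }

    public static void main(String[] args) {
        // Keys copied from the "Request range exceeds bound" message above.
        byte[] reqStart = hex("7480000000000022FF015F728000000000FF08744A0000000000FA");
        byte[] reqEnd = hex("7480000000000022FF015F728000000000FF08744C0000000000FA");
        byte[] boundStart = hex("7480000000000022FF015F728000000000FF0513960000000000FB");
        byte[] boundEnd = hex("7480000000000022FF015F728000000000FF08744B0000000000FB");

        System.out.println("start inside bound: " + (compareKeys(reqStart, boundStart) >= 0));
        System.out.println("end inside bound:   " + (compareKeys(reqEnd, boundEnd) <= 0));
        System.out.println("range contained:    " + contains(boundStart, boundEnd, reqStart, reqEnd));
    }
}
```

With the keys from this trace, the request start sorts after the bound start, but the request end (`...08744C...`) sorts after the bound end (`...08744B...`), so the request extends past the region's physical end key, which is exactly what the exception reports.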

TiDB log:

# Time: 2022-06-21T05:50:41.645798217Z
# Txn_start_ts: 434055580189720578
# User@Host: root[root] @ 172.26.0.0 [172.26.0.0]
# Conn_ID: 4847320156352928759
# Query_time: 6.759797825
# Parse_time: 0.000074124
# Compile_time: 0.000347601
# Rewrite_time: 0.000242356
# Optimize_time: 0.000041402
# Wait_TS: 0.000357856
# Cop_time: 0.315299844 Process_time: 0.862 Wait_time: 0.003 Request_count: 3 Process_keys: 551276 Total_keys: 551279 Rocksdb_key_skipped_count: 551276 Rocksdb_block_cache_hit_count: 2021
# DB: sbtest2
# Is_internal: false
# Digest: 00f78d8dc447bf40093b4e5a2b0e92099ea1c4745b8f59e14973f4bd18e91550
# Stats: sbtest_t:pseudo
# Num_cop_tasks: 3
# Cop_proc_avg: 0.287333333 Cop_proc_p90: 0.344 Cop_proc_max: 0.344 Cop_proc_addr: basic-tikv-0.basic-tikv-peer.tidb-cluster.svc:20160
# Cop_wait_avg: 0.001 Cop_wait_p90: 0.001 Cop_wait_max: 0.001 Cop_wait_addr: basic-tikv-0.basic-tikv-peer.tidb-cluster.svc:20160
# Mem_max: 270091409
# Prepared: false
# Plan_from_cache: false
# Plan_from_binding: false
# Has_more_results: false
# KV_total: 2.798632842
# PD_total: 0.000348634
# Backoff_total: 0.002
# Write_sql_response_total: 0
# Result_rows: 0
# Succ: false
# IsExplicitTxn: false
# Plan: tidb_decode_plan('8wXweTAJMjdfMQkwCTAJTi9BCTAJdGltZTo2LjQ3cywgbG9vcHM6MSwgcHJlcGFyZTogMS4xMXMsIGluc2VydDo1LjM2cwkxMDAuNiBNQglOL0EKMQkzMV83CTAJMTAwMDAJZGF0YTpUYWJsZUZ1bGxTY2FuXzYJNDEzNTU4CWoUMzE3LjltFWx8NDA1LCBjb3BfdGFzazoge251bTogMywgbWF4OiA1ODIBKiRtaW46IDMxNS4xAQ4kYXZnOiA0MzYuNgEOCHA5NRkoUGF4X3Byb2Nfa2V5czogMjE4NTgyLAEjThcACHRvdAUXDDogODYFZwERGHdhaXQ6IDMBWgxycGNfEY4BDCUoFCAxLjMxcwWyfHJfY2FjaGVfaGl0X3JhdGlvOiAwLjAwfQkyMDAuMyBNKR8oMgk0M182CTFfMAkpIfBAdGFibGU6c2J0ZXN0X3QsIGtlZXAgb3JkZXI6ZmFsc2UsIHN0YXRzOnBzZXVkbwk1NTEyNzYJdGlrdl90YXNrOnsB4iUfBDIyJRAhHggxMjcBswhwODARFiEYDSEoaXRlcnM6NTUyLCABQmBzOjN9LCBzY2FuX2RldGFpbDoge3RvdGFsJQ4IZXNzLT8JegAsIRc6HAAoX3NpemU6IDEyMzQhYgA0ESQpdwU4oDksIHJvY2tzZGI6IHtkZWxldGVfc2tpcHBlZF9jb3VudDogMCwga2V5PhYABT4YNiwgYmxvY0EQOWQNNyAyMDIxLCByZWEuSQAFD2BieXRlOiAwIEJ5dGVzfX19CU4vQQlOL0EK')
# Plan_digest: 008eb1fb01becb5754e1b45518519660d20ae1ee6f7671d9b403ba347d5af606
/* ApplicationName=DBeaver 21.1.3 - SQLEditor <Script-176.sql> */ insert into sbtest2.sbtest_t_t select * from sbtest2.sbtest_t;

TiKV log:

[2022/06/21 05:50:04.497 +00:00] [INFO] [apply.rs:1395] ["execute admin command"] [command="cmd_type: BatchSplit splits { requests { split_key: 7480000000000022FF2300000000000000F8 new_region_id: 724041 new_peer_ids: 724042 new_peer_ids: 724043 new_peer_ids: 724044 } right_derive: true }"] [index=8] [term=6] [peer_id=724018] [region_id=724017]
[2022/06/21 05:50:04.498 +00:00] [INFO] [apply.rs:2238] ["split region"] [keys="key 7480000000000022FF2300000000000000F8"] [region="id: 724017 start_key: 7480000000000022FF015F728000000000FF0BCA210000000000FB region_epoch { conf_ver: 5 version: 33407 } peers { id: 724018 store_id: 1 } peers { id: 724019 store_id: 6001 } peers { id: 724020 store_id: 6002 }"] [peer_id=724018] [region_id=724017]
[2022/06/21 05:50:04.502 +00:00] [INFO] [peer.rs:3561] ["moving 0 locks to new regions"] [region_id=724017]
[2022/06/21 05:50:04.502 +00:00] [INFO] [peer.rs:3656] ["insert new region"] [region="id: 724041 start_key: 7480000000000022FF015F728000000000FF0BCA210000000000FB end_key: 7480000000000022FF2300000000000000F8 region_epoch { conf_ver: 5 version: 33408 } peers { id: 724042 store_id: 1 } peers { id: 724043 store_id: 6001 } peers { id: 724044 store_id: 6002 }"] [region_id=724041]
[2022/06/21 05:50:04.502 +00:00] [INFO] [peer.rs:251] ["create peer"] [peer_id=724042] [region_id=724041]
[2022/06/21 05:50:04.502 +00:00] [INFO] [raft.rs:2646] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {724044, 724042, 724043} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.502 +00:00] [INFO] [raft.rs:1120] ["became follower at term 5"] [term=5] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.502 +00:00] [INFO] [raft.rs:384] [newRaft] [peers="Configuration { incoming: Configuration { voters: {724044, 724042, 724043} }, outgoing: Configuration { voters: {} } }"] ["last term"=5] ["last index"=5] [applied=5] [commit=5] [term=5] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.502 +00:00] [INFO] [raw_node.rs:315] ["RawNode created with id 724042."] [id=724042] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.506 +00:00] [INFO] [raft.rs:1565] ["[logterm: 5, index: 5, vote: 0] cast vote for 724044 [logterm: 5, index: 5] at term 5"] ["msg type"=MsgRequestPreVote] [term=5] [msg_index=5] [msg_term=5] [from=724044] [vote=0] [log_index=5] [log_term=5] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.511 +00:00] [INFO] [raft.rs:1364] ["received a message with higher term from 724044"] ["msg type"=MsgRequestVote] [message_term=6] [term=5] [from=724044] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.511 +00:00] [INFO] [raft.rs:1120] ["became follower at term 6"] [term=6] [raft_id=724042] [region_id=724041]
[2022/06/21 05:50:04.511 +00:00] [INFO] [raft.rs:1565] ["[logterm: 5, index: 5, vote: 0] cast vote for 724044 [logterm: 5, index: 5] at term 6"] ["msg type"=MsgRequestVote] [term=6] [msg_index=5] [msg_term=5] [from=724044] [vote=0] [log_index=5] [log_term=5] [raft_id=724042] [region_id=724041]
[2022/06/21 05:51:09.719 +00:00] [INFO] [kv.rs:1117] ["call CheckLeader failed"] [address=ipv4:172.26.2.39:52190] [err=Grpc(RemoteStopped)]
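The hex keys in the exception and in the TiKV split log can be decoded to see which table and row handle they point to. The sketch below assumes TiDB's usual record-key layout (`t` + 8-byte table ID + `_r` + 8-byte int64 handle, integers big-endian with the sign bit flipped), wrapped in the memcomparable bytes encoding (8-byte groups, each followed by a marker byte equal to 0xFF minus the number of zero padding bytes). Class and method names are hypothetical, and only integer handles are decoded.

```java
import java.io.ByteArrayOutputStream;

public class TiKeyDecoder {

    // Parse a hex string such as "7480...FA" into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Undo the memcomparable bytes encoding: 8 data bytes + 1 marker per group,
    // marker = 0xFF - padding; the first group that has padding ends the value.
    static byte[] decodeBytes(byte[] enc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off + 9 <= enc.length; off += 9) {
            int pad = 0xFF - (enc[off + 8] & 0xFF);
            if (pad > 8) {
                throw new IllegalArgumentException("invalid marker at offset " + (off + 8));
            }
            out.write(enc, off, 8 - pad);
            if (pad > 0) {
                return out.toByteArray();
            }
        }
        throw new IllegalArgumentException("missing final group");
    }

    // Read a big-endian int64 whose sign bit was flipped when it was encoded.
    static long readInt64(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v ^ Long.MIN_VALUE;
    }

    // Describe an encoded key as "table <id>" or "table <id>, handle <h>".
    static String describe(String hexKey) {
        byte[] raw = decodeBytes(hex(hexKey));
        if (raw.length < 9 || raw[0] != 't') {
            return "not a table key";
        }
        long tableId = readInt64(raw, 1);
        if (raw.length >= 19 && raw[9] == '_' && raw[10] == 'r') {
            long handle = readInt64(raw, 11);
            int extra = raw.length - 19; // bytes after the handle, e.g. a 0x00 boundary suffix
            return "table " + tableId + ", handle " + handle
                    + (extra > 0 ? " (+" + extra + " suffix byte(s))" : "");
        }
        return "table " + tableId;
    }

    public static void main(String[] args) {
        String[] keys = {
            "7480000000000022FF015F728000000000FF08744A0000000000FA", // request start
            "7480000000000022FF015F728000000000FF08744C0000000000FA", // request end
            "7480000000000022FF015F728000000000FF0513960000000000FB", // physical bound start
            "7480000000000022FF015F728000000000FF08744B0000000000FB", // physical bound end
            "7480000000000022FF2300000000000000F8",                   // split key in the TiKV log
        };
        for (String k : keys) {
            System.out.println(k + " -> " + describe(k));
        }
    }
}
```

For the keys in this thread, both the request and the physical bound fall in table 8705: the request covers handles 554058 up to (but not including) 554060, while the region ends at handle 554059 plus a one-byte 0x00 suffix. The split key in the TiKV log decodes to the bare prefix of table 8739. A table ID can be mapped to a table name through the `TIDB_TABLE_ID` column of `information_schema.tables` in TiDB.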

Is this related to the Spark version?
Spark 3.0 works fine, but 3.2 fails?

Have you checked whether the problem also occurs outside K8s?

Has this been solved?

I verified it locally in non-K8s mode, and there was no problem.

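Did you verify this exact combination, Spark 3.2.1 + TiSpark 3.0.0 + TiDB 6.1.0? If it works for you, that matches my understanding; I also suspect the on-K8s deployment. The main problem right now is that I have no idea how to troubleshoot it: in the very same environment, downgrading Spark to 3.0.3 makes the problem disappear, so I'm not sure where to go from here.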

So it reproduces reliably on K8s? You could open an issue on GitHub first, ideally with a clear description of the environment.

OK, I'll open an issue and reach out to the maintainers.

Has the issue been filed? What's the link? Thanks.

I'm using TiSpark 3.0.1.
Can this problem be reproduced reliably?

Yes, it reproduces reliably with Spark 3.2.1 + TiSpark 3.0.0 + TiDB 6.1.0 on K8s.

Could you try whether Spark 3.2.1 + TiSpark 3.0.1 + TiDB 6.1.0 on K8s shows the same problem?

Spark 3.2.1 + TiSpark 3.0.1 + TiDB 6.1.0 on K8s works without problems; several features all passed testing.