Importing data into a TiDB instance on k8s with the tikv-importer and tidb-lightning binaries fails, with errors in the logs

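  • [TiDB version]: v4.0.2
  • [Problem description]: On a k8s node, I used the tikv-importer and tidb-lightning binaries to import data into the TiDB instance running in the k8s cluster, and the import failed. The configuration and error logs are below:
    tikv-import.toml file
# TiKV Importer configuration file template
# Log file.
log-file = "tikv-importer.log"
# Log level: trace, debug, info, warn, error, off.
log-level = "info"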

[server]
# Address that tikv-importer listens on; tidb-lightning connects to this address to write data.
addr = "10.226.132.106:8287"

[import]
# Directory where engine files are stored.
import-dir = "/tmp/ssd/data.import/"
 

tidb-lightning.toml file


[lightning]

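# Number of concurrent data conversion tasks. Defaults to the number of logical CPUs, so it normally does not need to be set.
# In a mixed deployment, it can be set to 75% of the number of logical CPUs.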
# region-concurrency =

# Logging
level = "info"
file = "tidb-lightning.log"
#server-mode = true
#status-addr = ':8289'

[tikv-importer]
# Listening address of tikv-importer; change it to the actual address of the tikv-importer server.
addr = "10.226.132.106:8287"

[mydumper]
# Mydumper source data directory.
data-source-dir = "/tmp/test"

[tidb]
# Information about the target cluster. Listening address of tidb-server; one address is enough.
host = "10.233.7.89"
port = 4000
user = "root"
password = "Testtest1"
# Table schema information is fetched from TiDB's "status port".
status-port = 10080
pd-addr = "10.233.88.246:2379"

The errors in tidb-lightning.log are as follows:

[2020/07/28 20:31:21.594 +08:00] [INFO] [restore.go:513] [progress] [files="0/32 (0.0%)"] [tables="0/32 (0.0%)"] [speed(MiB/s)=0.00006292870286067323] [state=writing] []
[2020/07/28 20:36:21.592 +08:00] [INFO] [tikv.go:148] ["switch mode failed"] [mode=Import] [tikv=demo-tikv-2.demo-tikv-peer.liuxiao.svc:20160] [takeTime=2.753801ms] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup demo-tikv-2.demo-tikv-peer.liuxiao.svc on 10.226.146.245:53: no such host\""]
[2020/07/28 20:36:21.592 +08:00] [INFO] [tikv.go:148] ["switch mode failed"] [mode=Import] [tikv=demo-tikv-1.demo-tikv-peer.liuxiao.svc:20160] [takeTime=2.858012ms] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup demo-tikv-1.demo-tikv-peer.liuxiao.svc on 10.226.146.245:53: no such host\""]

The errors in tikv-importer.log are as follows:

[2020/07/28 20:40:41.064 +08:00] [INFO] [util.rs:403] ["connecting to PD endpoint"] [endpoints=http://demo-pd-0.demo-pd-peer.liuxiao.svc:2379]
[2020/07/28 20:40:41.065 +08:00] [ERROR] [util.rs:450] ["connect failed"] [err="Grpc(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"DNS resolution failed\") }))"] [endpoints=http://demo-pd-0.demo-pd-peer.liuxiao.svc:2379]

The pod and service information of the TiDB instance is as follows:

]# kubectl get pod -n liuxiao  -o wide
NAME                                   READY   STATUS      RESTARTS   AGE    IP               NODE             NOMINATED NODE
demo-discovery-5db6b78dcc-lb47z        1/1     Running     0          5d6h   10.233.95.77     10.226.132.106   <none>
demo-importer-0                        2/2     Running     0          4d5h   10.233.98.84     10.226.132.107   <none>
demo-monitor-659b74b967-m8nzd          3/3     Running     0          4d4h   10.233.121.168   10.226.132.104   <none>
demo-pd-0                              1/1     Running     0          5d3h   10.233.95.72     10.226.132.106   <none>
demo-pd-1                              1/1     Running     0          4d4h   10.233.121.154   10.226.132.104   <none>
demo-pd-2                              1/1     Running     1          4d4h   10.233.88.246    10.226.132.105   <none>
demo-tidb-0                            2/2     Running     0          4d4h   10.233.88.216    10.226.132.105   <none>
demo-tidb-1                            2/2     Running     0          5d6h   10.233.95.102    10.226.132.106   <none>
demo-tikv-0                            1/1     Running     0          4d4h   10.233.121.156   10.226.132.104   <none>
demo-tikv-1                            1/1     Running     0          4d4h   10.233.88.195    10.226.132.105   <none>
demo-tikv-2                            1/1     Running     0          5d6h   10.233.95.68     10.226.132.106   <none>
lightning-test2-tidb-lightning-z7h6b   0/1     Completed   0          8h     10.233.95.98     10.226.132.106   <none>
]# kubectl get service -n liuxiao  
NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                          AGE
demo-discovery                   ClusterIP   10.233.59.189   <none>        10261/TCP,10262/TCP              5d6h
demo-grafana                     NodePort    10.233.13.94    <none>        3000:32673/TCP                   5d6h
demo-importer                    ClusterIP   None            <none>        8287/TCP                         4d6h
demo-monitor-reloader            ClusterIP   10.233.28.118   <none>        9089/TCP                         5d6h
demo-pd                          ClusterIP   10.233.38.172   <none>        2379/TCP                         5d6h
demo-pd-peer                     ClusterIP   None            <none>        2380/TCP                         5d6h
demo-prometheus                  ClusterIP   10.233.26.81    <none>        9090/TCP                         5d6h
demo-tidb                        NodePort    10.233.7.89     <none>        4000:31339/TCP,10080:30949/TCP   5d6h
demo-tidb-peer                   ClusterIP   None            <none>        10080/TCP                        5d6h
demo-tikv-peer                   ClusterIP   None            <none>        20160/TCP                        5d6h
lightning-test2-tidb-lightning   NodePort    10.233.55.0     <none>        8289:31315/TCP                   8h

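Something is probably misconfigured somewhere. Could you please take a look? Thanks!
Also, the log format of tidb-lightning and tikv-importer looks really nice :+1: nicer than the logs of the main service nodes :joy: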

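  1. Judging from the lightning and importer logs, neither of them can reach the cluster; both fail at DNS resolution:
    transport: Error while dialing dial tcp: lookup demo-tikv-1.demo-tikv-peer.liuxiao.svc on 10.226.146.245:53: no such host
    Grpc(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some("DNS resolution failed") }))

  2. Check the network first, in particular whether the machine running the binaries can resolve these cluster-internal hostnames.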
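
One way to act on step 2 is to check, from the machine that runs the binaries, whether the hostnames in the logs resolve at all. Below is a minimal sketch using only the Python standard library. The helper names `failed_lookups`, `split_endpoint`, `can_resolve` and `report` are made up for illustration, and the regular expression assumes the Go-style `lookup <host> on <dns-server>: no such host` message seen in the tidb-lightning log.

```python
import re
import socket

# Matches the Go resolver error seen in tidb-lightning.log, e.g.
# "lookup demo-tikv-1.demo-tikv-peer.liuxiao.svc on 10.226.146.245:53: no such host"
LOOKUP_RE = re.compile(r"lookup (\S+) on (\S+): no such host")


def failed_lookups(log_text):
    """Return sorted, de-duplicated (hostname, dns_server) pairs found in the log text."""
    return sorted(set(LOOKUP_RE.findall(log_text)))


def split_endpoint(endpoint):
    """Split 'host:port' (optionally prefixed with a scheme such as 'http://') into (host, port)."""
    endpoint = endpoint.split("://", 1)[-1]
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def can_resolve(host):
    """Return True if the local resolver can turn `host` into an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def report(endpoints):
    """Print one line per endpoint saying whether its hostname resolves on this machine."""
    for endpoint in endpoints:
        host, port = split_endpoint(endpoint)
        status = "resolves" if can_resolve(host) else "does NOT resolve"
        print(f"{host} (port {port}): {status}")
```

Running `report(["demo-tikv-1.demo-tikv-peer.liuxiao.svc:20160", "http://demo-pd-0.demo-pd-peer.liuxiao.svc:2379"])` on the k8s node would show whether that node's resolver knows the `*.svc` names. In the lightning log the lookup went to `10.226.146.245:53`; if that is the node's own DNS server rather than the cluster DNS, names under `liuxiao.svc` would not be expected to resolve there.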

Regarding "switch mode failed": is the switch mode action performed by tidb-lightning or by tikv-importer?

The switch mode is performed by tidb-lightning.
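
This would also explain why the tidb-lightning log mentions `demo-tikv-*.demo-tikv-peer.liuxiao.svc` even though the config only contains IP addresses: as far as I understand, tidb-lightning asks PD for the list of TiKV stores and then sends the switch-mode request to each store at the address it registered in PD. Here is a rough sketch of that lookup in Python, assuming PD's HTTP API endpoint `/pd/api/v1/stores` and its usual `stores[].store.address` layout. The sample JSON is hand-written for illustration from the hostnames in the log, not captured from this cluster, and `store_addresses` is a hypothetical helper.

```python
import json

# Hand-written sample in the shape returned by PD's /pd/api/v1/stores endpoint.
# The store IDs are made up; the addresses are the two hostnames from the lightning log.
SAMPLE_STORES_JSON = """
{
  "count": 2,
  "stores": [
    {"store": {"id": 1, "address": "demo-tikv-1.demo-tikv-peer.liuxiao.svc:20160", "state_name": "Up"}},
    {"store": {"id": 4, "address": "demo-tikv-2.demo-tikv-peer.liuxiao.svc:20160", "state_name": "Up"}}
  ]
}
"""


def store_addresses(stores_json):
    """Return the address of every store that is not a tombstone."""
    payload = json.loads(stores_json)
    return [
        item["store"]["address"]
        for item in payload.get("stores", [])
        if item["store"].get("state_name") != "Tombstone"
    ]


for address in store_addresses(SAMPLE_STORES_JSON):
    print("switch mode would dial:", address)
```

To see the real list, the same JSON can be fetched from `http://<pd-addr>/pd/api/v1/stores`. The addresses returned there are the ones that have to be resolvable and reachable from the machine running tidb-lightning.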

By the way, for the import in the k8s environment, did you follow this document? https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/restore-data-using-tidb-lightning
