TiFlash cannot fetch data

To improve response time, please provide the following information when asking; clearly described problems get prioritized.

  • [TiDB version]: Release Version: v4.0.0-rc Git Commit Hash: 79db9e30ab8f98ac07c8ae55c66dfecc24b43d56 Git Branch: heads/refs/tags/v4.0.0-rc UTC Build Time: 2020-04-08 07:32:25

  • [Problem description]: I configured a table to replicate to TiFlash, but the data never syncs and the progress stays at 0. Restarting the whole cluster did not help; it has been like this for hours.

ALTER TABLE std_finished_mts SET TIFLASH REPLICA 1;

mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'break';
+--------------+------------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME       | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------------+----------+---------------+-----------------+-----------+----------+
| break        | student_crm      |     1335 |             1 |                 |         0 |        0 |
| break        | class_records    |     1391 |             1 |                 |         0 |        0 |
| break        | std_finished_mts |     1145 |             1 |                 |         0 |        0 |
+--------------+------------------+----------+---------------+-----------------+-----------+----------+

Forcing the query to use TiFlash also fails:

set SESSION tidb_isolation_read_engines = 'tiflash';
ERROR 1815 (HY000): Internal : Can not find access path matching 'tidb_isolation_read_engines'(value: 'tiflash'). Available values are 'tikv'.
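As a hedged aside (a sketch, not from the original thread): instead of forcing TiFlash only, the engine list can name both engines so queries keep working via TiKV while the replica is still unavailable, and the replica state can be re-checked before forcing 'tiflash' alone:

```sql
-- Let the optimizer use either engine; while the TiFlash replica
-- reports AVAILABLE = 0, queries simply continue to read from TiKV.
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash';

-- Re-check replica state for one table before forcing TiFlash alone:
SELECT AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'break' AND TABLE_NAME = 'std_finished_mts';
```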

Meanwhile, TiFlash keeps printing output like the following (has TiFlash even connected to TiKV?):

2020.04.29 16:23:49.558062 [ 15 ] HTTPHandler: Done processing query
2020.04.29 16:23:49.703059 [ 14 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 27107 rows/sec., 251.49 KiB/sec.
2020.04.29 16:23:49.703192 [ 14 ] HTTPHandler: Done processing query
2020.04.29 16:23:54.535037 [ 15 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 27400 rows/sec., 254.20 KiB/sec.
2020.04.29 16:23:54.535164 [ 15 ] HTTPHandler: Done processing query
2020.04.29 16:23:54.549692 [ 14 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 30182 rows/sec., 280.02 KiB/sec.
2020.04.29 16:23:54.549790 [ 14 ] HTTPHandler: Done processing query
2020.04.29 16:23:55.120550 [ 15 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 26918 rows/sec., 249.73 KiB/sec.
2020.04.29 16:23:55.120677 [ 15 ] HTTPHandler: Done processing query
2020.04.29 16:23:59.533333 [ 14 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 27685 rows/sec., 256.85 KiB/sec.
2020.04.29 16:23:59.533464 [ 14 ] HTTPHandler: Done processing query
2020.04.29 16:23:59.547286 [ 15 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 39469 rows/sec., 366.17 KiB/sec.
2020.04.29 16:23:59.547382 [ 15 ] HTTPHandler: Done processing query
2020.04.29 16:23:59.609287 [ 14 ] executeQuery: Read 2 rows, 19.00 B in 0.000 sec., 23773 rows/sec., 220.55 KiB/sec.
2020.04.29 16:23:59.609423 [ 14 ] HTTPHandler: Done processing query

Hello,

  1. To deploy TiFlash, set the PD configuration item replication.enable-placement-rules to true, perform a rolling restart of PD, and verify.

Run pd-ctl -u http://pdip:pdport config show | grep enable-placement-rules to verify.
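If the flag turns out to be false, it can also be switched online with pd-ctl rather than editing the deployment config and restarting (a sketch; assumes pd-ctl from the TiDB toolkit and a PD reachable at pdip:pdport):

```shell
# Check the current value (expect: "enable-placement-rules": "true")
pd-ctl -u http://pdip:pdport config show | grep enable-placement-rules

# Enable placement rules online if it is currently false
pd-ctl -u http://pdip:pdport config set enable-placement-rules true
```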

  2. There should be three log files under the TiFlash directory; please upload all of them to check. Or finish step 1 first and see whether the sync recovers.

Where can I download pd-ctl? My self-compiled build fails with errors:

runtime/internal/atomic

../../go/src/runtime/internal/atomic/atomic_amd64x.go:18:6: Load redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:16:24
../../go/src/runtime/internal/atomic/atomic_amd64x.go:24:6: Loadp redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:22:32
../../go/src/runtime/internal/atomic/atomic_amd64x.go:30:6: Load64 redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:28:26
../../go/src/runtime/internal/atomic/atomic_amd64x.go:36:6: LoadAcq redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:34:27
../../go/src/runtime/internal/atomic/atomic_amd64x.go:41:6: Xadd redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:39:37
../../go/src/runtime/internal/atomic/atomic_amd64x.go:44:6: Xadd64 redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:42:39
../../go/src/runtime/internal/atomic/atomic_amd64x.go:47:6: Xadduintptr redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:45:47
../../go/src/runtime/internal/atomic/atomic_amd64x.go:50:6: Xchg redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:48:36
../../go/src/runtime/internal/atomic/atomic_amd64x.go:53:6: Xchg64 redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:51:38
../../go/src/runtime/internal/atomic/atomic_amd64x.go:56:6: Xchguintptr redeclared in this block
	previous declaration at ../../go/src/runtime/internal/atomic/atomic_amd64.go:54:45
../../go/src/runtime/internal/atomic/atomic_amd64x.go:56:6: too many errors

Hello,

https://pingcap.com/docs-cn/stable/reference/tools/pd-control/

  1. ./pd-ctl config show | grep enable-placement-rules
    "enable-placement-rules": "true",
    PD shows it is enabled.

Below is the TiFlash log:
tiflash.log.tar.gz (2.6 MB)

TiFlash still cannot sync the table data.

Hello,

  1. The log shows TiFlash accessing PD at http://127.0.0.1:2379; please confirm this matches the IP bound to your network interface, and share a screenshot.
  2. What pdhost:port did you pass to pd-ctl?
  3. Check the status of PD and share a screenshot.
2020.04.30 12:56:11.776116 [ 34 ] <Error> pingcap.pd: failed to get cluster id by :http://127.0.0.1:2379
2020.04.30 12:56:11.776165 [ 34 ] <Error> pingcap.pd: Exception: failed to update leader
2020.04.30 12:56:16.775986 [ 33 ] <Error> pingcap.pd: write tso failed
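Given those errors, a quick hedged check (a sketch, assuming curl and PD's standard HTTP API on port 2379) is whether PD actually answers on the address TiFlash is using; a healthy single-node PD should return JSON rather than a connection error:

```shell
# Both endpoints are part of PD's v1 HTTP API.
curl -s http://127.0.0.1:2379/pd/api/v1/members
curl -s http://127.0.0.1:2379/pd/api/v1/health
```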

It is 127.0.0.1.

This is for testing, so it is currently a single-machine cluster; all IPs are 127.0.0.1.

There is one instance each of PD, TiKV, TiDB, and TiFlash.

This is the content of the cluster deployment topology file:

global:
  user: "root"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115



server_configs:
  tidb:
    log.slow-threshold: 300
    binlog.enable: false
    binlog.ignore-error: false
  tikv:
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
    replication.enable-placement-rules: true
  tiflash:
    logger.level: "info"


pd_servers:
  - host: 127.0.0.1

tidb_servers:
  - host: 127.0.0.1

tikv_servers:
  - host: 127.0.0.1


tiflash_servers:
  - host: 127.0.0.1

monitoring_servers:
  - host: 127.0.0.1

grafana_servers:
  - host: 127.0.0.1

alertmanager_servers:
  - host: 127.0.0.1

Hello,

  1. First confirm whether the TiDB cluster status is normal, and share a screenshot.
  2. Upload pd.log so we can confirm that PD can serve requests normally.

We recommend deploying with the actual NIC IP rather than 127.0.0.1; switch the addresses and try again.
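For reference, a minimal sketch of the same topology pinned to a real interface address (192.168.1.10 is a placeholder here; substitute the host's actual IP):

```yaml
# Hypothetical example: replace 192.168.1.10 with the host's real NIC IP.
pd_servers:
  - host: 192.168.1.10
tidb_servers:
  - host: 192.168.1.10
tikv_servers:
  - host: 192.168.1.10
tiflash_servers:
  - host: 192.168.1.10
```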

root@tidb1:/# tiup cluster display test
Starting component `cluster`: /root/.tiup/components/cluster/v0.5.0/cluster display test
TiDB Cluster: test
TiDB Version: v4.0.0-rc
ID               Role          Host       Ports                            Status     Data Dir                      Deploy Dir
--               ----          ----       -----                            ------     --------                      ----------
127.0.0.1:9093   alertmanager  127.0.0.1  9093/9094                        -          /tidb-data/alertmanager-9093  /tidb-deploy/alertmanager-9093
127.0.0.1:3000   grafana       127.0.0.1  3000                             -          -                             /tidb-deploy/grafana-3000
127.0.0.1:2379   pd            127.0.0.1  2379/2380                        Healthy|L  /tidb-data/pd-2379            /tidb-deploy/pd-2379
127.0.0.1:9090   prometheus    127.0.0.1  9090                             -          /tidb-data/prometheus-9090    /tidb-deploy/prometheus-9090
127.0.0.1:4000   tidb          127.0.0.1  4000/10080                       Up         -                             /tidb-deploy/tidb-4000
127.0.0.1:9000   tiflash       127.0.0.1  9000/8123/3930/20170/20292/8234  Up         /tidb-data/tiflash-9000       /tidb-deploy/tiflash-9000
127.0.0.1:20160  tikv          127.0.0.1  20160/20180                      Up         /tidb-data/tikv-20160         /tidb-deploy/tikv-20160

The cluster status is normal.

Here is the PD log:
pd.log.tar.gz (4.5 MB)

The PD log shows "PD cluster leader is ready to serve" starting at 12:56:32, which means the leader can now serve requests normally, and the TiFlash error log has no new errors after that time. Please check the current data sync status again.

After investigation, this was caused by a bug in the PD v4.0.0-rc release; after upgrading the cluster, replication recovered.

Which bug was it? Which version do we need to upgrade to for a fix?

Restarting the cluster fixed it for me; replication also stalled at first. It seemed to be a communication problem between PD instances.


OK, we will report this back internally.

I also hit this on a freshly created v4.0.0-rc cluster; PD had to be restarted before replication took effect and data started syncing. I am planning to upgrade to v4.0.0-rc.1.

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.