Does sync-diff-inspector support regular-expression matching in the source-database config when checking sharded tables?

The official example shows a 3-to-1 sharding configuration. If my source database has many more shards, how should I configure it? Mainly this section:
```
[[table-config]]
```

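Hello,

You can compare the data through your sharding middleware (if you have one). That way you don't need to list every shard.

As for whether regular expressions are supported, let me confirm and get back to you.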

OK, thanks for the help~

Hello, confirmed: regular expressions are supported. For the syntax, see how the other sync-diff options use it:
https://docs.pingcap.com/zh/tidb/v4.0/sync-diff-inspector-overview

```toml
######################### Tables config #########################

# To compare many tables whose schema or table names differ, use table-rules to set up the mapping.
# You can map only the schema, only the table, or both.
[[table-rules]]
# schema-pattern and table-pattern support the wildcards * and ?
schema-pattern = "test"
table-pattern = "record_20*"
target-schema = "test"
target-table = "record"

# Tables to check in the target database
[[check-tables]]
# Schema name
schema = "test"

# Names of the tables to check
tables = ["record"]

# Sharding config for that table
[[table-config]]
# Target schema name
schema = "test"

# Table name in the target schema
table = "record"

# Set to true to compare data in a sharding scenario
is-sharding = true

# Source table config
[[table-config.source-tables]]
# ID of the source database instance
instance-id = "rds-test"
schema = "test"
table  = "~^record_20*"
```

I tried this and got an error:
```
[2020/06/22 15:19:05.798 +08:00] [ERROR] [config.go:98] ["must have more than one source tables if comparing sharding tables"] [stack="github.com/pingcap/log.Error
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/pkg/mod/github.com/pingcap/log@v0.0.0-20191012051959-b742a5d432e9/global.go:42
main.(*TableConfig).Valid
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/config.go:98
main.(*Config).checkConfig
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/config.go:307
main.main
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/main.go:54
runtime.main
\t/usr/local/go/src/runtime/proc.go:203"]
```

Could the official team provide an example? This situation is fairly common.

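    # Regular expressions are supported; the value must start with '~'.
    # The following config checks all tables whose names start with 'test'
    # tables = "~^test.*"
    # The following config checks all tables in the configured schema
    # tables = "~^"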
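A minimal Go sketch of the `~` convention described in these comments, assuming (as the comments state) that a leading `~` marks the rest of the value as a regular expression and any other value is an exact table name. `matchTable` is a hypothetical helper, not sync-diff-inspector's code, and it uses Go's `regexp` (RE2) syntax, which may differ in edge cases from what the tool accepts.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchTable is a hypothetical helper: a pattern starting with "~" is
// compiled as a regular expression (after stripping the "~"); any other
// pattern must equal the table name exactly.
func matchTable(pattern, tableName string) (bool, error) {
	if strings.HasPrefix(pattern, "~") {
		re, err := regexp.Compile(pattern[1:])
		if err != nil {
			return false, err
		}
		return re.MatchString(tableName), nil
	}
	return pattern == tableName, nil
}

func main() {
	tables := []string{"test", "test_1", "my_test", "record_202006"}
	for _, pattern := range []string{"~^test.*", "~^", "record_202006"} {
		var selected []string
		for _, t := range tables {
			ok, err := matchTable(pattern, t)
			if err != nil {
				panic(err)
			}
			if ok {
				selected = append(selected, t)
			}
		}
		fmt.Printf("%-15s -> %v\n", pattern, selected)
	}
}
```

`~^` selects every table because a bare `^` anchor matches any string, which is why that form checks the whole schema.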

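Hello, I think you misunderstood me. What you pasted is the check-tables part, which specifies which tables in the target database to check. In my case there is one target table, while the source database has several hundred monthly tables. How should that be configured in table-config? I mean the table-config.source-tables part of the example I posted above.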

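Hello,

The regular-expression syntax in the config file is the same everywhere. The reply above meant the following; please test whether it works. Change

table  = "record_20*"

to

table  = "~^record_20*"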
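Independent of whether this field accepts regular expressions, the pattern itself is looser than it looks: in regex syntax `0*` means "zero or more `0` characters", so `^record_20*` only requires the name to start with `record_2`. A short Go check using Go's RE2 syntax; it assumes the monthly shards are named `record_YYYYMM` (so `record_20` is followed by four digits), and `strict` is a hypothetical tighter alternative.

```go
package main

import (
	"fmt"
	"regexp"
)

// loose is the pattern from the thread (without the "~" marker).
// "0*" means zero or more '0' characters, so only "record_2" is required.
var loose = regexp.MustCompile(`^record_20*`)

// strict is a hypothetical tighter pattern, assuming shards are named
// record_YYYYMM: "record_20", then exactly four digits, then end of name.
var strict = regexp.MustCompile(`^record_20\d{4}$`)

func main() {
	for _, name := range []string{"record_202006", "record_2", "record_2_backup", "record_202006_bak"} {
		fmt.Printf("%-18s loose=%-5v strict=%v\n", name, loose.MatchString(name), strict.MatchString(name))
	}
}
```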

That was fast!
After the change I get this error. Could you help confirm whether this case is supported?

```
[2020/06/23 16:43:55.625 +08:00] [ERROR] [config.go:98] ["must have more than one source tables if comparing sharding tables"] [stack="github.com/pingcap/log.Error\ \t/home/jenkins/agent/workspace/build_tidb_tools_master/go/pkg/mod/github.com/pingcap/log@v0.0.0-20191012051959-b742a5d432e9/global.go:42\ main.(*TableConfig).Valid\ \t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/config.go:98\ main.(*Config).checkConfig\ \t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/config.go:307\ main.main\ \t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/main.go:54\ runtime.main\ \t/usr/local/go/src/runtime/proc.go:203"]
[2020/06/23 16:43:55.625 +08:00] [ERROR] [main.go:56] ["there is something wrong with your config, please check it!"] [stack="github.com/pingcap/log.Error\ \t/home/jenkins/agent/workspace/build_tidb_tools_master/go/pkg/mod/github.com/pingcap/log@v0.0.0-20191012051959-b742a5d432e9/global.go:42\ main.main\ \t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/sync_diff_inspector/main.go:56\ runtime.main\ \t/usr/local/go/src/runtime/proc.go:203"]
```

Please upload the complete config file.

diff-test.toml (2.8 KB)
File uploaded.

OK, we'll look into it.

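Hello, I checked with the developers: regular expressions are currently not supported in the `table-config.source-tables` section.

Currently, if `is-sharding` is true, there must be more than one source table.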
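This rule explains the earlier error: a single `source-tables` entry whose `table` value looks like a regex still counts as one entry, so validation fails before any pattern could be expanded. A hypothetical Go sketch of such a check follows; the types and field names are illustrative stand-ins rather than tidb-tools' actual code, and only the error message is copied from the log in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// TableInstance is a hypothetical stand-in for one
// [[table-config.source-tables]] entry.
type TableInstance struct {
	InstanceID string
	Schema     string
	Table      string
}

// TableConfig is a hypothetical stand-in for one [[table-config]] block.
type TableConfig struct {
	Schema       string
	Table        string
	IsSharding   bool
	SourceTables []TableInstance
}

// validate mirrors the rule described above: with is-sharding = true there
// must be more than one source table. Entries are counted as written; a
// table name that looks like a pattern is not expanded.
func (c TableConfig) validate() error {
	if c.IsSharding && len(c.SourceTables) < 2 {
		return errors.New("must have more than one source tables if comparing sharding tables")
	}
	return nil
}

func main() {
	cfg := TableConfig{
		Schema:     "test",
		Table:      "record",
		IsSharding: true,
		SourceTables: []TableInstance{
			{InstanceID: "rds-test", Schema: "test", Table: "~^record_20*"},
		},
	}
	fmt.Println(cfg.validate())
}
```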

Is there a plan to support this? Data import already supports it, so I hope validation can be brought in line.

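When the upstream shard table names follow a uniform pattern, you can try matching them with table-rules. With table-rules you don't need to configure table-config at all. Only when the shard names are not uniform, and have to be matched one by one, do you need to list them individually: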

```toml
[[table-rules]]
schema-pattern = "diff_test"
table-pattern = "t*"
target-schema = "diff_test"
target-table = "t_10"

[[check-tables]]
schema = "diff_test"
tables = ["t_10"]
```
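To see which upstream tables a rule like the one above would fold into `diff_test.t_10`, here is a hypothetical Go sketch that routes `(schema, table)` pairs through `[[table-rules]]`-style entries. It uses the standard library's `path.Match` as a stand-in for the `*` / `?` wildcard semantics; sync-diff-inspector's own matcher may behave differently in edge cases, and the upstream table names are made up.

```go
package main

import (
	"fmt"
	"path"
)

// tableRule is a hypothetical mirror of one [[table-rules]] block.
type tableRule struct {
	SchemaPattern, TablePattern string
	TargetSchema, TargetTable   string
}

// matches uses path.Match as a stand-in for * / ? wildcards;
// a malformed pattern is treated as "no match".
func matches(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

// route returns the target schema/table for an upstream table, using the
// first rule whose schema and table patterns both match.
func route(rules []tableRule, schema, table string) (string, string, bool) {
	for _, r := range rules {
		if matches(r.SchemaPattern, schema) && matches(r.TablePattern, table) {
			return r.TargetSchema, r.TargetTable, true
		}
	}
	return "", "", false
}

func main() {
	rules := []tableRule{{"diff_test", "t*", "diff_test", "t_10"}}
	upstream := []string{"t_1", "t_2", "t_3", "tmp_import", "orders"}
	for _, tbl := range upstream {
		if s, t, ok := route(rules, "diff_test", tbl); ok {
			fmt.Printf("diff_test.%s -> %s.%s\n", tbl, s, t)
		} else {
			fmt.Printf("diff_test.%s -> (not matched)\n", tbl)
		}
	}
}
```

The sketch makes one risk visible: `t*` also catches any unrelated upstream table whose name starts with `t` (here `tmp_import`), so a narrower pattern such as `t_*` may be worth considering.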

I tried it, and after startup it keeps logging errors:
```
[2020/06/24 14:29:04.355 +08:00] [WARN] [diff.go:648] ["save table summary info failed"] [schema=assets_finance] [table=record_voucher_detail] [error="chunks of instanceID target schema assets_finance table record_voucher_detail not found"] [errorVerbose="chunks of instanceID target schema assets_finance table record_voucher_detail not found
github.com/pingcap/errors.NotFoundf
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:117
github.com/pingcap/tidb-tools/pkg/diff.getChunkSummary
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/checkpoint.go:207
github.com/pingcap/tidb-tools/pkg/diff.updateTableSummary
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/checkpoint.go:227
github.com/pingcap/tidb-tools/pkg/diff.(*TableDiff).UpdateSummaryInfo.func1.1
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/diff.go:646
github.com/pingcap/tidb-tools/pkg/diff.(*TableDiff).UpdateSummaryInfo.func1
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/diff.go:667
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/06/24 14:29:14.355 +08:00] [WARN] [diff.go:648] ["save table summary info failed"] [schema=assets_finance] [table=record_voucher_detail] [error="chunks of instanceID target schema assets_finance table record_voucher_detail not found"] [errorVerbose="chunks of instanceID target schema assets_finance table record_voucher_detail not found
github.com/pingcap/errors.NotFoundf
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:117
github.com/pingcap/tidb-tools/pkg/diff.getChunkSummary
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/checkpoint.go:207
github.com/pingcap/tidb-tools/pkg/diff.updateTableSummary
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/checkpoint.go:227
github.com/pingcap/tidb-tools/pkg/diff.(*TableDiff).UpdateSummaryInfo.func1.1
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/diff.go:646
github.com/pingcap/tidb-tools/pkg/diff.(*TableDiff).UpdateSummaryInfo.func1
\t/home/jenkins/agent/workspace/build_tidb_tools_master/go/src/github.com/pingcap/tidb-tools/pkg/diff/diff.go:667
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
^C
```

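The downstream table (target schema assets_finance, table record_voucher_detail) does exist.
The config file is below.

Note: the upstream and downstream tables have the same columns, but different primary keys. Each shard uses an auto-increment id, so the downstream table uses a composite primary key over several columns instead.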
diff-test.toml (2.4 KB)

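Hello,

Please run `show create table assets_finance.record_voucher_detail \G` in the downstream shell and send a screenshot, to confirm the table really exists.

PS: try removing `instance-id` and see whether the error still occurs.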

The table exists.

After removing `instance-id`, config validation fails:

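The `[WARN] [diff.go:648] ["save table summary info failed"]` message is a known issue, https://github.com/pingcap/tidb-tools/issues/354, which describes the cause. It only adds noise to the log and does not affect the check result. It has been fixed in https://github.com/pingcap/tidb-tools/pull/355.

In short, diff periodically queries the chunk states in the checkpoint table and aggregates them into a status summary, which it prints so you can follow the check's progress. If a table's chunks have not been split yet when this query runs, the query returns no data and this warning is logged.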

`instance-id` must be configured; otherwise the config check fails with an error.