Error when upgrading TiDB

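To speed things up, please provide the following information when asking a question. Clearly described problems get a faster response.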

  • [TiDB version]: 2.0.6
  • [Problem description]: While upgrading from 2.0.6 to 2.1, the following error was reported:

TASK [check_config_pd : Check PD config] ******************************************************************************************************************************************************
fatal: [10.240.17.203]: FAILED! => {“changed”: true, “cmd”: “cd /tmp/tidb_check_config && ./pd-server -config ./pd.toml -config-check”, “delta”: “0:00:00.318676”, “end”: “2019-12-27 00:59:02.183263”, “msg”: “non-zero return code”, “rc”: 1, “start”: “2019-12-27 00:59:01.864587”, “stderr”: “flag provided but not defined: -config-check\nUsage of pd:\n -L string\n \tlog level: debug, info, warn, error, fatal (default ‘info’)\n -V\tprint version information and exit\n -advertise-client-urls string\n \tadvertise url for client traffic (default ‘${client-urls}’)\n -advertise-peer-urls string\n \tadvertise url for peer traffic (default ‘${peer-urls}’)\n -cacert string\n \tPath of file that contains list of trusted TLS CAs\n -cert string\n \tPath of file that contains X509 certificate in PEM format\n -client-urls string\n \turl for client traffic (default “http://127.0.0.1:2379”)\n -config string\n \tConfig file\n -data-dir string\n \tpath to the data directory (default ‘default.${name}’)\n -initial-cluster string\n \tinitial cluster configuration for bootstrapping, e,g. pd=http://127.0.0.1:2380\n -join string\n \tjoin to an existing cluster (usage: cluster’s ‘${advertise-client-urls}’\n -key string\n \tPath of file that contains X509 key in PEM format\n -log-file string\n \tlog file path\n -log-rotate\n \trotate log (default true)\n -name string\n \thuman-readable name for this pd member (default “pd”)\n -namespace-classifier string\n \tnamespace classifier (default ‘table’) (default “table”)\n -peer-urls string\n \turl for peer traffic (default “http://127.0.0.1:2380”)\n -version\n \tprint version information and exit\ntime=“2019-12-27T00:59:02+08:00” level=fatal msg=“parse cmd flags error: flag provided but not defined: -config-check\ngithub.com/pingcap/pd/server.(*Config).Parse\n\t/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/server/config.go:227\nmain.main\n\t/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:40\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337\n””, “stderr_lines”: [“flag provided but not defined: -config-check”, “Usage of pd:”, " -L string", " \tlog level: debug, info, warn, error, fatal (default ‘info’)", " -V\tprint version information and exit", " -advertise-client-urls string", " \tadvertise url for client traffic (default ‘${client-urls}’)", " -advertise-peer-urls string", " \tadvertise url for peer traffic (default ‘${peer-urls}’)", " -cacert string", " \tPath of file that contains list of trusted TLS CAs", " -cert string", " \tPath of file that contains X509 certificate in PEM format", " -client-urls string", " \turl for client traffic (default “http://127.0.0.1:2379”)", " -config string", " \tConfig file", " -data-dir string", " \tpath to the data directory (default ‘default.${name}’)", " -initial-cluster string", " \tinitial cluster configuration for bootstrapping, e,g. pd=http://127.0.0.1:2380", " -join string", " \tjoin to an existing cluster (usage: cluster’s ‘${advertise-client-urls}’", " -key string", " \tPath of file that contains X509 key in PEM format", " -log-file string", " \tlog file path", " -log-rotate", " \trotate log (default true)", " -name string", " \thuman-readable name for this pd member (default “pd”)", " -namespace-classifier string", " \tnamespace classifier (default ‘table’) (default “table”)", " -peer-urls string", " \turl for peer traffic (default “http://127.0.0.1:2380”)", " -version", " \tprint version information and exit", “time=“2019-12-27T00:59:02+08:00” level=fatal msg=“parse cmd flags error: flag provided but not defined: -config-check\ngithub.com/pingcap/pd/server.(*Config).Parse\n\t/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/server/config.go:227\nmain.main\n\t/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:40\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337\n””], “stdout”: “”, “stdout_lines”: []}
to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/rolling_update.retry

Please help me find the cause. If you need to see any configuration files, let me know and I will add them.


TiKV reports the same kind of error:

TASK [check_config_tikv : Check TiKV config] **************************************************************************************************************************************************
fatal: [TiKV1-1]: FAILED! => {“changed”: true, “cmd”: “cd /tmp/tidb_check_config && ./tikv-server --pd-endpoints pd:port --config ./tikv.toml --config-check”, “delta”: “0:00:00.352911”, “end”: “2019-12-27 01:09:33.149723”, “msg”: “non-zero return code”, “rc”: 1, “start”: “2019-12-27 01:09:32.796812”, “stderr”: “error: Found argument ‘--config-check’ which wasn’t expected, or isn’t valid in this context\n\tDid you mean \u001b[32m--\u001b[0m\u001b[32mconfig\u001b[0m?\n\nUSAGE:\n tikv-server --config --pd-endpoints <PD_URL>...\n\nFor more information try --help”, “stderr_lines”: [“error: Found argument ‘--config-check’ which wasn’t expected, or isn’t valid in this context”, “\tDid you mean \u001b[32m--\u001b[0m\u001b[32mconfig\u001b[0m?”, “”, “USAGE:”, " tikv-server --config --pd-endpoints <PD_URL>...", “”, “For more information try --help”], “stdout”: “”, “stdout_lines”: []}
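Both config-check failures have the same shape: the binary under /tmp/tidb_check_config rejects the config-check flag itself, so the configuration file is never actually validated. As a rough illustration only, here is a minimal Python sketch that tells this kind of failure apart from a genuine configuration error, given the `stderr` text from the Ansible output. The function name `rejected_flag` and the two patterns are hypothetical; the patterns are taken only from the two messages quoted in this thread and may not cover other versions.

```python
import re

# Patterns copied from the two stderr messages quoted in this thread.
# pd-server and tikv-server word the "unknown flag" error differently.
# Both straight and typographic quotes/dashes are accepted, because forum
# software often rewrites them when output is pasted.
UNKNOWN_FLAG_PATTERNS = [
    re.compile(r"flag provided but not defined: ([-–]{1,2}[\w-]+)"),
    re.compile(r"Found argument ['‘]([-–]{1,2}[\w-]+)['’] which"),
]


def rejected_flag(stderr: str):
    """Return the flag the binary refused, or None if stderr shows some other error."""
    for pattern in UNKNOWN_FLAG_PATTERNS:
        match = pattern.search(stderr)
        if match:
            return match.group(1)
    return None
```

If this returns a flag name, the binary being checked does not know that flag at all, which points at a mismatch between the playbook and the binary rather than at pd.toml or tikv.toml.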

After commenting out the config checks above, another error was reported:

TASK [Check pd cluster status] ****************************************************************************************************************************************************************
fatal: [10.240.17.203]: FAILED! => {“changed”: false, “content”: “”, “msg”: “Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>”, “redirected”: false, “status”: -1, “url”: “http://10.240.17.203:2379/pd/health”}

NO MORE HOSTS LEFT ****************************************************************************************************************************************************************************
to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/rolling_update.retry

PLAY RECAP ************************************************************************************************************************************************************************************
10.240.17.200 : ok=10 changed=1 unreachable=0 failed=0
10.240.17.201 : ok=9 changed=1 unreachable=0 failed=0
10.240.17.203 : ok=8 changed=0 unreachable=0 failed=1
10.240.17.204 : ok=8 changed=0 unreachable=0 failed=0
10.240.17.205 : ok=8 changed=0 unreachable=0 failed=0
10.240.17.206 : ok=8 changed=0 unreachable=0 failed=0
10.240.17.207 : ok=8 changed=0 unreachable=0 failed=0
10.240.17.208 : ok=8 changed=0 unreachable=0 failed=0
TiKV1-1 : ok=3 changed=0 unreachable=0 failed=0
TiKV1-2 : ok=3 changed=0 unreachable=0 failed=0
TiKV2-1 : ok=3 changed=0 unreachable=0 failed=0
TiKV2-2 : ok=3 changed=0 unreachable=0 failed=0
TiKV3-1 : ok=3 changed=0 unreachable=0 failed=0
TiKV3-2 : ok=3 changed=0 unreachable=0 failed=0
localhost : ok=7 changed=4 unreachable=0 failed=0
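With this many hosts, the one that failed is easy to miss in the recap. The following is a small Python sketch, not part of tidb-ansible, that scans recap lines in the `host : key=value ...` form shown above and returns only the hosts with a non-zero `failed` or `unreachable` counter. The function name `problem_hosts` is made up for illustration.

```python
import re

# One recap line looks like: "10.240.17.203 : ok=8 changed=0 unreachable=0 failed=1"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def problem_hosts(recap_text: str):
    """Return {host: counters} for hosts with failed > 0 or unreachable > 0."""
    problems = {}
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # skip the "PLAY RECAP ***" header and blank lines
        counters = {
            key: int(value)
            for key, value in re.findall(r"(\w+)=(\d+)", match.group("counters"))
        }
        if counters.get("failed", 0) or counters.get("unreachable", 0):
            problems[match.group("host")] = counters
    return problems
```

Run against the recap above, this would single out 10.240.17.203, the host whose PD health check failed.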

ERROR MESSAGE SUMMARY *************************************************************************************************************************************************************************
[10.240.17.200]: Ansible UNREACHABLE! => playbook: stop.yml; TASK: wait until the pushgateway port is down; message: {“changed”: false, “msg”: “Failed to connect to the host via ssh: Shared connection to 10.240.17.200 closed.”, “unreachable”: true}

[10.240.17.203]: Ansible FAILED! => playbook: rolling_update.yml; TASK: Check pd cluster status; message: {“changed”: false, “content”: “”, “msg”: “Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>”, “redirected”: false, “status”: -1, “url”: “http://10.240.17.203:2379/pd/health”}
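The "Check pd cluster status" task is an HTTP request to the PD health URL, and Errno 111 means nothing was listening on that port when the request was made. To repeat the check by hand, a minimal Python sketch using only the standard library could look like the following. The function name `pd_health` and the 3-second timeout are arbitrary choices of mine; the URL is the one from the log.

```python
import urllib.error
import urllib.request


def pd_health(url: str, timeout: float = 3.0):
    """Return (ok, detail) for a PD health URL such as http://<pd-host>:2379/pd/health."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused and timeouts land here: the PD process is
        # probably not running, or not listening on this address.
        return False, f"request failed: {exc}"
```

For example, `pd_health("http://10.240.17.203:2379/pd/health")` should return `(True, "HTTP 200")` once PD on that host is up. A "request failed" result suggests looking at whether the pd-server process started at all before re-running the playbook.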

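1. Please provide the detailed upgrade steps.

2. You are upgrading from 2.0.6 to 2.1.x: which exact 2.1 version is it? Please also confirm that the tidb-ansible you downloaded is the package that matches the 2.1 target version.

3. Check whether the pushgateway service is reachable.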
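For point 2, one quick sanity check is to look at which version the tidb-ansible directory is configured to deploy. The sketch below assumes that the version is set by a `tidb_version = ...` line in inventory.ini, as in the tidb-ansible layouts I have seen; please verify this against your own copy. Both function names are hypothetical.

```python
import re


def inventory_tidb_version(inventory_text: str):
    """Return the value of the first uncommented `tidb_version = ...` line, or None."""
    for line in inventory_text.splitlines():
        # Assumption: the setting is written as "tidb_version = v2.1.6".
        match = re.match(r"\s*tidb_version\s*=\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


def is_target_series(version, series: str = "2.1") -> bool:
    """True if a version such as 'v2.1.6' belongs to the given major.minor series."""
    if not version:
        return False
    return version.lstrip("v").startswith(series + ".")
```

To use it, read /home/tidb/tidb-ansible/inventory.ini (the directory shown in the retry hint of the log) and pass the text to `inventory_tidb_version`. If the result is not in the 2.1 series, the playbook and the downloaded binaries may not belong together.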

Thank you very much for your answer.

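We upgraded to version 2.1.6, following the steps at https://pingcap.com/docs-cn/stable/how-to/upgrade/from-previous-version/. In the end, we commented out the following content in rolling_update.yml and the upgrade completed successfully.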

[Image not available: the content that was commented out in rolling_update.yml]

:+1::+1::+1: