【TiDB environment】Offline data warehouse
【TiDB version】v6.5.0
【Problem encountered: symptoms and impact】
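While upgrading from 6.1.3 to 6.5.0, the upgrade hung at the final step, restarting tidb-server. After replacing packages, scaling out and in, and various other operations, the whole cluster was finally upgraded. Now I need to change one parameter, but restarting tidb-server still fails. The main symptoms are:
1. The command hangs when executed.
2. With the --force flag the forced restart succeeds, but tiup cluster display shows the instance as Down. The command used:
tiup cluster reload tidb-datalake03 -N 10.105.xxx:5670 --force
3. On the machine hosting tidb-server, the process is running, but the port is not listening.
4. After quite a while, tidb-server recovers on its own and every status displays correctly again.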
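For anyone trying to reproduce symptom 3 (process up, port not listening), a minimal sketch of a port probe follows. The helper name `port_is_listening` is hypothetical. The ports 5670 (SQL) and 10118 (status) are taken from the "loaded config" line in the log below, and the probe assumes it runs on the tidb-server host itself.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: run on the tidb-server machine; ports come from the loaded config.
    for port in (5670, 10118):
        state = "listening" if port_is_listening("127.0.0.1", port) else "not listening"
        print(f"port {port}: {state}")
```

Running this in a loop during a reload would show roughly when each port comes up, which can then be compared against the timestamps in tidb.log.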
tidb.log.tar.gz (2.1 MB)
When I reload tidb-server again, it hangs again. Part of the restart log is shown below:
[2023/04/24 11:37:36.243 +08:00] [INFO] [job_manager.go:265] ["scale ttl worker"] [ttl-worker=manager] [originalCount=0] [newCount=4]
[2023/04/24 11:37:36.243 +08:00] [INFO] [job_manager.go:265] ["scale ttl worker"] [ttl-worker=manager] [originalCount=0] [newCount=4]
[2023/04/24 11:40:32.966 +08:00] [INFO] [cpuprofile.go:113] ["parallel cpu profiler started"]
[2023/04/24 11:40:32.967 +08:00] [INFO] [printer.go:34] ["Welcome to TiDB."] ["Release Version"=v6.5.0] [Edition=Community] ["Git Commit Hash"=706c3fa3c526cdba5b3e9f066b1a568fb96c56e3] ["Git Branch"=heads/refs/tags/v6.5.0] ["UTC Build Time"="2022-12-27 03:50:44"] [GoVersion=go1.19.3] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=6.2.0-alpha]
[2023/04/24 11:40:32.967 +08:00] [INFO] [printer.go:48] ["loaded config"] [config="{\"host\":\"0.0.0.0\",\"advertise-address\":\"10.105.129.146\",\"port\":5670,\"cors\":\"\",\"store\":\"tikv\",\"path\":\"10.33.32.50:2415,10.33.32.48:2415,10.33.32.74:2415\",\"socket\":\"/tmp/tidb-5670.sock\",\"lease\":\"45s\",\"split-table\":true,\"token-limit\":1000,\"temp-dir\":\"/tmp/tidb\",\"tmp-storage-path\":\"/tmp/2004_tidb/MC4wLjAuMDo1NjcwLzAuMC4wLjA6MTAxMTg=/tmp-storage\",\"tmp-storage-quota\":-1,\"server-version\":\"\",\"version-comment\":\"\",\"tidb-edition\":\"\",\"tidb-release-version\":\"\",\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":null,\"enable-timestamp\":null,\"disable-error-stack\":null,\"enable-error-stack\":null,\"file\":{\"filename\":\"/data/tidb/deploy/tidb-5670/log/tidb.log\",\"max-size\":300,\"max-days\":0,\"max-backups\":0},\"slow-query-file\":\"/data/tidb/deploy/tidb-5670/log/tidb_slow_query.log\",\"expensive-threshold\":10000,\"query-log-max-len\":4096,\"enable-slow-log\":true,\"slow-threshold\":500,\"record-plan-in-slow-log\":1},\"instance\":{\"tidb_general_log\":false,\"tidb_pprof_sql_cpu\":false,\"ddl_slow_threshold\":300,\"tidb_expensive_query_time_threshold\":60,\"tidb_enable_slow_log\":true,\"tidb_slow_log_threshold\":500,\"tidb_record_plan_in_slow_log\":1,\"tidb_check_mb4_value_in_utf8\":true,\"tidb_force_priority\":\"NO_PRIORITY\",\"tidb_memory_usage_alarm_ratio\":0.8,\"tidb_enable_collect_execution_info\":true,\"plugin_dir\":\"/data/deploy/plugin\",\"plugin_load\":\"\",\"max_connections\":0,\"tidb_enable_ddl\":true,\"tidb_rc_read_check_ts\":false},\"security\":{\"skip-grant-table\":false,\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\",\"cluster-ssl-ca\":\"\",\"cluster-ssl-cert\":\"\",\"cluster-ssl-key\":\"\",\"cluster-verify-cn\":null,\"session-token-signing-cert\":\"\",\"session-token-signing-key\":\"\",\"spilled-file-encryption-method\":\"plaintext\",\"enable-sem\":false,\"auto-tls\":false,\"tls-version\":\"\",\"rsa
-key-size\":4096,\"secure-bootstrap\":false,\"auth-token-jwks\":\"\",\"auth-token-refresh-interval\":\"1h0m0s\",\"disconnect-on-expired-password\":true},\"status\":{\"status-host\":\"0.0.0.0\",\"metrics-addr\":\"\",\"status-port\":10118,\"metrics-interval\":15,\"report-status\":true,\"record-db-qps\":false,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-concurrent-streams\":1024,\"grpc-initial-window-size\":2097152,\"grpc-max-send-msg-size\":2147483647},\"performance\":{\"max-procs\":0,\"max-memory\":0,\"server-memory-quota\":0,\"stats-lease\":\"3s\",\"stmt-count-limit\":5000,\"pseudo-estimate-ratio\":0.8,\"bind-info-lease\":\"3s\",\"txn-entry-size-limit\":125829120,\"txn-total-size-limit\":304857600,\"tcp-keep-alive\":true,\"tcp-no-delay\":true,\"cross-join\":true,\"distinct-agg-push-down\":false,\"projection-push-down\":false,\"max-txn-ttl\":3600000,\"index-usage-sync-lease\":\"0s\",\"plan-replayer-gc-lease\":\"10m\",\"gogc\":100,\"enforce-mpp\":false,\"stats-load-concurrency\":5,\"stats-load-queue-size\":1000,\"analyze-partition-concurrency-quota\":16,\"enable-stats-cache-mem-quota\":false,\"committer-concurrency\":128,\"run-auto-analyze\":true,\"force-priority\":\"NO_PRIORITY\",\"memory-usage-alarm-ratio\":0.8,\"enable-load-fmsketch\":false},\"prepared-plan-cache\":{\"enabled\":true,\"capacity\":100,\"memory-guard-ratio\":0.1},\"opentracing\":{\"enable\":false,\"rpc-metrics\":false,\"sampler\":{\"type\":\"const\",\"param\":1,\"sampling-server-url\":\"\",\"max-operations\":0,\"sampling-refresh-interval\":0},\"reporter\":{\"queue-size\":0,\"buffer-flush-interval\":0,\"log-spans\":false,\"local-agent-host-port\":\"\"}},\"proxy-protocol\":{\"networks\":\"\",\"header-timeout\":5},\"pd-client\":{\"pd-server-timeout\":3},\"tikv-client\":{\"grpc-connection-count\":4,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-compression-type\":\"none\",\"commit-timeout\":\"41s\",\"async-commit\":{\"keys-limit\":256,\"total-key-size-limit\":4096,\"safe-
window\":2000000000,\"allowed-clock-drift\":500000000},\"max-batch-size\":128,\"overload-threshold\":200,\"max-batch-wait-time\":0,\"batch-wait-size\":8,\"enable-chunk-rpc\":true,\"region-cache-ttl\":600,\"store-limit\":0,\"store-liveness-timeout\":\"1s\",\"copr-cache\":{\"capacity-mb\":1000},\"ttl-refreshed-txn-size\":33554432,\"resolve-lock-lite-threshold\":16},\"binlog\":{\"enable\":false,\"ignore-error\":false,\"write-timeout\":\"15s\",\"binlog-socket\":\"\",\"strategy\":\"range\"},\"compatible-kill-query\":false,\"pessimistic-txn\":{\"max-retry-count\":256,\"deadlock-history-capacity\":10,\"deadlock-history-collect-retryable\":false,\"pessimistic-auto-commit\":false,\"constraint-check-in-place-pessimistic\":true},\"max-index-length\":12288,\"index-limit\":64,\"table-column-count-limit\":1017,\"graceful-wait-before-shutdown\":0,\"alter-primary-key\":false,\"treat-old-version-utf8-as-utf8mb4\":true,\"enable-table-lock\":false,\"delay-clean-table-lock\":0,\"split-region-max-num\":1000,\"top-sql\":{\"receiver-address\":\"\"},\"repair-mode\":false,\"repair-table-list\":[],\"isolation-read\":{\"engines\":[\"tikv\",\"tiflash\",\"tidb\"]},\"new_collations_enabled_on_first_bootstrap\":true,\"experimental\":{\"allow-expression-index\":false},\"skip-register-to-dashboard\":false,\"enable-telemetry\":true,\"labels\":{},\"enable-global-index\":false,\"deprecate-integer-display-length\":false,\"enable-enum-length-limit\":true,\"stores-refresh-interval\":60,\"enable-tcp4-only\":false,\"enable-forwarding\":false,\"max-ballast-object-size\":0,\"ballast-object-size\":0,\"transaction-summary\":{\"transaction-summary-capacity\":500,\"transaction-id-digest-min-duration\":2147483647},\"enable-global-kill\":true,\"enable-batch-dml\":false,\"mem-quota-query\":30073741824,\"oom-action\":\"cancel\",\"oom-use-tmp-storage\":true,\"check-mb4-value-in-utf8\":true,\"enable-collect-execution-info\":true,\"plugin\":{\"dir\":\"/data/deploy/plugin\",\"load\":\"\"},\"max-server-connections\":0,\"
run-ddl\":true,\"tidb-max-reuse-chunk\":64,\"tidb-max-reuse-column\":256}"]
[2023/04/24 11:40:32.967 +08:00] [INFO] [main.go:361] ["disable Prometheus push client"]
[2023/04/24 11:40:32.967 +08:00] [INFO] [store.go:75] ["new store"] [path=tikv://10.33.32.50:2415,10.33.32.48:2415,10.33.32.74:2415]
[2023/04/24 11:40:32.967 +08:00] [INFO] [client.go:405] ["[pd] create pd client with endpoints"] [pd-address="[10.33.32.50:2415,10.33.32.48:2415,10.33.32.74:2415]"]
[2023/04/24 11:40:32.968 +08:00] [INFO] [systime_mon.go:26] ["start system time monitor"]
[2023/04/24 11:40:33.016 +08:00] [INFO] [base_client.go:360] ["[pd] update member urls"] [old-urls="[http://10.33.32.50:2415,http://10.33.32.48:2415,http://10.33.32.74:2415]"] [new-urls="[http://10.33.32.48:2415,http://10.33.32.50:2415,http://10.33.32.74:2415]"]
[2023/04/24 11:40:33.016 +08:00] [INFO] [base_client.go:378] ["[pd] switch leader"] [new-leader=http://10.33.32.50:2415] [old-leader=]
[2023/04/24 11:40:33.016 +08:00] [INFO] [base_client.go:105] ["[pd] init cluster id"] [cluster-id=7047010410936905566]
[2023/04/24 11:40:33.016 +08:00] [INFO] [client.go:698] ["[pd] tso dispatcher created"] [dc-location=global]
[2023/04/24 11:40:33.021 +08:00] [INFO] [store.go:81] ["new store with retry success"]
[2023/04/24 11:40:33.105 +08:00] [INFO] [tidb.go:77] ["new domain"] [store=tikv-7047010410936905566] ["ddl lease"=45s] ["stats lease"=3s] ["index usage sync lease"=0s]
[2023/04/24 11:40:33.132 +08:00] [INFO] [domain.go:2280] [acquireServerID] [serverID=2544995] ["lease id"=68ee8737eb50f525]
[2023/04/24 11:40:33.156 +08:00] [WARN] [info.go:245] ["init TiFlashReplicaManager"] ["pd addrs"="[10.33.32.48:2415,10.33.32.74:2415,10.33.32.50:2415]"]
[2023/04/24 11:40:38.385 +08:00] [INFO] [domain.go:220] ["full load InfoSchema success"] [currentSchemaVersion=0] [neededSchemaVersion=128722] ["start time"=5.195690108s]
[2023/04/24 11:40:38.399 +08:00] [INFO] [domain.go:487] ["full load and reset schema validator"]
[2023/04/24 11:40:38.399 +08:00] [INFO] [ddl.go:701] ["[ddl] start DDL"] [ID=c79f9092-0779-4cbc-a0bb-0ed79ce6f27a] [runWorker=true]
[2023/04/24 11:40:38.399 +08:00] [INFO] [ddl.go:647] ["[ddl] start delRangeManager OK"] ["is a emulator"=false]
[2023/04/24 11:40:38.399 +08:00] [INFO] [manager.go:151] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]
[2023/04/24 11:40:38.399 +08:00] [INFO] [ddl_worker.go:171] ["[ddl] start DDL worker"] [worker="worker 1, tp general"]
[2023/04/24 11:40:38.399 +08:00] [INFO] [ddl_worker.go:171] ["[ddl] start DDL worker"] [worker="worker 2, tp add index"]
[2023/04/24 11:40:38.405 +08:00] [INFO] [env.go:108] ["[ddl-ingest] the lightning sorted dir"] ["data path:"=/tmp/tidb/tmp_ddl-5670]
[2023/04/24 11:40:38.405 +08:00] [INFO] [env.go:75] ["[ddl-ingest] init global lightning backend environment finished"] ["memory limitation"=2147483648] ["sort path disk quota"=107374182400] ["max open file number"=1000000] ["lightning is initialized"=true]
[2023/04/24 11:40:38.405 +08:00] [INFO] [owner_daemon.go:70] ["begin advancer daemon"] [daemon-id=LogBackup::Advancer]
[2023/04/24 11:40:38.405 +08:00] [INFO] [manager.go:151] ["start campaign owner"] [ownerInfo="[log-backup] /tidb/br-stream/owner"]
[2023/04/24 11:40:38.410 +08:00] [INFO] [owner_daemon.go:77] ["begin running daemon"] [id=7232e7bc-8257-4d6d-9f8b-5b8e9d9ffe05] [daemon-id=LogBackup::Advancer]
[2023/04/24 11:40:38.460 +08:00] [INFO] [manager.go:151] ["start campaign owner"] [ownerInfo="[bindinfo] /tidb/bindinfo/owner"]
[2023/04/24 11:40:38.465 +08:00] [WARN] [sysvar_cache.go:50] ["sysvar cache is empty, triggering rebuild"]
[2023/04/24 11:40:38.702 +08:00] [INFO] [telemetry.go:176] ["Telemetry configuration"] [endpoint=https://telemetry.pingcap.com/api/v1/tidb/report] [report_interval=6h0m0s] [enabled=true]
[2023/04/24 11:40:38.702 +08:00] [INFO] [manager.go:151] ["start campaign owner"] [ownerInfo="[stats] /tidb/stats/owner"]
[2023/04/24 11:40:38.716 +08:00] [INFO] [gc_worker.go:209] ["[gc worker] start"] [uuid=61ec56596400033]
[2023/04/24 11:40:39.274 +08:00] [INFO] [coprocessor.go:1080] ["[TIME_COP_PROCESS] resp_time:392.42167ms txnStartTS:441006853671354462 region_id:773785117 store_addr:10.33.32.52:20199 kv_process_ms:347 kv_wait_ms:0 kv_read_ms:319 processed_versions:266995 total_versions:375693 rocksdb_delete_skipped_count:995 rocksdb_key_skipped_count:367555 rocksdb_cache_hit_count:1044 rocksdb_read_count:292 rocksdb_read_byte:4067175"]
[2023/04/24 11:40:39.718 +08:00] [INFO] [data_slow_query.go:157] ["Telemetry slow query stats, postReportSlowQueryStats finished"]
[2023/04/24 11:40:39.719 +08:00] [INFO] [telemetry.go:136] ["Uploading telemetry data to https://telemetry.pingcap.com/api/v1/tidb/report"]
[2023/04/24 11:40:39.774 +08:00] [INFO] [manager.go:151] ["start campaign owner"] [ownerInfo="[telemetry] /tidb/telemetry/owner"]
[2023/04/24 11:40:40.404 +08:00] [INFO] [info.go:1080] [SetTiFlashGroupConfig]
[2023/04/24 11:41:08.411 +08:00] [INFO] [job_manager.go:265] ["scale ttl worker"] [ttl-worker=manager] [originalCount=0] [newCount=4]
[2023/04/24 11:41:08.411 +08:00] [INFO] [job_manager.go:265] ["scale ttl worker"] [ttl-worker=manager] [originalCount=0] [newCount=4]
So far, two clusters upgraded from 6.1.0 to 6.5.0 have hit this problem: tidb-server hangs on restart and its port is not listening.
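To narrow down where startup stalls, one option is to look for the largest time gaps between consecutive entries in tidb.log. The sketch below is only an illustration: `largest_gaps` is a hypothetical helper, it assumes the `[YYYY/MM/DD HH:MM:SS.mmm +08:00]` timestamp prefix seen in the log above, it ignores the UTC offset (so all lines are assumed to share one offset), and it skips wrapped continuation lines that carry no timestamp.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in the log excerpt, e.g. "[2023/04/24 11:40:32.966 +08:00]".
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [+-]\d{2}:\d{2}\]")


def largest_gaps(lines, top=3):
    """Return the `top` largest gaps as (seconds, line_before_gap, line_after_gap)."""
    stamped = []
    for line in lines:
        m = TS_RE.match(line)
        if m:  # continuation lines without a timestamp are skipped
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
            stamped.append((ts, line.rstrip()))
    gaps = [
        ((cur[0] - prev[0]).total_seconds(), prev[1], cur[1])
        for prev, cur in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


def report(path, top=3):
    """Print the largest gaps found in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for seconds, before, after in largest_gaps(f, top):
            print(f"{seconds:9.3f}s gap\n  before: {before[:120]}\n  after:  {after[:120]}")
```

Calling `report("/data/tidb/deploy/tidb-5670/log/tidb.log")` (the log path from the loaded config) lists the longest pauses. The entries on either side of each pause are a reasonable starting point for working out what tidb-server was waiting on before the port opened.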