TiKV OOM on startup

To help resolve this faster, the following information is provided; a clear problem description speeds things up:

[Overview] Scenario + problem summary
TiKV hits OOM when starting up. Before the OOM, node 136.25.60.11:20161 prints a large number of
[INFO] [raft.rs:2443] ["switched to configuration"] messages.

[Background] Operations performed
Testing bulk data import performance: frequently modified configuration parameters, restarted TiKV, added and removed various schedulers with pd-ctl, used kill -9, and so on.
In the tests before TiKV stopped coming up, there were stalls where the import stayed at 0 rows for a long time; the TiKV process was alive but barely wrote any logs. After a long stretch of "server is busy" errors I lost patience and ran kill -9 on the problematic TiKV, and after that it would not start again.
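For reference, the pd-ctl scheduler operations were roughly of this form (a sketch run through tiup ctl; the evict-leader scheduler and store ID 1 are examples, not necessarily the exact ones used):

tiup ctl:v5.0.0 pd -u http://136.25.60.18:2379 scheduler show                              # list the schedulers currently running
tiup ctl:v5.0.0 pd -u http://136.25.60.18:2379 scheduler add evict-leader-scheduler 1      # example: evict leaders from store 1
tiup ctl:v5.0.0 pd -u http://136.25.60.18:2379 scheduler remove evict-leader-scheduler-1   # example: remove that scheduler again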

[Symptoms] Application and database symptoms
After the kill -9 the workload could continue, but on retesting the region distribution was uneven and one store showed clearly higher disk usage.
Log observations:
tikv123.zip (48.6 KB)

[Business impact]
Tried to remove the faulty node with tiup cluster scale-in, but it stays in Pending Offline and there is no way to force-remove it.
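For reference, the commands involved look roughly like this (a sketch; --force abandons the region replicas on the node and is only safe when that data can be given up):

tiup ctl:v5.0.0 pd -u http://136.25.60.18:2379 store                   # check store states and remaining region counts
tiup cluster scale-in tidb-test --node 136.25.60.11:20161              # normal scale-in: waits for regions to migrate off (Pending Offline)
tiup cluster scale-in tidb-test --node 136.25.60.11:20161 --force      # forced removal: the data on this node is discarded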

[TiDB version]
v5.0.0

[Attachments]

  1. TiUP Cluster Display output
    Starting component cluster: /pingcap/.tiup/components/cluster/v1.4.0/tiup-cluster display tidb-test
    Cluster type: tidb
    Cluster name: tidb-test
    Cluster version: v5.0.0
    SSH type: builtin
    Dashboard URL: http://136.25.60.18:2379/dashboard
    ID Role Host Ports OS/Arch Status Data Dir Deploy Dir

136.25.60.17:9093 alertmanager 136.25.60.17 9093/9094 linux/aarch64 Up /pingcap/tidb-data/alertmanager-9093 /pingcap/tidb-deploy/alertmanager-9093
136.25.60.17:3000 grafana 136.25.60.17 3000 linux/aarch64 Up - /pingcap/tidb-deploy/grafana-3000
136.25.60.10:2379 pd 136.25.60.10 2379/2380 linux/aarch64 Up /pingcap/tidb-data/pd-2379 /pingcap/tidb-deploy/pd-2379
136.25.60.11:2379 pd 136.25.60.11 2379/2380 linux/aarch64 Up /pingcap/tidb-data/pd-2379 /pingcap/tidb-deploy/pd-2379
136.25.60.18:2379 pd 136.25.60.18 2379/2380 linux/aarch64 Up|L|UI /pingcap/tidb-data/pd-2379 /pingcap/tidb-deploy/pd-2379
136.25.60.17:9090 prometheus 136.25.60.17 9090 linux/aarch64 Up /pingcap/tidb-data/prometheus-8249 /pingcap/tidb-deploy/prometheus-8249
136.25.60.10:4000 tidb 136.25.60.10 4000/10080 linux/aarch64 Up - /pingcap/tidb-deploy/tidb-4000
136.25.60.11:4000 tidb 136.25.60.11 4000/10080 linux/aarch64 Up - /pingcap/tidb-deploy/tidb-4000
136.25.60.18:4000 tidb 136.25.60.18 4000/10080 linux/aarch64 Up - /pingcap/tidb-deploy/tidb-4000
136.25.60.10:20160 tikv 136.25.60.10 20160/20180 linux/aarch64 Up /pingcap/tidb-data/tikv-20160 /pingcap/tidb-deploy/tikv-20160
136.25.60.10:20161 tikv 136.25.60.10 20161/20181 linux/aarch64 Up /pingcap/tidb-data/tikv-20161 /pingcap/tidb-deploy/tikv-20161
136.25.60.11:20160 tikv 136.25.60.11 20160/20180 linux/aarch64 Up /pingcap/tidb-data/tikv-20160 /pingcap/tidb-deploy/tikv-20160
136.25.60.11:20161 tikv 136.25.60.11 20161/20181 linux/aarch64 Pending Offline /pingcap/tidb-data/tikv-20161 /pingcap/tidb-deploy/tikv-20161
136.25.60.18:20160 tikv 136.25.60.18 20160/20180 linux/aarch64 Up /pingcap/tidb-data/tikv-20160 /pingcap/tidb-deploy/tikv-20160
136.25.60.18:20161 tikv 136.25.60.18 20161/20181 linux/aarch64 Up /pingcap/tidb-data/tikv-20161 /pingcap/tidb-deploy/tikv-20161
Total nodes: 15

  2. TiUP Cluster Edit Config output

global:
user: tidb
ssh_port: 22
ssh_type: builtin
deploy_dir: /pingcap/tidb-deploy
data_dir: /pingcap/tidb-data
os: linux
arch: arm64
monitored:
node_exporter_port: 9100
blackbox_exporter_port: 9115
deploy_dir: /pingcap/tidb-deploy/monitored-9100
data_dir: /pingcap/tidb-data/monitored-9100
log_dir: /pingcap/tidb-deploy/monitored-9100/log
server_configs:
tidb:
log.level: info
log.slow-threshold: 3000
performance.max-procs: 16
performance.txn-total-size-limit: 10737418240
prepared-plan-cache.enabled: true
proxy-protocol.networks: 136.25.60.14
tikv:
coprocessor.split-region-on-table: false
pessimistic-txn.pipelined: true
raftdb.allow-concurrent-memtable-write: true
raftdb.max-background-jobs: 6
raftstore.apply-pool-size: 3
raftstore.capacity: 6TB
raftstore.hibernate-regions: false
raftstore.notify-capacity: 409600
raftstore.peer-stale-state-check-interval: 450s
raftstore.store-pool-size: 3
readpool.coprocessor.use-unified-pool: true
readpool.storage.normal-concurrency: 10
readpool.storage.use-unified-pool: true
readpool.unified.max-thread-count: 20
readpool.unified.min-thread-count: 5
rocksdb.compaction-readahead-size: 8MB
rocksdb.defaultcf.bloom-filter-bits-per-key: 20
rocksdb.defaultcf.hard-pending-compaction-bytes-limit: 512GB
rocksdb.defaultcf.level0-file-num-compaction-trigger: 2
rocksdb.defaultcf.max-bytes-for-level-base: 96MB
rocksdb.defaultcf.max-write-buffer-number: 16
rocksdb.defaultcf.optimize-filters-for-hits: false
rocksdb.defaultcf.target-file-size-base: 16MB
rocksdb.defaultcf.write-buffer-size: 256MB
rocksdb.max-background-jobs: 8
rocksdb.max-sub-compactions: 4
rocksdb.max-total-wal-size: 64GB
rocksdb.writecf.block-size: 512KB
rocksdb.writecf.bloom-filter-bits-per-key: 20
rocksdb.writecf.compression-per-level:
- "no"
- "no"
- zstd
- zstd
- zstd
- zstd
- zstd
rocksdb.writecf.hard-pending-compaction-bytes-limit: 512GB
rocksdb.writecf.level0-file-num-compaction-trigger: 2
rocksdb.writecf.max-bytes-for-level-base: 96MB
rocksdb.writecf.max-write-buffer-number: 16
rocksdb.writecf.min-write-buffer-number-to-merge: 2
rocksdb.writecf.target-file-size-base: 16MB
rocksdb.writecf.write-buffer-size: 256MB
server.grpc-concurrency: 8
server.grpc-raft-conn-num: 24
storage.block-cache.capacity: 6GB
storage.scheduler-concurrency: 4096000
storage.scheduler-worker-pool-size: 6
pd:
replication.enable-placement-rules: true
replication.isolation-level: host
replication.location-labels:
- host
- disk
schedule.enable-cross-table-merge: true
schedule.leader-schedule-limit: 4
schedule.leader-schedule-policy: size
schedule.region-schedule-limit: 2048
schedule.replica-schedule-limit: 64
tiflash: {}
tiflash-learner: {}
pump: {}
drainer: {}
cdc: {}
tidb_servers:

  • host: 136.25.60.10
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: /pingcap/tidb-deploy/tidb-4000
    log_dir: /pingcap/tidb-deploy/tidb-4000/log
    numa_node: "0"
    arch: arm64
    os: linux
  • host: 136.25.60.11
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: /pingcap/tidb-deploy/tidb-4000
    log_dir: /pingcap/tidb-deploy/tidb-4000/log
    numa_node: "0"
    arch: arm64
    os: linux
  • host: 136.25.60.18
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: /pingcap/tidb-deploy/tidb-4000
    log_dir: /pingcap/tidb-deploy/tidb-4000/log
    numa_node: "0"
    arch: arm64
    os: linux
    tikv_servers:
  • host: 136.25.60.10
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /pingcap/tidb-deploy/tikv-20160
    data_dir: /pingcap/tidb-data/tikv-20160
    log_dir: /pingcap/tidb-deploy/tikv-20160/log
    numa_node: "1"
    config:
    raftstore.raftdb-path: /pingcap/tidb-data/raftstore-data/tikv-20160/raft
    rocksdb.wal-dir: /pingcap/tidb-data/wal-20160
    server.labels:
    disk: sdf
    host: tikv1
    storage.data-dir: /pingcap/tidb-data/tikv-20160
    arch: arm64
    os: linux
  • host: 136.25.60.10
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /pingcap/tidb-deploy/tikv-20161
    data_dir: /pingcap/tidb-data/tikv-20161
    log_dir: /pingcap/tidb-deploy/tikv-20161/log
    numa_node: "2"
    config:
    raftstore.raftdb-path: /pingcap/tidb-data/raftstore-data/tikv-20161/raft
    rocksdb.wal-dir: /pingcap/tidb-data/wal-20161
    server.labels:
    disk: sde
    host: tikv1
    storage.data-dir: /pingcap/tidb-data/tikv-20161
    arch: arm64
    os: linux
  • host: 136.25.60.11
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /pingcap/tidb-deploy/tikv-20160
    data_dir: /pingcap/tidb-data/tikv-20160
    log_dir: /pingcap/tidb-deploy/tikv-20160/log
    numa_node: "1"
    config:
    raftstore.raftdb-path: /pingcap/tidb-data/raftstore-data/tikv-20160/raft
    rocksdb.wal-dir: /pingcap/tidb-data/wal-20160
    server.labels:
    disk: sdf
    host: tikv2
    storage.data-dir: /pingcap/tidb-data/tikv-20160
    arch: arm64
    os: linux
  • host: 136.25.60.11
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /pingcap/tidb-deploy/tikv-20161
    data_dir: /pingcap/tidb-data/tikv-20161
    log_dir: /pingcap/tidb-deploy/tikv-20161/log
    numa_node: "2"
    config:
    raftstore.raftdb-path: /pingcap/tidb-data/raftstore-data/tikv-20161/raft
    rocksdb.wal-dir: /pingcap/tidb-data/wal-20161
    server.labels:
    disk: sde
    host: tikv2
    storage.data-dir: /pingcap/tidb-data/tikv-20161
    arch: arm64
    os: linux
  • host: 136.25.60.18
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /pingcap/tidb-deploy/tikv-20160
    data_dir: /pingcap/tidb-data/tikv-20160
    log_dir: /pingcap/tidb-deploy/tikv-20160/log
    numa_node: "1"
    config:
    raftstore.raftdb-path: /pingcap/tidb-data/raftstore-data/tikv-20160/raft
    rocksdb.wal-dir: /pingcap/tidb-data/wal-20160
    server.labels:
    disk: sdf
    host: tikv3
    storage.data-dir: /pingcap/tidb-data/tikv-20160
    arch: arm64
    os: linux
  • host: 136.25.60.18
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /pingcap/tidb-deploy/tikv-20161
    data_dir: /pingcap/tidb-data/tikv-20161
    log_dir: /pingcap/tidb-deploy/tikv-20161/log
    numa_node: "2"
    config:
    raftstore.raftdb-path: /pingcap/tidb-data/raftstore-data/tikv-20161/raft
    rocksdb.wal-dir: /pingcap/tidb-data/wal-20161
    server.labels:
    disk: sde
    host: tikv3
    storage.data-dir: /pingcap/tidb-data/tikv-20161
    arch: arm64
    os: linux
    tiflash_servers: []
    pd_servers:
  • host: 136.25.60.10
    ssh_port: 22
    name: pd-136.25.60.10-2379
    client_port: 2379
    peer_port: 2380
    deploy_dir: /pingcap/tidb-deploy/pd-2379
    data_dir: /pingcap/tidb-data/pd-2379
    arch: arm64
    os: linux
  • host: 136.25.60.11
    ssh_port: 22
    name: pd-136.25.60.11-2379
    client_port: 2379
    peer_port: 2380
    deploy_dir: /pingcap/tidb-deploy/pd-2379
    data_dir: /pingcap/tidb-data/pd-2379
    arch: arm64
    os: linux
  • host: 136.25.60.18
    ssh_port: 22
    name: pd-136.25.60.18-2379
    client_port: 2379
    peer_port: 2380
    deploy_dir: /pingcap/tidb-deploy/pd-2379
    data_dir: /pingcap/tidb-data/pd-2379
    arch: arm64
    os: linux
    monitoring_servers:
  • host: 136.25.60.17
    ssh_port: 22
    port: 9090
    deploy_dir: /pingcap/tidb-deploy/prometheus-8249
    data_dir: /pingcap/tidb-data/prometheus-8249
    log_dir: /pingcap/tidb-deploy/prometheus-8249/log
    external_alertmanagers: []
    arch: arm64
    os: linux
    grafana_servers:
  • host: 136.25.60.17
    ssh_port: 22
    port: 3000
    deploy_dir: /pingcap/tidb-deploy/grafana-3000
    arch: arm64
    os: linux
    username: admin
    password: admin
    anonymous_enable: false
    root_url: ""
    domain: ""
    alertmanager_servers:
  • host: 136.25.60.17
    ssh_port: 22
    web_port: 9093
    cluster_port: 9094
    deploy_dir: /pingcap/tidb-deploy/alertmanager-9093
    data_dir: /pingcap/tidb-data/alertmanager-9093
    log_dir: /pingcap/tidb-deploy/alertmanager-9093/log
    arch: arm64
    os: linux
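For completeness, parameter changes like the ones above are normally applied with the standard tiup workflow, roughly as follows (a sketch; whether every change in this test went through exactly this path is not guaranteed):

tiup cluster edit-config tidb-test        # edit the server_configs / per-instance settings shown above
tiup cluster reload tidb-test -R tikv     # roll the change out to the TiKV instances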
  3. TiDB Overview monitoring

tidb-test-TiKV-Trouble-Shooting_2021-06-28T06_33_48.010Z.zip (3.0 MB)

  • Logs of the relevant components (covering 1 hour before and after the issue)

For performance tuning or troubleshooting questions, please download and run the diagnostic script, then select all of the terminal output and copy-paste it when uploading.

The other TiKV nodes also show a large number of "switched to configuration" log entries.
Have I messed the cluster up beyond recovery with all this tweaking? Do I have to tear the whole cluster down and rebuild it?

Jun 28 00:28:52 ceph2 systemd: Started tikv service.
Jun 28 00:28:52 ceph2 run_tikv.sh: sync …
Jun 28 00:28:52 ceph2 run_tikv.sh: real    0m0.008s
Jun 28 00:28:52 ceph2 run_tikv.sh: user    0m0.002s
Jun 28 00:28:52 ceph2 run_tikv.sh: sys     0m0.001s
Jun 28 00:28:52 ceph2 run_tikv.sh: ok
Jun 28 00:29:07 ceph2 kernel: raftstore-7-1 invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=2, order=0, oom_score_adj=0
Jun 28 00:29:07 ceph2 kernel: raftstore-7-1 cpuset=/ mems_allowed=0-3
Jun 28 00:29:07 ceph2 kernel: CPU: 62 PID: 3755322 Comm: raftstore-7-1 Kdump: loaded Tainted: G W ------------ 4.14.0-115.el7a.0.1.aarch64 #1
Jun 28 00:29:08 ceph2 kernel: Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.25 01/17/2020
Jun 28 00:29:08 ceph2 kernel: Call trace:
Jun 28 00:29:08 ceph2 kernel: [] dump_backtrace+0x0/0x23c
Jun 28 00:29:08 ceph2 kernel: [] show_stack+0x24/0x2c
Jun 28 00:29:08 ceph2 kernel: [] dump_stack+0x84/0xa8
Jun 28 00:29:08 ceph2 kernel: [] dump_header+0x94/0x1ec
Jun 28 00:29:08 ceph2 kernel: [] oom_kill_process+0x2b8/0x524
Jun 28 00:29:08 ceph2 kernel: [] out_of_memory+0xfc/0x484
Jun 28 00:29:08 ceph2 kernel: [] __alloc_pages_nodemask+0xa78/0xec0
Jun 28 00:29:08 ceph2 kernel: [] alloc_pages_current+0x8c/0xd8
Jun 28 00:29:08 ceph2 kernel: [] __page_cache_alloc+0x9c/0xd8
Jun 28 00:29:08 ceph2 kernel: [] generic_file_buffered_read+0x51c/0x768
Jun 28 00:29:08 ceph2 kernel: [] generic_file_read_iter+0x11c/0x16c
Jun 28 00:29:08 ceph2 kernel: [] ext4_file_read_iter+0x58/0x108 [ext4]
Jun 28 00:29:08 ceph2 kernel: [] __vfs_read+0x110/0x178
Jun 28 00:29:08 ceph2 kernel: [] vfs_read+0x90/0x14c
Jun 28 00:29:08 ceph2 kernel: [] SyS_pread64+0xb0/0xc8
Jun 28 00:29:08 ceph2 kernel: Exception stack(0xffff000047defec0 to 0xffff000047df0000)
Jun 28 00:29:08 ceph2 kernel: fec0: 00000000000006d9 0000ffe658d1f000 00000000000043e1 000000000028c427
Jun 28 00:29:08 ceph2 kernel: fee0: 0000fff351c70320 0000fff351c77780 00000000ffffffbb 0000000000000000
Jun 28 00:29:08 ceph2 kernel: ff00: 0000000000000043 00000000987ea251 0000000000000000 ffffffffffffffff
Jun 28 00:29:08 ceph2 kernel: ff20: 0000000000000018 00000003e8000000 00379fa1a1da156e 0000acd460002968
Jun 28 00:29:08 ceph2 kernel: ff40: 0000000000000000 0000ffff976e0220 0000000000000041 0000fff351c705a0
Jun 28 00:29:08 ceph2 kernel: ff60: 0000ffe177147710 00000000000043e1 000000000028c427 0000ffe658d1f000
Jun 28 00:29:08 ceph2 kernel: ff80: 0000fff351c704b0 00000000000043e1 0000aaaaeb040ae0 0000fff351c705e0
Jun 28 00:29:08 ceph2 kernel: ffa0: 0000aaaaeb040ae8 0000fff351c70370 0000ffff976e023c 0000fff351c70330
Jun 28 00:29:08 ceph2 kernel: ffc0: 0000ffff976e0250 0000000080000000 00000000000006d9 0000000000000043
Jun 28 00:29:08 ceph2 kernel: ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jun 28 00:29:08 ceph2 kernel: [] __sys_trace_return+0x0/0x4
Jun 28 00:29:08 ceph2 kernel: Mem-Info:
Jun 28 00:29:08 ceph2 kernel: active_anon:613243 inactive_anon:6656 isolated_anon:0#012 active_file:371312 inactive_file:579948 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:33330 slab_unreclaimable:19301#012 mapped:5428 shmem:9317 pagetables:926 bounce:0#012 free:1388570 free_pcp:5 free_cma:0
Jun 28 00:29:08 ceph2 kernel: Node 2 active_anon:25322496kB inactive_anon:3200kB active_file:14528kB inactive_file:10880kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:448kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 28 00:29:08 ceph2 kernel: Node 2 Normal free:1669248kB min:1672000kB low:2089984kB high:2507968kB active_anon:25319552kB inactive_anon:3200kB active_file:16832kB inactive_file:2880kB unevictable:0kB writepending:3648kB present:33554432kB managed:33516864kB mlocked:0kB kernel_stack:42112kB pagetables:25536kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 28 00:29:08 ceph2 kernel: lowmem_reserve[]: 0 0 0
Jun 28 00:29:08 ceph2 kernel: Node 2 Normal: 1607*64kB (UME) 1161*128kB (UME) 670*256kB (UME) 294*512kB (UME) 107*1024kB (UME) 29*2048kB (UME) 20*4096kB (UME) 5*8192kB (UE) 7*16384kB (U) 1*32768kB (M) 0*65536kB 1*131072kB (M) 2*262144kB (ME) 0*524288kB = 1668160kB
Jun 28 00:29:08 ceph2 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 28 00:29:08 ceph2 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
Jun 28 00:29:08 ceph2 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 28 00:29:08 ceph2 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
Jun 28 00:29:08 ceph2 kernel: Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 28 00:29:08 ceph2 kernel: Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
Jun 28 00:29:08 ceph2 kernel: Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 28 00:29:08 ceph2 kernel: Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
Jun 28 00:29:08 ceph2 kernel: 959819 total pagecache pages
Jun 28 00:29:08 ceph2 kernel: 0 pages in swap cache
Jun 28 00:29:08 ceph2 kernel: Swap cache stats: add 4860206, delete 4860807, find 1576215/2758679
Jun 28 00:29:08 ceph2 kernel: Free swap = 0kB
Jun 28 00:29:08 ceph2 kernel: Total swap = 0kB
Jun 28 00:29:08 ceph2 kernel: 3145661 pages RAM
Jun 28 00:29:08 ceph2 kernel: 0 pages HighMem/MovableOnly
Jun 28 00:29:08 ceph2 kernel: 17443 pages reserved
Jun 28 00:29:08 ceph2 kernel: 0 pages hwpoisoned
Jun 28 00:29:08 ceph2 kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 28 00:29:08 ceph2 kernel: [ 9521] 0 9521 1472 1251 4 2 0 0 systemd-journal
Jun 28 00:29:08 ceph2 kernel: [ 9547] 0 9547 5894 79 5 2 0 0 lvmetad
Jun 28 00:29:08 ceph2 kernel: [ 9560] 0 9560 254 92 4 2 0 -1000 systemd-udevd
Jun 28 00:29:08 ceph2 kernel: [13563] 0 13563 136 77 4 2 0 0 rdma-ndd
Jun 28 00:29:08 ceph2 kernel: [13843] 0 13843 302 76 4 2 5 -1000 auditd
Jun 28 00:29:08 ceph2 kernel: [13845] 0 13845 1204 39 3 2 0 0 audispd
Jun 28 00:29:08 ceph2 kernel: [13848] 0 13848 151 80 3 2 0 0 sedispatch
Jun 28 00:29:08 ceph2 kernel: [13872] 0 13872 103 48 4 2 5 0 smartd
Jun 28 00:29:08 ceph2 kernel: [13873] 0 13873 296 152 3 2 0 0 rngd
Jun 28 00:29:08 ceph2 kernel: [13874] 0 13874 5362 173 5 2 0 0 accounts-daemon
Jun 28 00:29:08 ceph2 kernel: [13875] 32 13875 216 117 3 2 0 0 rpcbind
Jun 28 00:29:08 ceph2 kernel: [13877] 0 13877 6201 303 5 2 3 0 udisksd
Jun 28 00:29:08 ceph2 kernel: [13887] 0 13887 113 97 4 2 0 0 systemd-logind
Jun 28 00:29:08 ceph2 kernel: [13906] 0 13906 2067 200 4 2 0 0 abrtd
Jun 28 00:29:08 ceph2 kernel: [13913] 0 13913 2058 187 3 2 0 0 abrt-watch-log
Jun 28 00:29:08 ceph2 kernel: [13920] 999 13920 10255 378 5 3 0 0 polkitd
Jun 28 00:29:08 ceph2 kernel: [13934] 172 13934 2448 91 4 2 0 0 rtkit-daemon
Jun 28 00:29:08 ceph2 kernel: [13942] 81 13942 188 131 4 2 0 -900 dbus-daemon
Jun 28 00:29:08 ceph2 kernel: [13956] 70 13956 166 111 3 2 0 0 avahi-daemon
Jun 28 00:29:08 ceph2 kernel: [13958] 0 13958 2058 187 4 2 0 0 abrt-watch-log
Jun 28 00:29:08 ceph2 kernel: [13961] 998 13961 46 26 4 2 0 0 lsmd
Jun 28 00:29:08 ceph2 kernel: [13965] 0 13965 4986 272 4 2 0 0 ModemManager
Jun 28 00:29:08 ceph2 kernel: [13972] 0 13972 123 89 3 2 0 0 irqbalance
Jun 28 00:29:08 ceph2 kernel: [13979] 0 13979 1879 180 4 2 0 0 ceph-crash
Jun 28 00:29:08 ceph2 kernel: [13986] 70 13986 163 59 3 2 0 0 avahi-daemon
Jun 28 00:29:08 ceph2 kernel: [13991] 994 13991 81 58 4 2 0 0 chronyd
Jun 28 00:29:08 ceph2 kernel: [16049] 0 16049 6892 479 4 2 0 0 tuned
Jun 28 00:29:08 ceph2 kernel: [16054] 0 16054 1947 153 3 2 0 0 cupsd
Jun 28 00:29:08 ceph2 kernel: [16055] 0 16055 321 204 4 2 0 -1000 sshd
Jun 28 00:29:08 ceph2 kernel: [16065] 0 16065 8384 994 5 2 0 0 rsyslogd
Jun 28 00:29:08 ceph2 kernel: [16114] 0 16114 78 25 3 2 25 0 atd
Jun 28 00:29:08 ceph2 kernel: [16117] 0 16117 6542 159 4 2 0 0 gdm
Jun 28 00:29:08 ceph2 kernel: [16119] 0 16119 1718 28 3 2 0 0 agetty
Jun 28 00:29:08 ceph2 kernel: [16125] 0 16125 1756 63 4 2 0 0 crond
Jun 28 00:29:08 ceph2 kernel: [16232] 0 16232 4571 470 5 2 0 0 X
Jun 28 00:29:08 ceph2 kernel: [17912] 0 17912 4279 187 4 2 2 0 gdm-session-wor
Jun 28 00:29:08 ceph2 kernel: [18355] 42 18355 7652 373 4 3 0 0 gnome-session-b
Jun 28 00:29:08 ceph2 kernel: [18877] 42 18877 172 71 3 3 0 0 dbus-launch
Jun 28 00:29:08 ceph2 kernel: [19327] 42 19327 166 93 4 3 0 0 dbus-daemon
Jun 28 00:29:08 ceph2 kernel: [22683] 0 22683 345 138 3 2 0 0 master
Jun 28 00:29:08 ceph2 kernel: [22705] 89 22705 348 154 4 2 0 0 qmgr
Jun 28 00:29:08 ceph2 kernel: [22748] 42 22748 4425 111 4 3 0 0 at-spi-bus-laun
Jun 28 00:29:08 ceph2 kernel: [22764] 42 22764 160 104 4 3 0 0 dbus-daemon
Jun 28 00:29:08 ceph2 kernel: [22767] 42 22767 2386 151 4 3 0 0 at-spi2-registr
Jun 28 00:29:08 ceph2 kernel: [22942] 42 22942 144769 4419 21 4 0 0 gnome-shell
Jun 28 00:29:08 ceph2 kernel: [23007] 0 23007 5594 206 5 2 0 0 upowerd
Jun 28 00:29:08 ceph2 kernel: [23252] 42 23252 16423 224 6 3 0 0 pulseaudio
Jun 28 00:29:08 ceph2 kernel: [23570] 42 23570 6108 151 3 3 0 0 ibus-daemon
Jun 28 00:29:08 ceph2 kernel: [23575] 42 23575 5030 107 5 3 0 0 ibus-dconf
Jun 28 00:29:08 ceph2 kernel: [23577] 42 23577 4534 487 3 3 0 0 ibus-x11
Jun 28 00:29:08 ceph2 kernel: [23580] 42 23580 5026 104 4 3 0 0 ibus-portal
Jun 28 00:29:08 ceph2 kernel: [23621] 42 23621 4999 79 4 3 0 0 xdg-permission-
Jun 28 00:29:08 ceph2 kernel: [23686] 0 23686 5362 164 4 2 0 0 boltd
Jun 28 00:29:08 ceph2 kernel: [23688] 0 23688 262 118 4 2 0 0 wpa_supplicant
Jun 28 00:29:08 ceph2 kernel: [23698] 0 23698 5402 196 4 2 0 0 packagekitd
Jun 28 00:29:08 ceph2 kernel: [23855] 42 23855 6700 544 5 3 0 0 gsd-xsettings
Jun 28 00:29:08 ceph2 kernel: [23857] 42 23857 5033 101 5 3 0 0 gsd-a11y-settin
Jun 28 00:29:08 ceph2 kernel: [23858] 42 23858 4529 473 3 3 0 0 gsd-clipboard
Jun 28 00:29:08 ceph2 kernel: [23859] 42 23859 8486 2173 5 3 0 0 gsd-color
Jun 28 00:29:08 ceph2 kernel: [23862] 42 23862 4570 334 5 3 0 0 gsd-datetime
Jun 28 00:29:08 ceph2 kernel: [23863] 42 23863 5033 99 5 3 0 0 gsd-housekeepin
Jun 28 00:29:08 ceph2 kernel: [23864] 42 23864 6652 490 5 3 0 0 gsd-keyboard
Jun 28 00:29:08 ceph2 kernel: [23865] 42 23865 13008 603 6 3 0 0 gsd-media-keys
Jun 28 00:29:08 ceph2 kernel: [23869] 42 23869 3965 82 4 3 0 0 gsd-mouse
Jun 28 00:29:09 ceph2 kernel: [23870] 42 23870 6689 561 3 3 0 0 gsd-power
Jun 28 00:29:09 ceph2 kernel: [23871] 42 23871 4134 175 3 3 0 0 gsd-print-notif
Jun 28 00:29:09 ceph2 kernel: [23874] 42 23874 3965 82 4 3 0 0 gsd-rfkill
Jun 28 00:29:09 ceph2 kernel: [23875] 42 23875 5022 91 3 3 0 0 gsd-screensaver
Jun 28 00:29:09 ceph2 kernel: [23876] 42 23876 5129 148 5 3 0 0 gsd-sharing
Jun 28 00:29:09 ceph2 kernel: [23881] 42 23881 6164 139 4 3 0 0 gsd-smartcard
Jun 28 00:29:09 ceph2 kernel: [23889] 42 23889 5251 212 4 3 0 0 gsd-sound
Jun 28 00:29:09 ceph2 kernel: [23891] 42 23891 5608 499 5 3 0 0 gsd-wacom
Jun 28 00:29:09 ceph2 kernel: [23950] 997 23950 5429 188 4 2 0 0 colord
Jun 28 00:29:09 ceph2 kernel: [24041] 42 24041 3971 104 4 3 0 0 ibus-engine-sim
Jun 28 00:29:09 ceph2 kernel: [3580772] 0 3580772 121 64 4 3 0 0 nmon
Jun 28 00:29:09 ceph2 kernel: [3715014] 2020 3715014 238560 5360 17 2 0 0 pd-server
Jun 28 00:29:09 ceph2 kernel: [3715143] 2020 3715143 4250 2644 4 2 0 0 node_exporter
Jun 28 00:29:09 ceph2 kernel: [3715144] 2020 3715144 1737 13 5 2 0 0 run_node_export
Jun 28 00:29:09 ceph2 kernel: [3715145] 2020 3715145 1715 31 3 2 0 0 tee
Jun 28 00:29:09 ceph2 kernel: [3715215] 2020 3715215 522 353 6 2 0 0 blackbox_export
Jun 28 00:29:09 ceph2 kernel: [3715216] 2020 3715216 1737 13 3 2 0 0 run_blackbox_ex
Jun 28 00:29:09 ceph2 systemd: Stopping tikv service…
Jun 28 00:29:09 ceph2 kernel: [3715217] 2020 3715217 1715 31 3 2 0 0 tee
Jun 28 00:29:09 ceph2 kernel: [3739623] 0 3739623 408 306 4 2 0 0 sshd
Jun 28 00:29:09 ceph2 kernel: [3739635] 2020 3739635 408 203 4 2 0 0 sshd
Jun 28 00:29:09 ceph2 kernel: [3740915] 2020 3740915 1760 81 4 3 0 0 bash
Jun 28 00:29:09 ceph2 kernel: [3750103] 89 3750103 347 171 4 2 0 0 pickup
Jun 28 00:29:09 ceph2 kernel: [3750308] 0 3750308 2139 307 4 2 0 0 sudo
Jun 28 00:29:09 ceph2 kernel: [3750309] 0 3750309 1760 83 4 3 0 0 bash
Jun 28 00:29:09 ceph2 kernel: [3753712] 0 3753712 408 305 4 2 0 0 sshd
Jun 28 00:29:09 ceph2 kernel: [3753716] 2020 3753716 408 202 4 2 0 0 sshd
Jun 28 00:29:09 ceph2 kernel: [3753718] 2020 3753718 1760 82 5 3 0 0 bash
Jun 28 00:29:09 ceph2 kernel: [3754003] 0 3754003 2139 307 4 2 0 0 sudo
Jun 28 00:29:09 ceph2 kernel: [3754004] 0 3754004 1760 81 3 3 0 0 bash
Jun 28 00:29:09 ceph2 kernel: [3755203] 2020 3755203 2031126 392471 250 2 0 0 tikv-server
Jun 28 00:29:09 ceph2 kernel: [3755362] 0 3755362 1778 74 4 3 0 0 systemctl
Jun 28 00:29:09 ceph2 kernel: [3755363] 0 3755363 63 45 4 3 0 0 systemd-tty-ask
Jun 28 00:29:09 ceph2 kernel: [3755364] 0 3755364 3397 140 4 3 0 0 pkttyagent
Jun 28 00:29:09 ceph2 kernel: Out of memory: Kill process 3755203 (tikv-server) score 748 or sacrifice child
Jun 28 00:29:09 ceph2 kernel: Killed process 3755203 (tikv-server) total-vm:129992064kB, anon-rss:25093120kB, file-rss:25024kB, shmem-rss:0kB
Jun 28 00:29:09 ceph2 kernel: oom_reaper: reaped process 3755203 (tikv-server), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jun 28 00:29:09 ceph2 systemd: tikv-20161.service: main process exited, code=killed, status=9/KILL
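The OOM record above was taken from the host's syslog (likely /var/log/messages on this CentOS-based host); pulling it up again, or checking the other nodes, is roughly:

grep -i -E "oom|out of memory" /var/log/messages | tail -n 50    # syslog on the affected host (ceph2)
dmesg -T | grep -i -E "oom-killer|killed process"                # kernel ring buffer with readable timestamps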

Please first try searching for an answer through the various available channels.

Try a search first to see whether a similar question has already been answered.

The OOM miraculously stopped after I created the directory
/pingcap/tidb-data/tikv-20161/raft-engine, and the node has now finally gone from Pending Offline to Tombstone.
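Concretely, that amounts to something like the following on 136.25.60.11 (the chown to the deploy user is my assumption of what is needed, not something verified here):

mkdir -p /pingcap/tidb-data/tikv-20161/raft-engine          # the directory TiKV was missing
chown tidb:tidb /pingcap/tidb-data/tikv-20161/raft-engine   # assumption: owned by the deploy user so tikv-server can write to it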

But looking at the TiKV configuration, the log-based Raft Engine is not enabled as a replacement for RaftDB.
From the TiKV log, the node went straight to Tombstone right after registering with PD, and it did not print the "switched to configuration" messages either.

[raft-engine]
enable = false
dir = "/pingcap/tidb-data/tikv-20161/raft-engine"
recovery-mode = "tolerate-corrupted-tail-records"
bytes-per-sync = "256KiB"
target-file-size = "128MiB"
purge-threshold = "10GiB"
cache-limit = "1GiB"

[raftstore]
prevote = true
raftdb-path = "/pingcap/tidb-data/raftstore-data/tikv-20161/raft"  # this is the RaftDB path
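From here, the usual cleanup once a store shows Tombstone is roughly the following (a sketch using the PD address and cluster name from above):

tiup ctl:v5.0.0 pd -u http://136.25.60.18:2379 store remove-tombstone   # drop Tombstone stores from PD's metadata
tiup cluster prune tidb-test                                            # remove the Tombstone instance from the tiup topology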

  1. Is 136.25.60.11:20160 OOMing frequently? Before that, had the scale-in of 136.25.60.11:20161 already been executed?
  2. Please share the Overview and TiKV-Details monitoring views for this period, and note the specific time ranges in which the OOMs were frequent.
    [FAQ] Exporting and importing Grafana Metrics pages
  3. Or share the dmesg output, thanks.

It no longer OOMs and the node has successfully reached Tombstone, so I cannot reproduce it anymore. From the logs it looks like the raftstore threads processed a huge batch of "switched to configuration" changes in one go; my guess is it was related to the config changes and the kill -9.

OK, if it reproduces please share the information again, thanks.
