Why does this error keep appearing? [ERROR] [server.rs:866] ["failed to init io snooper"] [err_code=KV:Unknown] [err="\"IO snooper is not started due to not compiling with BCC\""]

One of the three existing TiKV nodes suddenly went down and would not start; the log reports the error in the title. After a round of scale-out/scale-in operations, a new node was added and the old node was taken offline. Now the new node has also suddenly gone down with the same error. Where exactly is the problem?

[2021/06/15 16:39:21.933 +08:00] [INFO] [server.rs:270] ["using config"] [config="{"log-level":"info","log-file":"/data/tidb-deploy/tikv-20160/log/tikv.log","log-format":"text","slow-log-file":"","slow-log-threshold":"1s","log-rotation-timespan":"1d","log-rotation-size":"300MiB","panic-when-unexpected-key-or-data":false,"enable-io-snoop":true,"abort-on-panic":false,"readpool":{"unified":{"min-thread-count":1,"max-thread-count":4,"stack-size":"10MiB","max-tasks-per-worker":2000},"storage":{"use-unified-pool":true,"high-concurrency":4,"normal-concurrency":4,"low-concurrency":4,"max-tasks-per-worker-high":2000,"max-tasks-per-worker-normal":2000,"max-tasks-per-worker-low":2000,"stack-size":"10MiB"},"coprocessor":{"use-unified-pool":true,"high-concurrency":3,"normal-concurrency":3,"low-concurrency":3,"max-tasks-per-worker-high":2000,"max-tasks-per-worker-normal":2000,"max-tasks-per-worker-low":2000,"stack-size":"10MiB"}},"server":{"addr":"0.0.0.0:20160","advertise-addr":"192.168.5.49:20160","status-addr":"0.0.0.0:20180","advertise-status-addr":"192.168.5.49:20180","status-thread-pool-size":1,"max-grpc-send-msg-len":10485760,"grpc-compression-type":"none","grpc-concurrency":5,"grpc-concurrent-stream":1024,"grpc-raft-conn-num":1,"grpc-memory-pool-quota":9223372036854775807,"grpc-stream-initial-window-size":"2MiB","grpc-keepalive-time":"10s","grpc-keepalive-timeout":"3s","concurrent-send-snap-limit":32,"concurrent-recv-snap-limit":32,"end-point-recursion-limit":1000,"end-point-stream-channel-size":8,"end-point-batch-row-limit":64,"end-point-stream-batch-row-limit":128,"end-point-enable-batch-if-possible":true,"end-point-request-max-handle-duration":"1m","end-point-max-concurrency":4,"snap-max-write-bytes-per-sec":"100MiB","snap-max-total-size":"0KiB","stats-concurrency":1,"heavy-load-threshold":300,"heavy-load-wait-duration":"1ms","enable-request-batch":true,"background-thread-count":2,"end-point-slow-log-threshold":"1s","forward-max-connections-per-address":4,"labels":{}},"storage":{"data-dir":"/data/tidb-data/tikv-20160","gc-ratio-threshold":1.1,"max-key-size":4096,"scheduler-concurrency":524288,"scheduler-worker-pool-size":4,"scheduler-pending-write-threshold":"100MiB","reserve-space":"0KiB","enable-async-apply-prewrite":false,"enable-ttl":false,"ttl-check-poll-interval":"12h","block-cache":{"shared":true,"capacity":"1331MiB","num-shard-bits":6,"strict-capacity-limit":false,"high-pri-pool-ratio":0.8,"memory-allocator":"nodump"}},"pd":{"endpoints":["192.168.5.43:2379","192.168.5.44:2379"],"retry-interval":"300ms","retry-max-count":9223372036854775807,"retry-log-every":10,"update-interval":"10m","enable-forwarding":false},"metric":{"job":"tikv"},"raftstore":{"prevote":true,"raftdb-path":"/data/tidb-data/tikv-20160/raft","capacity":"0KiB","raft-base-tick-interval":"1s","raft-heartbeat-ticks":2,"raft-election-timeout-ticks":10,"raft-min-election-timeout-ticks":10,"raft-max-election-timeout-ticks":20,"raft-max-size-per-msg":"1MiB","raft-max-inflight-msgs":256,"raft-entry-max-size":"8MiB","raft-log-gc-tick-interval":"10s","raft-log-gc-threshold":50,"raft-log-gc-count-limit":73728,"raft-log-gc-size-limit":"72MiB","raft-log-reserve-max-ticks":6,"raft-engine-purge-interval":"10s","raft-entry-cache-life-time":"30s","raft-reject-transfer-leader-duration":"3s","split-region-check-tick-interval":"10s","region-split-check-diff":"6MiB","region-compact-check-interval":"5m","region-compact-check-step":100,"region-compact-min-tombstones":10000,"region-compact-tombstones-percent":30,"pd-heartbeat-tick-interval":"1m","pd-store-heartbeat-tick-interval":"10s","snap-mgr-gc-tick-interval":"1m","snap-gc-timeout":"4h","lock-cf-compact-interval":"10m","lock-cf-compact-bytes-threshold":"256MiB","notify-capacity":40960,"messages-per-tick":4096,"max-peer-down-duration":"5m","max-leader-missing-duration":"2h","abnormal-leader-missing-duration":"10m","peer-stale-state-check-interval":"5m","leader-transfer-max-log-lag":128,"snap-apply-batch-size":"10MiB","consistency-check-interval":"0s","report-region-flow-interval":"1m","raft-store-max-leader-lease":"9s","right-derive-when-split":true,"allow-remove-leader":false,"merge-max-log-gap":10,"merge-check-tick-interval":"2s","use-delete-range":false,"cleanup-import-sst-interval":"10m","local-read-batch-size":1024,"apply-max-batch-size":256,"apply

[2021/06/15 16:39:21.937 +08:00] [ERROR] [server.rs:866] ["failed to init io snooper"] [err_code=KV:Unknown] [err="\"IO snooper is not started due to not compiling with BCC\""]
[2021/06/15 16:39:21.937 +08:00] [INFO] [mod.rs:116] ["encryption: none of key dictionary and file dictionary are found."]
[2021/06/15 16:39:21.937 +08:00] [INFO] [mod.rs:477] ["encryption is disabled."]
[2021/06/15 16:39:22.007 +08:00] [INFO] [future.rs:146] ["starting working thread"] [worker=gc-worker]
[2021/06/15 16:39:22.071 +08:00] [INFO] [mod.rs:214] ["Storage started."]
[2021/06/15 16:39:22.080 +08:00] [INFO] [node.rs:176] ["put store to PD"] [store="id: 123045 address: \"192.168.5.49:20160\" version: \"5.0.2\" status_address: \"192.168.5.49:20180\" git_hash: \"6e6ea0e02c2caac556f95a821f92b28fc88dba85\" start_timestamp: 1623746362 deploy_path: \"/data/tidb-deploy/tikv-20160/bin\""]
[2021/06/15 16:39:22.083 +08:00] [INFO] [node.rs:243] ["initializing replication mode"] [store_id=123045] [status=Some()]
[2021/06/15 16:39:22.083 +08:00] [INFO] [replication_mode.rs:51] ["associated store labels"] [labels="[]"] [store_id=4]
[2021/06/15 16:39:22.083 +08:00] [INFO] [replication_mode.rs:51] ["associated store labels"] [labels="[key: \"host\" value: \"tikv44\"]"] [store_id=5]
[2021/06/15 16:39:22.083 +08:00] [INFO] [replication_mode.rs:51] ["associated store labels"] [labels="[]"] [store_id=123045]
[2021/06/15 16:39:22.083 +08:00] [INFO] [replication_mode.rs:51] ["associated store labels"] [labels="[key: \"host\" value: \"tikv45\"]"] [store_id=1]
[2021/06/15 16:39:22.083 +08:00] [INFO] [node.rs:387] ["start raft store thread"] [store_id=123045]
[2021/06/15 16:39:22.084 +08:00] [INFO] [snap.rs:1137] ["Initializing SnapManager, encryption is enabled: false"]
[2021/06/15 16:39:22.185 +08:00] [INFO] [peer.rs:191] ["create peer"] [peer_id=411767325] [region_id=411767323]
[2021/06/15 16:39:22.190 +08:00] [INFO] [raft.rs:2443] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {411767324, 411767325, 411767326} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=411767325] [region_id=411767323]
[2021/06/15 16:39:22.190 +08:00] [INFO] [raft.rs:1064] ["became follower at term 166"] [term=166] [raft_id=411767325] [region_id=411767323]
[2021/06/15 16:39:22.190 +08:00] [INFO] [raft.rs:375] [newRaft] [peers="Configuration { incoming: Configuration { voters: {411767324, 411767325, 411767326} }, outgoing: Configuration { voters: {} } }"] ["last term"=143] ["last index"=170] [applied=157] [commit=170] [term=166] [raft_id=411767325] [region_id=411767323]
[2021/06/15 16:39:22.190 +08:00] [INFO] [raw_node.rs:285] ["RawNode created with id 411767325."] [id=411767325] [raft_id=411767325] [region_id=411767323]
[2021/06/15 16:39:22.190 +08:00] [INFO] [peer.rs:191] ["create peer"] [peer_id=414801547] [region_id=414801545]
[2021/06/15 16:39:22.191 +08:00] [INFO] [raft.rs:2443] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {414801548, 414801546, 414801547} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=414801547] [region_id=414801545]
[2021/06/15 16:39:22.191 +08:00] [INFO] [raft.rs:1064] ["became follower at term 94"] [term=94] [raft_id=414801547] [region_id=414801545]


First check whether you have enough disk space. If this is a test environment and the data is not important, you can set reserve-space to 0; otherwise, move to a larger disk. (A config sketch follows the link below.)
https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#reserve-space
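A minimal sketch of applying that setting with tiup, assuming the cluster name hydee-tidb mentioned later in this thread (adjust names to your environment):

# edit the cluster config and set the TiKV reserve-space option
tiup cluster edit-config hydee-tidb
#   under server_configs -> tikv, add or confirm:
#     storage.reserve-space: 0MiB
# roll the change out to the TiKV nodes
tiup cluster reload hydee-tidb -R tikv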

Disk space is sufficient, and reserve-space is already set to 0 (you can see it in the log pasted above); it made no difference.
[root@localhost ~]# df -lh
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 305M 3.6G 8% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/centos-root 37G 2.0G 36G 6% /
/dev/sda1 1014M 193M 822M 19% /boot
/dev/mapper/data-data 30G 7.0G 24G 24% /data
tmpfs 783M 0 783M 0% /run/user/0

  1. Please post the output of tiup cluster display <cluster-name>.
  2. If everything lives under the /data directory, this still feels like it could be a space issue. First check whether the other two nodes have anything that can be cleaned up. If all nodes store data under /data, is a single TiKV currently using about 8 GB?
  3. Please post the complete log from the most recent startup.

[root@localhost ~]# tiup cluster display hydee-tidb
Starting component cluster: /root/.tiup/components/cluster/v1.5.1/tiup-cluster display hydee-tidb
Cluster type: tidb
Cluster name: hydee-tidb
Cluster version: v5.0.2
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://192.168.5.44:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir


192.168.5.46:9093 alertmanager 192.168.5.46 9093/9094 linux/x86_64 Up /data/tidb-data/alertmanager-9093 /data/tidb-deploy/alertmanager-9093
192.168.5.43:8300 cdc 192.168.5.43 8300 linux/x86_64 Up /data/tidb-data/cdc-8300 /data/tidb-deploy/cdc-8300
192.168.5.44:8300 cdc 192.168.5.44 8300 linux/x86_64 Up /data/tidb-data/cdc-8300 /data/tidb-deploy/cdc-8300
192.168.5.46:3000 grafana 192.168.5.46 3000 linux/x86_64 Up - /data/tidb-deploy/grafana-3000
192.168.5.43:2379 pd 192.168.5.43 2379/2380 linux/x86_64 Up|L /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
192.168.5.44:2379 pd 192.168.5.44 2379/2380 linux/x86_64 Up|UI /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
192.168.5.46:9090 prometheus 192.168.5.46 9090 linux/x86_64 Up /data/tidb-data/prometheus-9090 /data/tidb-deploy/prometheus-9090
192.168.5.43:4000 tidb 192.168.5.43 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
192.168.5.46:4000 tidb 192.168.5.46 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
192.168.5.44:20160 tikv 192.168.5.44 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
192.168.5.45:20160 tikv 192.168.5.45 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
192.168.5.49:20160 tikv 192.168.5.49 20160/20180 linux/x86_64 Down /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160

The other two nodes have more disk space, so no problem there.
Node 1:
[root@localhost ~]# df -lh
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 393M 3.5G 11% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/centos-root 37G 2.0G 36G 6% /
/dev/sda1 1014M 193M 822M 19% /boot
/dev/mapper/data-data 80G 15G 66G 18% /data
tmpfs 783M 0 783M 0% /run/user/0

Node 2:
[root@localhost ~]# df -lh
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 409M 3.5G 11% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/centos-root 37G 4.5G 33G 13% /
/dev/sda1 1014M 193M 822M 19% /boot
/dev/mapper/data-data 80G 11G 70G 14% /data
tmpfs 783M 0 783M 0% /run/user/0

tikv_20210616.log (944.0 KB) tikv_stderr.log (817.2 KB)

  1. Please post the /var/log/messages log.
  2. Please try manually creating a file in the data directory to confirm it is writable.
  3. Also check the output of df -i. (A sketch of these checks follows this list.)
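A rough sketch of those three checks, assuming the data directory /data/tidb-data/tikv-20160 shown in the config above:

# 1. look for OOM-killer or other kernel events around the crash time
grep -i -E "oom|out of memory|tikv" /var/log/messages | tail -n 50
# 2. confirm the data directory is writable
touch /data/tidb-data/tikv-20160/.write_test && rm /data/tidb-data/tikv-20160/.write_test
# 3. check inode usage on the data filesystem
df -i /data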

1. /var/log/messages seems to point to a memory allocation problem?
2. I can manually create files in the directory.
3. df -i output is below.

Jun 17 09:09:41 localhost systemd: tikv-20160.service holdoff time over, scheduling restart.
Jun 17 09:09:41 localhost systemd: Stopped tikv service.
Jun 17 09:09:41 localhost systemd: Started tikv service.
Jun 17 09:09:41 localhost run_tikv.sh: sync …
Jun 17 09:09:41 localhost run_tikv.sh: real    0m0.002s
Jun 17 09:09:41 localhost run_tikv.sh: user    0m0.001s
Jun 17 09:09:41 localhost run_tikv.sh: sys     0m0.000s
Jun 17 09:09:41 localhost run_tikv.sh: ok
Jun 17 09:09:43 localhost kernel: raftstore-12304 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Jun 17 09:09:43 localhost kernel: raftstore-12304 cpuset=/ mems_allowed=0
Jun 17 09:09:43 localhost kernel: CPU: 2 PID: 25567 Comm: raftstore-12304 Kdump: loaded Tainted: G ------------ T 3.10.0-1127.13.1.el7.x86_64 #1
Jun 17 09:09:43 localhost kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
Jun 17 09:09:43 localhost kernel: Call Trace:
Jun 17 09:09:43 localhost kernel: [] dump_stack+0x19/0x1b
Jun 17 09:09:43 localhost kernel: [] dump_header+0x90/0x229
Jun 17 09:09:43 localhost kernel: [] ? mem_cgroup_reclaim+0x4e/0x120
Jun 17 09:09:43 localhost kernel: [] oom_kill_process+0x25e/0x3f0
Jun 17 09:09:43 localhost kernel: [] ? cpuset_mems_allowed_intersects+0x21/0x30
Jun 17 09:09:43 localhost kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Jun 17 09:09:43 localhost kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Jun 17 09:09:43 localhost kernel: [] pagefault_out_of_memory+0x14/0x90
Jun 17 09:09:43 localhost kernel: [] mm_fault_error+0x6a/0x157
Jun 17 09:09:43 localhost kernel: [] __do_page_fault+0x491/0x500
Jun 17 09:09:43 localhost kernel: [] do_page_fault+0x35/0x90
Jun 17 09:09:43 localhost kernel: [] page_fault+0x28/0x30
Jun 17 09:09:43 localhost kernel: Task in /system.slice/tikv-20160.service killed as a result of limit of /system.slice/tikv-20160.service
Jun 17 09:09:43 localhost kernel: memory: usage 2097152kB, limit 2097152kB, failcnt 34415
Jun 17 09:09:43 localhost kernel: memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
Jun 17 09:09:43 localhost kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jun 17 09:09:43 localhost kernel: Memory cgroup stats for /system.slice/tikv-20160.service: cache:4KB rss:2097148KB rss_huge:194560KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:2097108KB inactive_file:4KB active_file:0KB unevictable:0KB
Jun 17 09:09:43 localhost kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jun 17 09:09:43 localhost kernel: [25482] 1000 25482 798110 526966 1213 0 0 tikv-server
Jun 17 09:09:43 localhost kernel: Memory cgroup out of memory: Kill process 25593 (status-server) score 1007 or sacrifice child
Jun 17 09:09:43 localhost kernel: Killed process 25482 (tikv-server), UID 1000, total-vm:3192440kB, anon-rss:2094720kB, file-rss:13144kB, shmem-rss:0kB
Jun 17 09:09:43 localhost systemd: tikv-20160.service: main process exited, code=killed, status=9/KILL
Jun 17 09:09:43 localhost systemd: Unit tikv-20160.service entered failed state.
Jun 17 09:09:43 localhost systemd: tikv-20160.service failed.

[root@localhost log]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 998197 368 997829 1% /dev
tmpfs 1001132 1 1001131 1% /dev/shm
tmpfs 1001132 556 1000576 1% /run
tmpfs 1001132 16 1001116 1% /sys/fs/cgroup
/dev/mapper/centos-root 19394560 40797 19353763 1% /
/dev/sda1 524288 334 523954 1% /boot
/dev/mapper/data-data 15726592 532 15726060 1% /data
tmpfs 1001132 1 1001131 1% /run/user/0

  1. Please find tikv.log and messages entries whose timestamps actually line up; this messages log is from the 17th.
  2. Is this a VMware virtual machine? If possible, provision a new machine with roughly the same disk capacity as the other two nodes, scale it out, and then scale in the problematic machine (see the command sketch below).
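For reference, the node-replacement flow would look roughly like this with tiup; the topology file name here is a placeholder:

# scale out a new TiKV node described in a topology file
tiup cluster scale-out hydee-tidb scale-out.yaml
# after the new store is up and regions have balanced, scale in the problematic node
tiup cluster scale-in hydee-tidb --node 192.168.5.49:20160
# watch until the old store finishes migrating regions and becomes Tombstone
tiup cluster display hydee-tidb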

1. Both of those logs keep scrolling; there are matching entries from the 16th too, see below.
2. These are Alibaba Cloud machines, and I have already done exactly that once; the node now reporting the error is the newly scaled-out machine. Node .46 originally reported this error, .49 is the new node I scaled out, .46 was scaled in, and now .49 reports the same error again.

Jun 16 00:01:01 localhost systemd: Started Session 3245 of user root.
Jun 16 00:01:03 localhost systemd: tikv-20160.service holdoff time over, scheduling restart.
Jun 16 00:01:03 localhost run_tikv.sh: sync …
Jun 16 00:01:03 localhost run_tikv.sh: real    0m0.002s
Jun 16 00:01:03 localhost run_tikv.sh: user    0m0.001s
Jun 16 00:01:03 localhost run_tikv.sh: sys     0m0.000s
Jun 16 00:01:03 localhost run_tikv.sh: ok
Jun 16 00:01:04 localhost kernel: raftstore-12304 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Jun 16 00:01:04 localhost kernel: raftstore-12304 cpuset=/ mems_allowed=0
Jun 16 00:01:04 localhost kernel: Call Trace:
Jun 16 00:01:04 localhost kernel: [] dump_stack+0x19/0x1b
Jun 16 00:01:04 localhost kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Jun 16 00:01:04 localhost kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Jun 16 00:01:04 localhost kernel: [] pagefault_out_of_memory+0x14/0x90
Jun 16 00:01:04 localhost kernel: [] mm_fault_error+0x6a/0x157
Jun 16 00:01:04 localhost kernel: [] __do_page_fault+0x491/0x500
Jun 16 00:01:04 localhost kernel: [] do_page_fault+0x35/0x90
Jun 16 00:01:04 localhost kernel: [] page_fault+0x28/0x30
Jun 16 00:01:04 localhost kernel: [] ? cpuset_mems_allowed_intersects+0x21/0x30
Jun 16 00:01:04 localhost kernel: [] mem_cgroup_oom_synchronize+0x546/0x570
Jun 16 00:01:04 localhost kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Jun 16 00:01:04 localhost kernel: [] pagefault_out_of_memory+0x14/0x90
Jun 16 00:01:04 localhost kernel: [] mm_fault_error+0x6a/0x157
Jun 16 00:01:04 localhost kernel: [] __do_page_fault+0x491/0x500
Jun 16 00:01:04 localhost kernel: [] do_page_fault+0x35/0x90
Jun 16 00:01:04 localhost kernel: [] page_fault+0x28/0x30
Jun 16 00:01:04 localhost kernel: Task in /system.slice/tikv-20160.service killed as a result of limit of /system.slice/tikv-20160.service
Jun 16 00:01:04 localhost kernel: memory: usage 2097152kB, limit 2097152kB, failcnt 35932
Jun 16 00:01:04 localhost kernel: memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
Jun 16 00:01:04 localhost kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jun 16 00:01:04 localhost kernel: Memory cgroup stats for /system.slice/tikv-20160.service: cache:12KB rss:2097056KB rss_huge:239616KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:2096936KB inactive_file:8KB active_file:4KB unevictable:0KB
Jun 16 00:01:04 localhost kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jun 16 00:01:04 localhost kernel: [ 1611] 1000 1611 802205 527041 1211 0 0 tikv-server
Jun 16 00:01:04 localhost kernel: Memory cgroup out of memory: Kill process 1722 (status-server) score 1007 or sacrifice child
Jun 16 00:01:04 localhost kernel: Killed process 1611 (tikv-server), UID 1000, total-vm:3208820kB, anon-rss:2095096kB, file-rss:13068kB, shmem-rss:0kB
Jun 16 00:01:05 localhost systemd: tikv-20160.service: main process exited, code=killed, status=9/KILL
Jun 16 00:01:05 localhost systemd: Unit tikv-20160.service entered failed state.
Jun 16 00:01:05 localhost systemd: tikv-20160.service failed.
Jun 16 00:01:20 localhost systemd: tikv-20160.service holdoff time over, scheduling restart.
Jun 16 00:01:20 localhost systemd: Stopped tikv service.
Jun 16 00:01:20 localhost systemd: Started tikv service.
Jun 16 00:01:20 localhost run_tikv.sh: sync …
Jun 16 00:01:20 localhost run_tikv.sh: real    0m0.002s
Jun 16 00:01:20 localhost run_tikv.sh: user    0m0.001s
Jun 16 00:01:20 localhost run_tikv.sh: sys     0m0.000s
Jun 16 00:01:20 localhost run_tikv.sh: ok
Jun 16 00:01:22 localhost kernel: raftstore-12304 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Jun 16 00:01:22 localhost kernel: raftstore-12304 cpuset=/ mems_allowed=0
Jun 16 00:01:22 localhost kernel: CPU: 2 PID: 1823 Comm: raftstore-12304 Kdump: loaded Tainted: G ------------ T 3.10.0-1127.13.1.el7.x86_64 #1
Jun 16 00:01:22 localhost kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014

  1. Is the memory limit really only 2 GB?
  2. Check the limit configured in the /system.slice/tikv-20160.service unit (a sketch of how to inspect and raise it follows).
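A sketch of how that limit could be checked and raised on the affected host (cgroup v1 on CentOS 7, as in the kernel log; the 2097152 kB in the OOM message is exactly 2 GiB):

# current cgroup memory limit of the systemd unit
systemctl show tikv-20160 -p MemoryLimit
cat /sys/fs/cgroup/memory/system.slice/tikv-20160.service/memory.limit_in_bytes
# see where MemoryLimit= is configured
systemctl cat tikv-20160
# raise it via a drop-in instead of editing the generated unit file directly,
# e.g. add under [Service]:  MemoryLimit=5G
systemctl edit tikv-20160
systemctl daemon-reload && systemctl restart tikv-20160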

I changed it to 5 GB; the error changed to the following, and it still won't start:

Jun 17 14:42:36 localhost systemd: tikv-20160.service holdoff time over, scheduling restart.
Jun 17 14:42:36 localhost systemd: Stopped tikv service.
Jun 17 14:42:36 localhost systemd: Started tikv service.
Jun 17 14:42:36 localhost run_tikv.sh: sync …
Jun 17 14:42:36 localhost run_tikv.sh: real    0m0.009s
Jun 17 14:42:36 localhost run_tikv.sh: user    0m0.000s
Jun 17 14:42:36 localhost run_tikv.sh: sys     0m0.001s
Jun 17 14:42:36 localhost run_tikv.sh: ok
Jun 17 14:42:37 localhost systemd: tikv-20160.service: main process exited, code=killed, status=6/ABRT
Jun 17 14:42:37 localhost systemd: Unit tikv-20160.service entered failed state.
Jun 17 14:42:37 localhost systemd: tikv-20160.service failed.
Jun 17 14:42:53 localhost systemd: tikv-20160.service holdoff time over, scheduling restart.
Jun 17 14:42:53 localhost systemd: Stopped tikv service.
Jun 17 14:42:53 localhost systemd: Started tikv service.
Jun 17 14:42:53 localhost run_tikv.sh: sync …
Jun 17 14:42:53 localhost run_tikv.sh: real    0m0.007s
Jun 17 14:42:53 localhost run_tikv.sh: user    0m0.001s
Jun 17 14:42:53 localhost run_tikv.sh: sys     0m0.000s
Jun 17 14:42:53 localhost run_tikv.sh: ok
Jun 17 14:42:54 localhost systemd: tikv-20160.service: main process exited, code=killed, status=6/ABRT
Jun 17 14:42:54 localhost systemd: Unit tikv-20160.service entered failed state.
Jun 17 14:42:54 localhost systemd: tikv-20160.service failed.

Also, tikv_stderr.log keeps reporting errors like the ones below. There is clearly enough memory, so why do these allocations fail, and for such small amounts?
memory allocation of 24124 bytes failed
: Malformed conf string
: Malformed conf string
memory allocation of 72847 bytes failed
: Malformed conf string
: Malformed conf string
memory allocation of 69762 bytes failed
: Malformed conf string
: Malformed conf string
: Malformed conf string
: Malformed conf string
memory allocation of 69762 bytes failed
: Malformed conf string
: Malformed conf string
memory allocation of 1880 bytes failed
: Malformed conf string
: Malformed conf string
memory allocation of 1616 bytes failed
: Malformed conf string
: Malformed conf string
memory allocation of 752 bytes failed

  1. Check how this unit file is configured on the two nodes that do start, and see how much memory they are allowed.
  2. Are the machine specs the same? How much memory does this VM have? Try adding a machine with the same size as the other two nodes.
     Don't run TiKV on nodes with different configurations.

It came up after increasing the memory: the machine originally had 8 GB, and after raising it to 16 GB this TiKV node started.
Why it went down in the first place is still unclear. After it went down, restarting required a lot of memory (about 5 GB in my case); together with everything else on the host, the node could not start on a machine with only 8 GB.
This problem is resolved.
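A follow-up thought, not verified in this thread: if the host has to stay small, the largest tunable TiKV memory consumer is the shared block cache (1331MiB in the config pasted above), and it can be capped explicitly, for example:

tiup cluster edit-config hydee-tidb
#   under server_configs -> tikv:
#     storage.block-cache.capacity: "1GiB"
tiup cluster reload hydee-tidb -R tikv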

Mine is also a newly scaled-out instance: the TiKV node has 8 cores, 48 GB of memory, and 1.5 TB of storage, and it reports the same error. What could be the cause?

It's best to open a new topic for that.

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.