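Self-compiled TiDB crashed while running my own tests

I wrote a test script. When I checked the server today I could no longer connect to it, and it showed the following errors: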

[2020/09/12 08:29:21.236 +00:00] [INFO] [levels.go:902] ["compact failed"] [def="1 top:[22:23](7992824), bot:[136:138](14264618), skip:0, write_amp:2.78"] [error="open /tmp/tidb/kv/0000046d.sst.idx: too many open files"]
[2020/09/12 08:29:21.236 +00:00] [INFO] [levels.go:902] ["compact failed"] [def="1 top:[37:132](9927492), bot:[194:232](4038537), skip:0, write_amp:1.41"] [error="open /tmp/tidb/kv/0000046e.sst.idx: too many open files"]
[2020/09/12 08:29:21.259 +00:00] [INFO] [levels.go:882] ["start compaction"] [level=1] [score=1.0331481620669365]
[2020/09/12 08:29:21.259 +00:00] [INFO] [levels.go:899] ["running compaction"] [def="1 top:[25:120](9927492), bot:[205:243](4038537), skip:0, write_amp:1.41"]
[2020/09/12 08:29:21.259 +00:00] [INFO] [levels.go:367] ["check range with lower level"] [overlapped=false]
[2020/09/12 08:29:21.259 +00:00] [INFO] [levels.go:902] ["compact failed"] [def="1 top:[25:120](9927492), bot:[205:243](4038537), skip:0, write_amp:1.41"] [error="open /tmp/tidb/kv/0000046f.sst: too many open files"]
[2020/09/12 08:29:21.449 +00:00] [ERROR] [server.go:311] ["accept failed"] [error="accept tcp [::]:4000: accept4: too many open files"]
[2020/09/12 08:29:21.449 +00:00] [FATAL] [terror.go:257] ["unexpected error"] [error="accept tcp [::]:4000: accept4: too many open files"] [stack="github.com/pingcap/parser/terror.MustNil\
\t/home/zhangys/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20200909072241-6dac7bb703e2/terror/terror.go:257\
main.runServer\
\t/home/zhangys/database/tidb0518/tidb-server/main.go:683\
main.main\
\t/home/zhangys/database/tidb0518/tidb-server/main.go:186\
runtime.main\
\t/usr/lib/go-1.15/src/runtime/proc.go:204"] [stack="github.com/pingcap/parser/terror.MustNil\
\t/home/zhangys/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20200909072241-6dac7bb703e2/terror/terror.go:257\
main.runServer\
\t/home/zhangys/database/tidb0518/tidb-server/main.go:683\
main.main\
\t/home/zhangys/database/tidb0518/tidb-server/main.go:186\
runtime.main\
\t/usr/lib/go-1.15/src/runtime/proc.go:204"]
exit status 1

Version: commit 94704d0cc49343862fc6a59e59a531d38a463a4f (HEAD -> master, origin/master, origin/HEAD)

I can't easily share my test, and running the requested script fails. What information do I need to provide?

The command I run is go run tidb-server/main.go. After restarting it directly, the error is reported again, but the server keeps running:

[2020/09/13 05:55:43.058 +00:00] [INFO] [domain.go:1097] ["init stats info time"] ["take time"=2.572726032s]
[2020/09/13 05:55:45.793 +00:00] [INFO] [levels.go:902] ["compact failed"] [def="1 top:[25:120](9927492), bot:[205:243](4038537), skip:0, write_amp:1.41"] [error="open /tmp/tidb/kv/000004d5.sst: too many open files"]
[2020/09/13 05:56:23.016 +00:00] [INFO] [coprocessor.go:933] ["[TIME_COP_PROCESS] resp_time:12.532446379s txnStartTS:419424258092695552 region_id:36 store_addr:store1 kv_process_ms:12508"]
[2020/09/13 05:56:31.012 +00:00] [INFO] [levels.go:527] ["compact send discard stats"] [stats="numSkips:0, skippedBytes:0"]
[2020/09/13 05:56:31.029 +00:00] [INFO] [levels.go:866] ["compaction done"] [def="1 top:[14:18](400989576), bot:[63:65](664584480), skip:0, write_amp:2.66"] [deleted=6] [added=5] [duration=48.447212317s]
[2020/09/13 05:56:31.029 +00:00] [INFO] [levels.go:906] ["compaction done"] [level=1]
zhangys@xxx:~$ lsof -i tcp:4000
COMMAND     PID    USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
main    1575411 zhangys  862u  IPv6 468735764      0t0  TCP *:4000 (LISTEN)

zhangys@xxx:~$ sudo lsof -p 1575411 | wc -l
[sudo] password for zhangys: 
1487

Check the system setting with ulimit -n.

Alternatively, look at the output of sudo lsof -p 1575411: what exactly are all those open files?
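One way to answer that question without reading the whole listing is to group the lsof output by its TYPE column. The following is only a sketch: the function name summarize_fd_types is made up for illustration, and it assumes the default lsof column layout (TYPE is the fifth field) and a command name without spaces.

```shell
#!/usr/bin/env bash
# Summarize `lsof -p <pid>` output by the TYPE column (REG, IPv4, IPv6, sock, ...).
# summarize_fd_types is a hypothetical helper name, not part of lsof or TiDB.
summarize_fd_types() {
  # Skip the header row, count occurrences of column 5 (TYPE),
  # then sort by count (descending) and by type name for ties.
  awk 'NR > 1 { count[$5]++ } END { for (t in count) print count[t], t }' \
    | sort -k1,1nr -k2,2
}

# Usage (PID taken from the thread above):
#   sudo lsof -p 1575411 | summarize_fd_types
```

If REG entries under /tmp/tidb/kv dominate, the descriptors are mostly storage files; if IPv4/IPv6 entries dominate, they are mostly client connections.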

zhangys@xxx:~$ ulimit -n
1024
zhangys@xxx:~$ sudo lsof -p 1575411 | wc -l
1487
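Note that the line count of lsof -p is not the same thing as the number of open descriptors: it includes the header row and entries such as cwd, txt and mem (memory-mapped files and shared libraries) that do not count against ulimit -n, which is probably why 1487 can exceed 1024 here. A rough sketch for comparing the real descriptor count with the limit the running process actually has, assuming Linux with /proc mounted (fd_usage is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Linux-only: count the descriptors a process really holds and show its soft limit.
# fd_usage is a hypothetical helper name used for illustration.
fd_usage() {
  local pid="$1"
  local used limit
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  used=$(ls "/proc/$pid/fd" | wc -l)
  # The "Max open files" row lists the soft limit in field 4 and the hard limit in field 5.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid=$pid open_fds=$used soft_limit=$limit"
}

# Usage (run as the process owner or via sudo; PID taken from the thread above):
#   fd_usage 1575411
```

The limit shown by ulimit -n in a new terminal is that shell's limit; /proc/<pid>/limits shows the one the server process was actually started with.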

Please see the attachment saved.txt (154.8 KB)

Try increasing the value of ulimit -n.
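A minimal sketch of raising the limit for the current shell session before starting the server. The helper name raise_nofile and the target value 65536 are assumptions for illustration, not values recommended anywhere in this thread. The soft limit can only be raised up to the hard limit without root, so the function caps the target there.

```shell
#!/usr/bin/env bash
# Raise the soft open-files limit of this shell (and the processes it starts).
# raise_nofile is a hypothetical helper name; it never lowers an existing limit.
raise_nofile() {
  local target="$1"
  local cur hard
  cur=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # Nothing to do if the soft limit is already at or above the target.
  if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$target" ]; then
    return 0
  fi
  # An unprivileged user cannot exceed the hard limit, so cap the target.
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    target="$hard"
  fi
  ulimit -Sn "$target"
}

# Usage (65536 is an assumed example value):
#   raise_nofile 65536 && go run tidb-server/main.go
```

The change applies only to the shell where it is run, so the server has to be started from that same shell.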

Reply to the GitHub issue:
out.txt (125.1 KB)

Please check for yourself what the tidb-server you built is connecting to. Either reduce the number of connections or increase the value of ulimit -n.