What should I do when a GCP SSD fails the IOPS test?


  • [TiDB version]: tidb-ansible v3.0.8
  • [Problem description]: Initializing the environment
ansible-playbook bootstrap.yml
TASK [bootstrap : start irqbalance service] ***************************************************************************************************************************************************************************
fatal: [10.142.0.14]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.15]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.13]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.20]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.12]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.17]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.18]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.19]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring
fatal: [10.142.0.16]: FAILED! => {"changed": false, "msg": "Could not find the requested service irqbalance: host"}
...ignoring


......


TASK [machine_benchmark : fio randread benchmark command] *************************************************************************************************************************************************************
ok: [10.142.0.17] => {
    "msg": "fio randread benchmark command: cd /data/tidb/deploy/data/fio && ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randread -size=10G -filename=fio_randread_test.txt -name='fio randread test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting --output-format=json --output=fio_randread_result.json."
}

TASK [machine_benchmark : fio randread benchmark summary] *************************************************************************************************************************************************************
ok: [10.142.0.17] => {
    "msg": "fio randread benchmark summary: jobname: fio randread test\nread: IOPS=11836\nlat (ns): min=2787, max=252167151, avg=336486\nclat percentiles (ns): 95.00th=888832, 99.00th=1351680."
}
ok: [10.142.0.18] => {
    "msg": "fio randread benchmark summary: jobname: fio randread test\nread: IOPS=13414\nlat (ns): min=2666, max=37775034, avg=296744\nclat percentiles (ns): 95.00th=864256, 99.00th=1302528."
}
ok: [10.142.0.19] => {
    "msg": "fio randread benchmark summary: jobname: fio randread test\nread: IOPS=12243\nlat (ns): min=2832, max=17279028, avg=325257\nclat percentiles (ns): 95.00th=888832, 99.00th=1351680."
}

TASK [machine_benchmark : Preflight check - Does fio randread iops of tikv_data_dir disk meet requirement] ************************************************************************************************************
fatal: [10.142.0.17]: FAILED! => {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 11836 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
fatal: [10.142.0.18]: FAILED! => {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 13414 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
fatal: [10.142.0.19]: FAILED! => {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 12243 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
        to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/bootstrap.retry

PLAY RECAP ************************************************************************************************************************************************************************************************************
10.142.0.12                : ok=33   changed=9    unreachable=0    failed=0   
10.142.0.13                : ok=33   changed=9    unreachable=0    failed=0   
10.142.0.14                : ok=34   changed=10   unreachable=0    failed=0   
10.142.0.15                : ok=34   changed=10   unreachable=0    failed=0   
10.142.0.16                : ok=34   changed=10   unreachable=0    failed=0   
10.142.0.17                : ok=43   changed=16   unreachable=0    failed=1   
10.142.0.18                : ok=42   changed=16   unreachable=0    failed=1   
10.142.0.19                : ok=42   changed=16   unreachable=0    failed=1   
10.142.0.20                : ok=33   changed=9    unreachable=0    failed=0   
localhost                  : ok=7    changed=4    unreachable=0    failed=0   


ERROR MESSAGE SUMMARY *************************************************************************************************************************************************************************************************
[10.142.0.17]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio randread iops of tikv_data_dir disk meet requirement; message: {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 11836 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

[10.142.0.18]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio randread iops of tikv_data_dir disk meet requirement; message: {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 13414 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

[10.142.0.19]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio randread iops of tikv_data_dir disk meet requirement; message: {"changed": false, "msg": "fio: randread iops of tikv_data_dir disk is too low: 12243 < 40000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

inventory.ini (1.9 KB) ansible.log (135.4 KB)


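According to the test results, the disk's random-read IOPS is only a little over 10,000, well short of the required 40,000. We recommend testing with a higher-performance local SSD.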
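
As a rough illustration of the check that failed, the sketch below reads the read IOPS out of a fio JSON report and compares it with the 40000 minimum quoted in the preflight message. It assumes the report follows fio's usual JSON layout (a top-level `jobs` list whose entries carry a `read.iops` field); the function names and the trimmed sample report are hypothetical and are not taken from tidb-ansible.

```python
import json

# Minimum taken from the preflight error message in this thread ("11836 < 40000").
RANDREAD_MIN_IOPS = 40000


def randread_iops(fio_json_text):
    """Return the read IOPS of the first job in a fio JSON report."""
    report = json.loads(fio_json_text)
    # With -group_reporting, fio emits a single aggregated job entry.
    return report["jobs"][0]["read"]["iops"]


def check_randread(fio_json_text, minimum=RANDREAD_MIN_IOPS):
    """Return (passed, iops) for a randread report."""
    iops = randread_iops(fio_json_text)
    return iops >= minimum, iops


# Hypothetical, heavily trimmed report using the value reported for 10.142.0.17.
sample = json.dumps(
    {"jobs": [{"jobname": "fio randread test", "read": {"iops": 11836}}]}
)
passed, iops = check_randread(sample)
print(f"randread IOPS={iops}, pass={passed}")
```

In practice the input would be the contents of `fio_randread_result.json`, the output file named in the benchmark command above.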

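I have also been wondering whether to switch to local SSD, but while researching I came across this (see the attached screenshot): local SSD is described as suitable for storing low-value data. That immediately made it sound unreliable. This is a production environment, so should I use it or not? Now I'm torn again…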

Deploying a TiDB cluster in production

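Please test with a local SSD that meets the tidb-ansible machine benchmark requirements. In theory, local SSD offers better guarantees for IOPS and latency. If you are deploying the TiDB cluster on GCP, you can also consider a K8s deployment, which comes with a complete support plan and best practices.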

My project does not use containers. If TiDB is deployed in containers, it would be accessed from outside the containers. Would that be a problem?

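If a container deployment is not feasible, go back to the original approach: deploy with tidb-ansible, but make sure the local SSD performance meets the requirements. We recommend using a higher-performance local SSD for both testing and production.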

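Right, local SSD should work. It's just that the GCP website mentions a risk of data loss, for example when an overdue account (this has really happened to us) causes the VM to be stopped (this has really happened too). That makes me hesitant to use local SSD; after all, this is a production environment.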

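At present, most users running on cloud hosts in production use local SSD as the TiKV data disk for TiDB. If you are not comfortable with GCP, you can choose another cloud provider.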

That's awkward. For now we can only use GCP…

One more question: when TiDB is deployed on GCP's GKE, are the disks also local SSD?

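We recommend deploying according to the hardware requirements in the official documentation. If a non-local SSD can also pass the fio tests while guaranteeing data stability and safety, we have no objection to it.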

I am now using local SSD and the benchmark still fails, although the earlier test now passes. At the moment only TiKV in the cluster uses local SSD.

TASK [machine_benchmark : fio mixed randread and sequential write benchmark command] **********************************************************************************************************************************
ok: [10.142.0.17] => {
    "msg": "fio mixed randread and sequential write benchmark command: cd /data/tidb/deploy/data/fio && ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randrw -percentage_random=100,0 -size=10G -filename=fio_randread_write_test.txt -name='fio mixed randread and sequential write test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting --output-format=json --output=fio_randread_write_test.json."
}

TASK [machine_benchmark : fio mixed randread and sequential write benchmark summary] **********************************************************************************************************************************
ok: [10.142.0.17] => {
    "msg": "fio mixed randread and sequential write benchmark summary: jobname: fio mixed randread and sequential write test\nread: IOPS=240\nlat (ns): min=6118, max=1461619, avg=324197\nclat percentiles (ns): 95.00th=452608, 99.00th=528384\nwrite: IOPS=247\nlat (ns): min=13941, max=126443, avg=28455\nclat percentiles (ns): 95.00th=45312, 99.00th=58112."
}
ok: [10.142.0.18] => {
    "msg": "fio mixed randread and sequential write benchmark summary: jobname: fio mixed randread and sequential write test\nread: IOPS=153\nlat (ns): min=7592, max=1726346, avg=335225\nclat percentiles (ns): 95.00th=460800, 99.00th=509952\nwrite: IOPS=159\nlat (ns): min=13994, max=107028, avg=29128\nclat percentiles (ns): 95.00th=45824, 99.00th=58624."
}
ok: [10.142.0.19] => {
    "msg": "fio mixed randread and sequential write benchmark summary: jobname: fio mixed randread and sequential write test\nread: IOPS=222\nlat (ns): min=6189, max=7801264, avg=344201\nclat percentiles (ns): 95.00th=460800, 99.00th=528384\nwrite: IOPS=229\nlat (ns): min=12506, max=119391, avg=27637\nclat percentiles (ns): 95.00th=45312, 99.00th=57600."
}

TASK [machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread] **********************************************************************
fatal: [10.142.0.17]: FAILED! => {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 240 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
fatal: [10.142.0.18]: FAILED! => {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 153 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
fatal: [10.142.0.19]: FAILED! => {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 222 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
        to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/bootstrap.retry

PLAY RECAP ************************************************************************************************************************************************************************************************************
10.142.0.17                : ok=53   changed=18   unreachable=0    failed=1   
10.142.0.18                : ok=51   changed=18   unreachable=0    failed=1   
10.142.0.19                : ok=51   changed=18   unreachable=0    failed=1   


ERROR MESSAGE SUMMARY *************************************************************************************************************************************************************************************************
[10.142.0.17]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread; message: {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 240 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

[10.142.0.18]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread; message: {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 153 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

[10.142.0.19]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread; message: {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 222 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

Ask for help:
Contact us: support@pingcap.com
It seems that you encounter some problems. You can send an email to the above email address, attached with the tidb-ansible/inventory.ini and tidb-ansible/log/ansible.log files and the error message, or new issue on https://github.com/pingcap/tidb-ansible/issues. We'll try our best to help you deploy a TiDB cluster. Thanks. :-)
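
The pass/fail decision in the output above can be reproduced from the summary strings alone. The following is a minimal sketch that assumes the summary keeps the `read: IOPS=N` / `write: IOPS=N` layout shown in this thread; `parse_summary` and `meets_randread_requirement` are hypothetical helpers, not part of tidb-ansible.

```python
import re

# Minimum for the mixed test, taken from the preflight message ("240 < 10000").
MIXED_RANDREAD_MIN_IOPS = 10000


def parse_summary(msg):
    """Extract read/write IOPS from a machine_benchmark summary string."""
    pairs = re.findall(r"(read|write): IOPS=(\d+)", msg)
    return {kind: int(value) for kind, value in pairs}


def meets_randread_requirement(msg, minimum=MIXED_RANDREAD_MIN_IOPS):
    """True if the read IOPS in the summary reaches the minimum."""
    return parse_summary(msg)["read"] >= minimum


# Summary for 10.142.0.17, shortened to the lines the regex needs.
summary = (
    "jobname: fio mixed randread and sequential write test\n"
    "read: IOPS=240\n"
    "write: IOPS=247\n"
)
print(parse_summary(summary), meets_randread_requirement(summary))
```

Only the read side is compared here because the failing task in the log is the `- randread` preflight check.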

Any SSD that passes the tests can be used.

After disabling write-cache flushing, the benchmark results improved somewhat.

TASK [machine_benchmark : fio mixed randread and sequential write benchmark command] **********************************************************************************************************************************
ok: [10.142.0.17] => {
    "msg": "fio mixed randread and sequential write benchmark command: cd /data/tidb/deploy/data/fio && ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randrw -percentage_random=100,0 -size=10G -filename=fio_randread_write_test.txt -name='fio mixed randread and sequential write test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting --output-format=json --output=fio_randread_write_test.json."
}

TASK [machine_benchmark : fio mixed randread and sequential write benchmark summary] **********************************************************************************************************************************
ok: [10.142.0.17] => {
    "msg": "fio mixed randread and sequential write benchmark summary: jobname: fio mixed randread and sequential write test\nread: IOPS=8828\nlat (ns): min=3741, max=14489486, avg=163046\nclat percentiles (ns): 95.00th=514048, 99.00th=856064\nwrite: IOPS=8850\nlat (ns): min=10493, max=371206, avg=28604\nclat percentiles (ns): 95.00th=53504, 99.00th=67072."
}
ok: [10.142.0.19] => {
    "msg": "fio mixed randread and sequential write benchmark summary: jobname: fio mixed randread and sequential write test\nread: IOPS=8819\nlat (ns): min=3761, max=18662469, avg=164047\nclat percentiles (ns): 95.00th=561152, 99.00th=700416\nwrite: IOPS=8837\nlat (ns): min=10413, max=625584, avg=27155\nclat percentiles (ns): 95.00th=50432, 99.00th=64256."
}
ok: [10.142.0.18] => {
    "msg": "fio mixed randread and sequential write benchmark summary: jobname: fio mixed randread and sequential write test\nread: IOPS=9167\nlat (ns): min=4018, max=10557904, avg=157036\nclat percentiles (ns): 95.00th=452608, 99.00th=774144\nwrite: IOPS=9181\nlat (ns): min=10157, max=202541, avg=27689\nclat percentiles (ns): 95.00th=51456, 99.00th=64768."
}

TASK [machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread] **********************************************************************
fatal: [10.142.0.17]: FAILED! => {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 8828 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
fatal: [10.142.0.18]: FAILED! => {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 9167 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
fatal: [10.142.0.19]: FAILED! => {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 8819 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}
        to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/bootstrap.retry

PLAY RECAP ************************************************************************************************************************************************************************************************************
10.142.0.17                : ok=53   changed=18   unreachable=0    failed=1   
10.142.0.18                : ok=51   changed=18   unreachable=0    failed=1   
10.142.0.19                : ok=51   changed=18   unreachable=0    failed=1   


ERROR MESSAGE SUMMARY *************************************************************************************************************************************************************************************************
[10.142.0.17]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread; message: {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 8828 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

[10.142.0.18]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread; message: {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 9167 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

[10.142.0.19]: Ansible FAILED! => playbook: bootstrap.yml; TASK: machine_benchmark : Preflight check - Does fio mixed randread and sequential write iops of tikv_data_dir disk meet requirement - randread; message: {"changed": false, "msg": "fio mixed randread and sequential write test: randread iops of  tikv_data_dir disk is too low: 8819 < 10000, it is strongly recommended to use SSD disks for TiKV and PD, or there might be performance issues."}

Ask for help:
Contact us: support@pingcap.com
It seems that you encounter some problems. You can send an email to the above email address, attached with the tidb-ansible/inventory.ini and tidb-ansible/log/ansible.log files and the error message, or new issue on https://github.com/pingcap/tidb-ansible/issues. We'll try our best to help you deploy a TiDB cluster. Thanks. :-)

Take another look: after I disabled disk write-cache flushing, the results are already very close to 10000.
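
To put "very close" into numbers, here is a small sketch that compares the read IOPS from the two mixed-benchmark runs in this thread against the 10000 minimum. The figures are copied from the summaries above; `shortfall_percent` is a hypothetical helper introduced only for this illustration.

```python
THRESHOLD = 10000  # mixed-test randread minimum from the preflight message

# Read IOPS per host, copied from the two benchmark summaries in this thread.
before = {"10.142.0.17": 240, "10.142.0.18": 153, "10.142.0.19": 222}
after = {"10.142.0.17": 8828, "10.142.0.18": 9167, "10.142.0.19": 8819}


def shortfall_percent(iops, threshold=THRESHOLD):
    """How far below the threshold a measurement is, in percent (0 if it passes)."""
    return max(0.0, (threshold - iops) / threshold * 100)


for host in sorted(after):
    gain = after[host] / before[host]
    print(
        f"{host}: {before[host]} -> {after[host]} ({gain:.1f}x), "
        f"{shortfall_percent(after[host]):.1f}% below {THRESHOLD}"
    )
```

By this measure the three hosts remain roughly 8 to 12 percent under the minimum, so the preflight check still fails even though the improvement over the first run is large.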

[root@ip-10-142-0-17 ~]# mount -t ext4
/dev/nvme0n1p1 on /data type ext4 (rw,noatime,discard,nodelalloc,nobarrier,data=ordered)
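
The `nobarrier` option in the mount output above is presumably what "disabling write-cache flushing" refers to. As far as I know, ext4 accepts both `nobarrier` and `barrier=0` to turn write barriers off, which trades durability on power loss or a crash for throughput. The sketch below pulls the option list out of a `mount` line and flags those two options; `mount_options` and `BARRIER_OFF` are hypothetical names, and the check is an illustration rather than anything tidb-ansible performs.

```python
# ext4 spellings that disable write barriers (an assumption based on the ext4 docs).
BARRIER_OFF = {"nobarrier", "barrier=0"}


def mount_options(mount_line):
    """Return the option list from one line of `mount` output."""
    # Options are the comma-separated list inside the trailing parentheses.
    start = mount_line.rindex("(") + 1
    end = mount_line.rindex(")")
    return mount_line[start:end].split(",")


line = (
    "/dev/nvme0n1p1 on /data type ext4 "
    "(rw,noatime,discard,nodelalloc,nobarrier,data=ordered)"
)
options = mount_options(line)
risky = [opt for opt in options if opt in BARRIER_OFF]
print("options:", options)
print("barrier-disabling options:", risky)
```

A benchmark that only passes with barriers disabled is measuring a configuration with weaker durability guarantees, which is worth weighing before using it in production.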

We still recommend following the official requirements.