【TiDBer 唠嗑茶话会 48】非正式 TiDB相关 SQL 脚本征集大赛!

这个是什么的 SQL 补充一下到原回复里面,我们下周准备整理成一个文章,方便各位 TiDBer 使用。

这个是什么的 SQL 补充一下到原回复里面,我们下周准备整理成一个文章,方便各位 TiDBer 使用。

这个是什么的 SQL 补充一下到原回复里面,我们下周准备整理成一个文章,方便各位 TiDBer 使用。

再来一个:

读取历史数据

使用 AS OF TIMESTAMP 语法读取历史数据

可以通过以下三种方式使用 AS OF TIMESTAMP 语法:
SELECT … FROM … AS OF TIMESTAMP
START TRANSACTION READ ONLY AS OF TIMESTAMP
SET TRANSACTION READ ONLY AS OF TIMESTAMP

select * from t as of timestamp '2021-05-26 16:45:26';
start transaction read only as of timestamp '2021-05-26 16:45:26';
set transaction read only as of timestamp '2021-05-26 16:45:26';

通过系统变量 tidb_read_staleness 读取历史数据

–从 5 秒前至现在的时间范围内选择一个尽可能新的时间戳
set @@tidb_read_staleness="-5";

通过系统变量 tidb_snapshot 读取历史数据

–设置一个特殊的环境变量,这个是一个 session scope 的变量,其意义为读取这个时间之前的最新的一个版本
set @@tidb_snapshot="2016-10-08 16:45:26";

–清空这个变量后,即可读取最新版本数据
set @@tidb_snapshot=“”;

通过SQL查询所有节点所在OS的CPU当前使用率:

SELECT
  b.time,
  a.hostname,
  a.ip,
  a.types,
  b.cpu_used_percent
FROM
  (
    SELECT
      GROUP_CONCAT(TYPE) AS TYPES,
      SUBSTRING_INDEX(instance, ':', 1) AS ip,
      value AS hostname
    FROM
      information_schema.cluster_systeminfo
    WHERE
      name = 'kernel.hostname'
    GROUP BY
      ip,
      hostname
  ) a,
  (
    SELECT
      time,
      SUBSTRING_INDEX(instance, ':', 1) AS ip,
(100 - value) AS cpu_used_percent
    FROM
      metrics_schema.node_cpu_usage
    WHERE
      MODE = 'idle'
      AND time = NOW()
  ) b
WHERE
  a.ip = b.ip

输出示例:

+----------------------------+-----------------------+----------------+----------------------+--------------------+
| time                       | hostname              | ip             | types                | cpu_used_percent   |
+----------------------------+-----------------------+----------------+----------------------+--------------------+
| 2023-01-10 22:40:15.000000 | localhost.localdomain | 192.168.31.201 | tidb,pd,tikv,tiflash | 11.438079153798114 |
+----------------------------+-----------------------+----------------+----------------------+--------------------+
1 row in set (0.04 sec)

说明:我这里所有类型组件只创建了有一个而且都在一个os上,所以只显示了一行。

这是我实践中清理tidb大量数据的脚本,这样可以实现删除百万级别以上的数据,而且不影响tidb正常使用,希望能帮助到你

date1=date --date "7 days ago" +"%Y-%m-%d"
delete_db_sql=“delete from mysql_table where create_date_time<‘$date1’ limit 10000”

i=0

while ((++i)); do
a=/bin/mysql -uroot -p123456 -A mysql_database -h127.0.0.1 --comments -e "${delete_db_sql}" -vvv|grep "Query OK" |awk '{print $3}'
if(($a<1)); then
break 1
fi
sleep 1
printf “%-4d” $((i))

done

开启 Follower Read

SET [GLOBAL] tidb_replica_read = ‘follower’;
可以将变量 tidb_replica_read 的值(默认为 leader )设置为 followerleader-and-followerprefer-leaderclosest-replicasclosest-adaptive

尽量减少系统影响的情况下比较两张表的数据是否一致
通过cdc同步数据时,比较上下游两张表的数据是否一致可以选择用sync-diff-inspector,但是该工具会将所有记录抽取出来进行比较,如此一来:
1、如果上下游数据不在一个地域,那么抽取数据进行数据比较时会占用大量网络带宽,影响其它同步任务。
2、会全表数据从tikv中读取到tidb中,可能会导致其它TP交易性能抖动,尤其遇到大宽表和带有大对象类型表时,更加显著。
ticdc同步数据正常情况下上下游数据是一致的,也就是可能存在不同步的表概率很小,那么我们可以预先判断上下游表是否一致,如果一致则不需要处理,如果不一致则进一步采用其它手段进行处理。
这里我们寻找一个方法来比较两张表记录是否一致。
1、根据cdc同步情况,获取一个时间戳(下游完成了时间点的回放,参考:ticdc的Syncpoint 功能),并根据该时间戳做快照读。
2、所有参与运算的函数应可以下推到tikv执行,且在谓词条件中的函数才可下推,selection中函数须配合聚合函数完成下推,总体方式为聚合+selection函数下推方式执行,避免tikv向tidb发送大量数据。
3、采用sum方式进行聚合,利用crc32将所有非整数类型字段转为整数,整行相加,为避免最终结果集溢出,每行除以一个系数确保最终聚合后的数据不会发生溢出。
4、聚合后得到当前表的"checksum"值,该值碰撞的概率极低,可以用于比较两张表的数据是否一致。
注意:扫描数据时只用tikv引擎,不用tiflash引擎,参考:https://docs.pingcap.com/zh/tidb/stable/tiflash-compatibility

通过下属方法来判断库中所有非系统表的checksum值(快照时间点自行处理)
另外,注意当tikv节点较少时可以适当减少tidb_distsql_scan_concurrency并发度(默认15),可以设置为5,减少tikv的CPU压力,避免对其它交易产生影响。

#生成所有非系统表做checksum的查询语句
set session group_concat_max_len = 10000000;
with a(table_schema,table_name,select_fields) as (
select table_schema,table_name,group_concat(
    case
        when data_type in('bigint','int','tinyint') then column_name
        else concat('crc32(ifnull(\`', column_name, '\`, \'\'))')
    end
    order by ordinal_position separator '+')  as col1
from information_schema.columns where table_schema not in ('mysql','INFORMATION_SCHEMA','METRICS_SCHEMA','PERFORMANCE_SCHEMA') and (table_schema,table_name) in (select table_schema,table_name from information_schema.tables where table_type = 'BASE TABLE') group by table_schema,table_name order by table_schema,table_name)
select concat('select SQL_NO_CACHE ''',table_schema,'.',table_name,''' as table_name,sum( (',select_fields,') / 10000) as checksum from \`',table_schema,'\`.\`',table_name,'\`;') as query from a;

测试样例:

mysql> create table customer_bak like customer;
Query OK, 0 rows affected (0.10 sec)

mysql> batch on C_CUSTKEY limit 1000 insert into customer_bak select * from customer;
+----------------+---------------+
| number of jobs | job status    |
+----------------+---------------+
|            150 | all succeeded |
+----------------+---------------+
1 row in set (2.87 sec)
Records: 1000  Duplicates: 0  Warnings: 0

#查询customer,customer_bak这两张表的checksum的query语句
mysql> set session group_concat_max_len = 10000000;
Query OK, 0 rows affected (0.00 sec)

mysql> with a(table_schema,table_name,select_fields) as (
    -> select table_schema,table_name,group_concat(
    ->     case
    ->         when data_type in('bigint','int','tinyint') then column_name
    ->         else concat('crc32(ifnull(', column_name, ', \'\'))')
    ->     end
    ->     order by ordinal_position separator '+')  as col1
    -> from information_schema.columns where table_schema not in ('mysql','INFORMATION_SCHEMA','METRICS_SCHEMA','PERFORMANCE_SCHEMA') group by table_schema,table_name order by table_schema,table_name)
    -> select concat('select ''',table_schema,'.',table_name,''' as table_name,sum( (',select_fields,') / 10000) as checksum from ',table_schema,'.',table_name,';') as query from a
    -> where table_schema='tpch1' and table_name in ('customer','customer_bak');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| query                                                                                                                                                                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| select 'tpch1.customer' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer;         |
| select 'tpch1.customer_bak' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer_bak; |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

#适当减少tikv扫描并发度
mysql> set tidb_distsql_scan_concurrency = 5;
Query OK, 0 rows affected (0.00 sec)

#只用tikv引擎,不用tiflash引擎,参考:https://docs.pingcap.com/zh/tidb/stable/tiflash-compatibility
mysql> set tidb_isolation_read_engines='tikv';
Query OK, 0 rows affected (0.00 sec)
#查询tpch1.customer表的checksum值
mysql> select 'tpch1.customer' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer; 
+----------------+-------------------+
| table_name     | checksum          |
+----------------+-------------------+
| tpch1.customer | 180573846888.7747 |
+----------------+-------------------+
1 row in set (0.00 sec)

#查询tpch1.customer_bak表的checksum值
mysql> select 'tpch1.customer_bak' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer_bak;
+--------------------+-------------------+
| table_name         | checksum          |
+--------------------+-------------------+
| tpch1.customer_bak | 180573846888.7747 |
+--------------------+-------------------+
1 row in set (0.21 sec)

#可以看到这两行表的checksum相同为:180573846888.7747,说明两张表数据一致。

#下面修改customer_bak表中的一个字符判断表是否还一致
mysql> select C_ADDRESS from customer_bak where c_custkey=1;
+-------------------+
| C_ADDRESS         |
+-------------------+
| IVhzIApeRb ot,c,E |
+-------------------+
1 row in set (0.00 sec)

#更新C_ADDRESS比原来减少一个字符
mysql> update tpch1.customer_bak set C_ADDRESS='IVhzIApeRb ot,c,' where c_custkey=1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select 'tpch1.customer_bak' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer_bak;
+--------------------+-------------------+
| table_name         | checksum          |
+--------------------+-------------------+
| tpch1.customer_bak | 180573631763.3302 |
+--------------------+-------------------+
1 row in set (0.21 sec)

#可以看到checksum结果为180573631763.3302和原来的180573846888.7747不同,表数据发生了变化。

将一个时间转为TSO时间

#将时间转为tso
mysql> SELECT conv( concat( bin( unix_timestamp('2023-11-03 22:46:55.546') * 1000 ), '000000000000000001' ), 2, 10 ) as tso;
+--------------------+
| tso                |
+--------------------+
| 445388636958490625 |
+--------------------+
1 row in set (0.00 sec)
#将tso转成时间戳看是否准确:
mysql> select tidb_parse_tso(445388636958490625);
+------------------------------------+
| tidb_parse_tso(445388636958490625) |
+------------------------------------+
| 2023-11-03 22:46:55.546000         |
+------------------------------------+
1 row in set (0.01 sec)

mysql> SELECT conv( concat( bin( unix_timestamp(now()) * 1000 ), '000000000000000001' ), 2, 10 ) as tso;
+--------------------+
| tso                |
+--------------------+
| 445409348550656001 |
+--------------------+
1 row in set (0.01 sec)