What are these SQL snippets for? Please add the details to your original replies; next week we plan to compile them into an article for TiDBers to use.
Here is another one:
Reading historical data
Read historical data with the AS OF TIMESTAMP syntax
The AS OF TIMESTAMP syntax can be used in three ways:
SELECT ... FROM ... AS OF TIMESTAMP
START TRANSACTION READ ONLY AS OF TIMESTAMP
SET TRANSACTION READ ONLY AS OF TIMESTAMP
select * from t as of timestamp '2021-05-26 16:45:26';
start transaction read only as of timestamp '2021-05-26 16:45:26';
set transaction read only as of timestamp '2021-05-26 16:45:26';
Read historical data via the system variable tidb_read_staleness
-- pick a timestamp as fresh as possible within the range from 5 seconds ago to now
set @@tidb_read_staleness="-5";
Read historical data via the system variable tidb_snapshot
-- set a special variable; it is session-scoped, and it means "read the latest version as of this point in time"
set @@tidb_snapshot="2016-10-08 16:45:26";
-- clear the variable to read the latest data again
set @@tidb_snapshot="";
Query the current CPU usage of the OS hosting every node via SQL:
SELECT
b.time,
a.hostname,
a.ip,
a.types,
b.cpu_used_percent
FROM
(
SELECT
GROUP_CONCAT(TYPE) AS TYPES,
SUBSTRING_INDEX(instance, ':', 1) AS ip,
value AS hostname
FROM
information_schema.cluster_systeminfo
WHERE
name = 'kernel.hostname'
GROUP BY
ip,
hostname
) a,
(
SELECT
time,
SUBSTRING_INDEX(instance, ':', 1) AS ip,
(100 - value) AS cpu_used_percent
FROM
metrics_schema.node_cpu_usage
WHERE
MODE = 'idle'
AND time = NOW()
) b
WHERE
  a.ip = b.ip;
Sample output:
+----------------------------+-----------------------+----------------+----------------------+--------------------+
| time | hostname | ip | types | cpu_used_percent |
+----------------------------+-----------------------+----------------+----------------------+--------------------+
| 2023-01-10 22:40:15.000000 | localhost.localdomain | 192.168.31.201 | tidb,pd,tikv,tiflash | 11.438079153798114 |
+----------------------------+-----------------------+----------------+----------------------+--------------------+
1 row in set (0.04 sec)
Note: in my cluster there is only one instance of each component type and they all run on the same OS, so only one row is shown.
This is a script I use in practice to purge large amounts of data from TiDB. It can delete millions of rows or more without affecting normal TiDB operation; I hope it helps.
date1=$(date --date "7 days ago" +"%Y-%m-%d")
delete_db_sql="delete from mysql_table where create_date_time<'$date1' limit 10000"
i=0
while ((++i)); do
    # -vvv makes mysql print "Query OK, N rows affected"; awk extracts N
    a=$(/bin/mysql -uroot -p123456 -A mysql_database -h127.0.0.1 --comments -e "${delete_db_sql}" -vvv | grep "Query OK" | awk '{print $3}')
    # stop once a batch deletes fewer than 1 row
    if (($a < 1)); then
        break
    fi
    sleep 1
    printf "%-4d" $((i))
done
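The same keep-deleting-until-empty pattern can be sketched in Python. This is an illustration only: it uses sqlite3 so the sketch is self-contained, and sqlite3 lacks DELETE ... LIMIT, so a rowid subquery emulates it; against TiDB you would issue the DELETE ... LIMIT statement through a MySQL driver, with the driver's rowcount playing the role of the "Query OK" row count parsed above.

```python
import sqlite3

# Toy table standing in for mysql_table (sqlite3 used only for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, create_date_time TEXT)")
conn.executemany("INSERT INTO t (create_date_time) VALUES (?)",
                 [("2020-01-01",)] * 25)
conn.commit()

BATCH = 10  # small batches keep each transaction (and its locks) small
while True:
    # On TiDB this would simply be: DELETE FROM t WHERE ... LIMIT 10000
    cur = conn.execute(
        "DELETE FROM t WHERE rowid IN "
        "(SELECT rowid FROM t WHERE create_date_time < '2021-01-01' LIMIT ?)",
        (BATCH,))
    conn.commit()
    if cur.rowcount < 1:  # a batch that deletes nothing means we are done
        break

remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Deleting in bounded batches is what keeps each transaction small, so GC and other workloads are not blocked by one huge delete.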
Enable Follower Read
SET [GLOBAL] tidb_replica_read = 'follower';
The variable tidb_replica_read (default: leader) can be set to follower, leader-and-follower, prefer-leader, closest-replicas, or closest-adaptive.
Compare whether two tables hold the same data, with minimal impact on the system
When replicating data with TiCDC, you can use sync-diff-inspector to compare a table between upstream and downstream, but that tool extracts every record for comparison, so:
1. If upstream and downstream are in different regions, pulling the data consumes a lot of network bandwidth and affects other replication tasks.
2. The full table is read from TiKV into TiDB, which can cause latency jitter for other TP transactions, especially with wide tables or tables containing large-object columns.
Under normal conditions TiCDC keeps upstream and downstream consistent, so the probability of an out-of-sync table is small. We can therefore cheaply check in advance whether a table pair is consistent: if it is, nothing more needs to be done; if not, fall back to other tools for repair.
Here we look for a lightweight way to compare two tables:
1. Based on the TiCDC replication progress, obtain a timestamp at which the downstream has finished replaying (see TiCDC's Syncpoint feature), and do a snapshot read at that timestamp.
2. All functions involved should be pushed down to TiKV. Only functions in predicate conditions are pushed down directly; functions in the selection must be combined with aggregate functions to be pushed down, so the overall approach is aggregation plus pushed-down selection functions, which avoids TiKV sending large amounts of data to TiDB.
3. Aggregate with SUM: use crc32 to convert every non-integer column to an integer, add up the whole row, and divide each row by a coefficient so that the final aggregate cannot overflow.
4. The aggregation yields a "checksum" for the table; collisions are extremely unlikely, so it can be used to decide whether two tables hold the same data.
Note: scan the data with the tikv engine only, not tiflash; see https://docs.pingcap.com/zh/tidb/stable/tiflash-compatibility
The method below computes the checksum of every non-system table (choose the snapshot time point yourself).
Also note that when there are few TiKV nodes, you can lower the scan concurrency tidb_distsql_scan_concurrency (default 15), e.g. to 5, to reduce TiKV CPU pressure and avoid impacting other transactions.
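The per-row expression in step 3 can be mimicked in Python to see why the trick works. Python's zlib.crc32 computes the same standard CRC-32 as the CRC32() SQL function, and because the checksum is a plain SUM it is independent of row order. The column names and rows below are illustrative only:

```python
import zlib

def row_term(row, int_cols):
    """One row's contribution: integer columns as-is, everything else
    through CRC32 (NULL treated as ''), divided by 10000 so the final
    SUM cannot overflow."""
    total = 0
    for col, val in row.items():
        if col in int_cols:
            total += val
        else:
            total += zlib.crc32(('' if val is None else str(val)).encode())
    return total / 10000

def table_checksum(rows, int_cols):
    # SUM over all rows, like sum((...) / 10000) in the generated query
    return sum(row_term(r, int_cols) for r in rows)

rows_a = [{"C_CUSTKEY": 1, "C_NAME": "Customer#000000001"},
          {"C_CUSTKEY": 2, "C_NAME": None}]
rows_b = [{"C_CUSTKEY": 1, "C_NAME": "Customer#000000001"},
          {"C_CUSTKEY": 2, "C_NAME": "changed"}]

# row order does not matter, but any changed value shifts the sum
order_independent = (table_checksum(rows_a, {"C_CUSTKEY"}) ==
                     table_checksum(list(reversed(rows_a)), {"C_CUSTKEY"}))
detects_change = (table_checksum(rows_a, {"C_CUSTKEY"}) !=
                  table_checksum(rows_b, {"C_CUSTKEY"}))
```

Order-independence is what lets TiKV compute partial sums per region and TiDB merge them, which is exactly the pushdown behavior the method relies on.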
# Generate the checksum query statements for all non-system tables
set session group_concat_max_len = 10000000;
with a(table_schema,table_name,select_fields) as (
select table_schema,table_name,group_concat(
case
when data_type in('bigint','int','tinyint') then column_name
else concat('crc32(ifnull(\`', column_name, '\`, \'\'))')
end
order by ordinal_position separator '+') as col1
from information_schema.columns where table_schema not in ('mysql','INFORMATION_SCHEMA','METRICS_SCHEMA','PERFORMANCE_SCHEMA') and (table_schema,table_name) in (select table_schema,table_name from information_schema.tables where table_type = 'BASE TABLE') group by table_schema,table_name order by table_schema,table_name)
select concat('select SQL_NO_CACHE ''',table_schema,'.',table_name,''' as table_name,sum( (',select_fields,') / 10000) as checksum from \`',table_schema,'\`.\`',table_name,'\`;') as query from a;
Test example:
mysql> create table customer_bak like customer;
Query OK, 0 rows affected (0.10 sec)
mysql> batch on C_CUSTKEY limit 1000 insert into customer_bak select * from customer;
+----------------+---------------+
| number of jobs | job status |
+----------------+---------------+
| 150 | all succeeded |
+----------------+---------------+
1 row in set (2.87 sec)
Records: 1000 Duplicates: 0 Warnings: 0
# Generate the checksum query statements for the two tables customer and customer_bak
mysql> set session group_concat_max_len = 10000000;
Query OK, 0 rows affected (0.00 sec)
mysql> with a(table_schema,table_name,select_fields) as (
-> select table_schema,table_name,group_concat(
-> case
-> when data_type in('bigint','int','tinyint') then column_name
-> else concat('crc32(ifnull(', column_name, ', \'\'))')
-> end
-> order by ordinal_position separator '+') as col1
-> from information_schema.columns where table_schema not in ('mysql','INFORMATION_SCHEMA','METRICS_SCHEMA','PERFORMANCE_SCHEMA') group by table_schema,table_name order by table_schema,table_name)
-> select concat('select ''',table_schema,'.',table_name,''' as table_name,sum( (',select_fields,') / 10000) as checksum from ',table_schema,'.',table_name,';') as query from a
-> where table_schema='tpch1' and table_name in ('customer','customer_bak');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| query |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| select 'tpch1.customer' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer; |
| select 'tpch1.customer_bak' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer_bak; |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
# Lower the TiKV scan concurrency appropriately
mysql> set tidb_distsql_scan_concurrency = 5;
Query OK, 0 rows affected (0.00 sec)
# Use only the tikv engine, not tiflash; see https://docs.pingcap.com/zh/tidb/stable/tiflash-compatibility
mysql> set tidb_isolation_read_engines='tikv';
Query OK, 0 rows affected (0.00 sec)
# Compute the checksum of table tpch1.customer
mysql> select 'tpch1.customer' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer;
+----------------+-------------------+
| table_name | checksum |
+----------------+-------------------+
| tpch1.customer | 180573846888.7747 |
+----------------+-------------------+
1 row in set (0.00 sec)
# Compute the checksum of table tpch1.customer_bak
mysql> select 'tpch1.customer_bak' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer_bak;
+--------------------+-------------------+
| table_name | checksum |
+--------------------+-------------------+
| tpch1.customer_bak | 180573846888.7747 |
+--------------------+-------------------+
1 row in set (0.21 sec)
# Both tables have the same checksum, 180573846888.7747, so their data is consistent.
# Now change one character in customer_bak and check whether the tables still match
mysql> select C_ADDRESS from customer_bak where c_custkey=1;
+-------------------+
| C_ADDRESS |
+-------------------+
| IVhzIApeRb ot,c,E |
+-------------------+
1 row in set (0.00 sec)
# Update C_ADDRESS to be one character shorter than before
mysql> update tpch1.customer_bak set C_ADDRESS='IVhzIApeRb ot,c,' where c_custkey=1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select 'tpch1.customer_bak' as table_name,sum( (C_CUSTKEY+crc32(ifnull(C_NAME, ''))+crc32(ifnull(C_ADDRESS, ''))+C_NATIONKEY+crc32(ifnull(C_PHONE, ''))+crc32(ifnull(C_ACCTBAL, ''))+crc32(ifnull(C_MKTSEGMENT, ''))+crc32(ifnull(C_COMMENT, ''))) / 10000) as checksum from tpch1.customer_bak;
+--------------------+-------------------+
| table_name | checksum |
+--------------------+-------------------+
| tpch1.customer_bak | 180573631763.3302 |
+--------------------+-------------------+
1 row in set (0.21 sec)
# The checksum is now 180573631763.3302, different from the original 180573846888.7747, so the table data has changed.
Convert a point in time to a TSO
# Convert a time to a TSO
mysql> SELECT conv( concat( bin( unix_timestamp('2023-11-03 22:46:55.546') * 1000 ), '000000000000000001' ), 2, 10 ) as tso;
+--------------------+
| tso |
+--------------------+
| 445388636958490625 |
+--------------------+
1 row in set (0.00 sec)
# Convert the TSO back to a timestamp to check that it is correct:
mysql> select tidb_parse_tso(445388636958490625);
+------------------------------------+
| tidb_parse_tso(445388636958490625) |
+------------------------------------+
| 2023-11-03 22:46:55.546000 |
+------------------------------------+
1 row in set (0.01 sec)
mysql> SELECT conv( concat( bin( unix_timestamp(now()) * 1000 ), '000000000000000001' ), 2, 10 ) as tso;
+--------------------+
| tso |
+--------------------+
| 445409348550656001 |
+--------------------+
1 row in set (0.01 sec)
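The arithmetic behind the SQL above: a TiDB TSO is a 64-bit number whose high bits are a physical Unix timestamp in milliseconds and whose low 18 bits are a logical counter; the query appends 18 binary digits ending in 1, i.e. logical = 1. A sketch in Python; note that unix_timestamp() in SQL depends on the session time zone, so a fixed millisecond value (the one from the example above, session time zone UTC+8) is used here:

```python
def to_tso(ms, logical=1):
    # shift the physical milliseconds into the high bits and OR in the
    # 18-bit logical counter -- same as concat(bin(ms), <18 binary digits>)
    return (ms << 18) | logical

def from_tso(tso):
    # inverse: high bits -> milliseconds, low 18 bits -> logical counter
    return tso >> 18, tso & ((1 << 18) - 1)

# milliseconds for '2023-11-03 22:46:55.546' in the UTC+8 session above
ms = 1699022815546
tso = to_tso(ms)                  # matches the SQL output above
physical, logical = from_tso(tso)
```

The round trip reproduces the example: to_tso(1699022815546) gives 445388636958490625, and from_tso recovers the millisecond value that tidb_parse_tso formats back into '2023-11-03 22:46:55.546000'.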