为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。
- 【TiDB 版本】:5.7.25-TiDB-v4.0.2
- 【问题描述】:在运行一个大的操作时抛了这个异常,怎么解决?
这是相关的 tidb 日志:
链接: https://pan.baidu.com/s/1WyIvJJR6tCtp9Nnz5GLPIQ 提取码: wvfv
若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。
为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。
这是相关的 tidb 日志:
链接: https://pan.baidu.com/s/1WyIvJJR6tCtp9Nnz5GLPIQ 提取码: wvfv
若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。
hi,云盘下载较慢,可否简述下问题?执行什么操作了呢
这是运行的 python脚本 pro_stat_apply_repay_datail.py (31.9 KB)
从异常信息来看,是在执行 db.create_table_as_select("ods.tmp_dwd_apply_overdue_detail_level_2" ...
的时候报错的。
python 的报错是否有呢
确定下在 tidb 中执行的 sql 是 create table as select * from tbl;
如果是,辛苦反馈下 tbl 的表结构
TiDB 不是不支持 create table .. as select * from tbl
吗,所以实际执行的是,
create view …view1 as select * from tbl;
describe …view1 => fields
create table …tab1 (fields)
insert into …tab1 select * from tbl;
tbl 的表结构在那个脚本里就有。
至于 python 的报错,没有当时的截屏了,不过源头的异常就是 1105, ‘bytes.Buffer: truncation out of range’
。它来自 TiDB。
找到那段报错信息了,你看下,
INFO: [2020-08-04 21:36:32.824341] [1596499200 /ods.pro_stat_apply_repay_datail] - [step 106] execute >>
drop table if exists ods.tmp_dwd_apply_overdue_detail_level_2
INFO: [2020-08-04 21:36:35.009799] [1596499200 /ods.pro_stat_apply_repay_datail] - [step 107] create_table_as_select >>
SELECT
apply_sid
,user_sid
,product_cid
,GROUP_CONCAT(case when repay_date > 0 and repay_date < current_date then period_id end ORDER BY repay_date,period_id) cur_period_id
,overdue_state
,cur_overdue_state
,max(overdue_days) max_overdue_days
,case when count(case when overdue_state = '逾期' then 1 end) > 0 then count(case when overdue_state = '逾期' then 1 end) else null end overdue_periods
,max(overdue_date) last_overdue_date
,max(overdue_repay_date) last_overdue_repay_date
,max(case when overdue_state = '逾期' then repay_amt end) max_overdue_amt
,sum(case when overdue_state = '逾期' then repay_amt end) overdue_amt_t
,sum(case when cur_overdue_state = '逾期' then repay_amt end) cur_overdue_amt_t
,max(case when repay_flag = 1 then repay_amt end) max_repay_amt
,sum(case when repay_flag = 2 then repay_amt end) balance
,sum(repay_amt) repay_amt_t
,sum(case when DATE_FORMAT(repay_date,'%Y%m') = DATE_FORMAT(DATE_ADD(current_date,INTERVAL -1 day),'%Y%m') and apply_state not regexp '清' then repay_amt end) cur_repay_amt_m
FROM
(
SELECT
apply_sid
,user_sid
,product_cid
,period_id
,repay_date
,overdue_state
,case when cur_state_flag = 1 then '逾期'
when cur_state_flag = 2 then '正常'
when overdue_state in('正常','未知') then overdue_state
else null
end cur_overdue_state
,case when overdue_state = '逾期' and is_exist = 2 and cur_state_flag = 1 then overdue_days
when overdue_state = '逾期' and is_exist = 1 then overdue_days
-- when overdue_state = '逾期' then '不详'
else null
end overdue_days
,case when overdue_state = '逾期' and is_exist in(1,2,4) then repay_date
-- when overdue_state = '逾期' then '不详'
else null
end overdue_date
,case when overdue_state = '逾期' and is_exist = 1 and cur_state_flag = 2 then real_repay_date
-- when overdue_state = '逾期' then '不详'
else null
end overdue_repay_date
,repay_amt
,repay_flag
,apply_state
FROM ods.tmp_dwd_apply_overdue_detail_level_1 a
ORDER BY apply_sid,case when overdue_state = '未知' then 1 when overdue_state = '逾期' then 2 when overdue_state = '正常' then 3 when overdue_state is null then 99 end,
case when cur_overdue_state = '未知' then 1 when cur_overdue_state = '逾期' then 2 when cur_overdue_state = '正常' then 3 when cur_overdue_state is null then 99 end
) a
GROUP BY apply_sid
==> ods.tmp_dwd_apply_overdue_detail_level_2
INFO: [2020-08-04 21:41:54.695424] [1596499200 /ods.pro_stat_apply_repay_datail] - Traceback (most recent call last):
File “/usr/local/lib/python3.6/site-packages/tidb_util-1.0-py3.6.egg/tidb/tidb_agent.py”, line 168, in create_table_as_select
tidb_utils.create_table_as_select(self.cnx, _tablename, _select_sql, ct_mappings=ct_mappings, logger=self.logger)
File “/usr/local/lib/python3.6/site-packages/tidb_util-1.0-py3.6.egg/tidb/tidb_utils.py”, line 458, in create_table_as_select
raise e
File “/usr/local/lib/python3.6/site-packages/tidb_util-1.0-py3.6.egg/tidb/tidb_utils.py”, line 454, in create_table_as_select
cursor.execute(sql)
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/cursors.py”, line 170, in execute
result = self._query(query)
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/cursors.py”, line 328, in _query
conn.query(q)
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py”, line 517, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py”, line 732, in _read_query_result
result.read()
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py”, line 1075, in read
first_packet = self.connection._read_packet()
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py”, line 684, in _read_packet
packet.check_error()
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/protocol.py”, line 220, in check_error
err.raise_mysql_exception(self._data)
File “/usr/local/lib/python3.6/site-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/err.py”, line 109, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.InternalError: (1105, ‘bytes.Buffer: truncation out of range’)
NOTICE: Got exception WHEN calling ods.pro_stat_apply_repay_datail(1596499200):
state : 38000
message: pymysql.err.InternalError: (1105, ‘bytes.Buffer: truncation out of range’)
context: Traceback (most recent call last):
PL/Python function “pro_stat_apply_repay_datail”, line 522, in
db.done(e)
PL/Python function “pro_stat_apply_repay_datail”, line 62, in done
PL/Python function “pro_stat_apply_repay_datail”, line 399, in __plpython_procedure_pro_stat_apply_repay_datail_567412
“”", ct_mappings={“cur_period_id”:“text”})
PL/Python function “pro_stat_apply_repay_datail”, line 172, in create_table_as_select
PL/Python function “pro_stat_apply_repay_datail”, line 167, in create_table_as_select
PL/Python function “pro_stat_apply_repay_datail”, line 457, in create_table_as_select
PL/Python function “pro_stat_apply_repay_datail”, line 453, in create_table_as_select
PL/Python function “pro_stat_apply_repay_datail”, line 169, in execute
PL/Python function “pro_stat_apply_repay_datail”, line 327, in _query
PL/Python function “pro_stat_apply_repay_datail”, line 516, in query
PL/Python function “pro_stat_apply_repay_datail”, line 731, in _read_query_result
PL/Python function “pro_stat_apply_repay_datail”, line 1074, in read
PL/Python function “pro_stat_apply_repay_datail”, line 683, in _read_packet
PL/Python function “pro_stat_apply_repay_datail”, line 219, in check_error
PL/Python function “pro_stat_apply_repay_datail”, line 108, in raise_mysql_exception
PL/Python procedure “pro_stat_apply_repay_datail”
SQL statement “call ods.pro_stat_apply_repay_datail($1)”
PL/pgSQL function prod_sch_worker(integer) line 45 at EXECUTE
准确的说,那个异常发生在上述 insert into ..tab1 select * from tbl
的环节。
什么原因可能导致这个异常呢?
会是数据类型不匹配吗?
试了几次,看起来不是数据类型/长度不匹配的问题。
这是最后一次的日志:pro_stat_apply_repay_datail.log (35.4 KB)
那个 create table .. as select * from tbl
在建表环节的语句已经变成这样了,
create table ods.tmp_dwd_apply_overdue_detail_level_2 (
apply_sid
int(11),
user_sid
int(11),
product_cid
int(11),
cur_period_id
text,
overdue_state
text,
cur_overdue_state
text,
max_overdue_days
bigint(20),
overdue_periods
bigint(21),
last_overdue_date
text,
last_overdue_repay_date
text,
max_overdue_amt
double,
overdue_amt_t
decimal(65,2),
cur_overdue_amt_t
decimal(65,2),
max_repay_amt
double,
balance
decimal(65,2),
repay_amt_t
decimal(65,2),
cur_repay_amt_m
decimal(65,2))
已经找不到能产生溢出的地方了。
还有什么原因可能导致这个异常呢?
func (h *topNRows) concat(sep string, truncated bool) string {
buffer := new(bytes.Buffer)
sort.Sort(sort.Reverse(h))
for i, row := range h.rows {
if i != 0 {
buffer.WriteString(sep)
}
buffer.Write(row.buffer.Bytes())
}
if truncated && uint64(buffer.Len()) < h.limitSize {
// append the last separator, because the last separator may be truncated in tryToAdd.
buffer.WriteString(sep)
buffer.Truncate(int(h.limitSize)) // 这个地方 buffer 没有符合 limitSize 的长度,所以报了 runtime 的错误
}
return buffer.String()
}
这个问题比较少遇到,看起来是比较新的代码
不是写入的问题,单单执行这个SQL,就会抛那个异常
SELECT
apply_sid
,user_sid
,product_cid
,GROUP_CONCAT(case when repay_date > 0 and repay_date < current_date then period_id end ORDER BY repay_date,period_id) cur_period_id
,overdue_state
,cur_overdue_state
,max(overdue_days) max_overdue_days
,case when count(case when overdue_state = ‘逾期’ then 1 end) > 0 then count(case when overdue_state = ‘逾期’ then 1 end) else null end overdue_periods
,max(overdue_date) last_overdue_date
,max(overdue_repay_date) last_overdue_repay_date
,max(case when overdue_state = ‘逾期’ then repay_amt end) max_overdue_amt
,sum(case when overdue_state = ‘逾期’ then repay_amt end) overdue_amt_t
,sum(case when cur_overdue_state = ‘逾期’ then repay_amt end) cur_overdue_amt_t
,max(case when repay_flag = 1 then repay_amt end) max_repay_amt
,sum(case when repay_flag = 2 then repay_amt end) balance
,sum(repay_amt) repay_amt_t
,sum(case when DATE_FORMAT(repay_date,‘%Y%m’) = DATE_FORMAT(DATE_ADD(current_date,INTERVAL -1 day),‘%Y%m’) and apply_state not regexp ‘清’ then repay_amt end) cur_repay_amt_m
FROM
(
SELECT
apply_sid
,user_sid
,product_cid
,period_id
,repay_date
,overdue_state
,case when cur_state_flag = 1 then ‘逾期’
when cur_state_flag = 2 then ‘正常’
when overdue_state in(‘正常’,‘未知’) then overdue_state
else null
end cur_overdue_state
,case when overdue_state = ‘逾期’ and is_exist = 2 and cur_state_flag = 1 then overdue_days
when overdue_state = ‘逾期’ and is_exist = 1 then overdue_days
– when overdue_state = ‘逾期’ then ‘不详’
else null
end overdue_days
,case when overdue_state = ‘逾期’ and is_exist in(1,2,4) then repay_date
– when overdue_state = ‘逾期’ then ‘不详’
else null
end overdue_date
,case when overdue_state = ‘逾期’ and is_exist = 1 and cur_state_flag = 2 then real_repay_date
– when overdue_state = ‘逾期’ then ‘不详’
else null
end overdue_repay_date
,repay_amt
,repay_flag
,apply_state
FROM ods.tmp_dwd_apply_overdue_detail_level_1 a
ORDER BY apply_sid,case when overdue_state = ‘未知’ then 1 when overdue_state = ‘逾期’ then 2 when overdue_state = ‘正常’ then 3 when overdue_state is null then 99 end,
case when cur_overdue_state = ‘未知’ then 1 when cur_overdue_state = ‘逾期’ then 2 when cur_overdue_state = ‘正常’ then 3 when cur_overdue_state is null then 99 end
) a
GROUP BY apply_sid
;
更精确的定位,是在 GROUP_CONCAT(case when repay_date > 0 and repay_date < current_date then period_id end ORDER BY repay_date,period_id) cur_period_id
这个地方。
我们本意只是取符合条件的第一个 period_id 啊
研发同事还在查看,这个问题可能时间需要的多一些
补充点信息,我昨天下午又做了点尝试,把
GROUP_CONCAT(case when repay_date > 0 and repay_date < current_date then period_id end ORDER BY repay_date,period_id) cur_period_id
改成
GROUP_CONCAT(distinct (case when repay_date > 0 and repay_date < current_date then period_id end) ORDER BY repay_date,period_id) cur_period_id
这一改动让我逃过一劫,执行成功了。
看来还是 GROUP_CONCAT
产生的字符串超出了 TiDB的预期造成的。
在我们的业务场景下,加了 distinct
拼接项能控制在 10几个,不加则可能过百个。拼接项一般是 1~12 之间的一个数字。
感谢反馈,我们内部有更新进度,也会及时同步,感谢理解。
可以查一下出问题的每次concat的数量吗,本地试了一些case没有复现这个现象。
还有就是,之前有设置过 group_concat_max_len
这个变量的值吗
没有设置过这个变量。刚看了下当前值是 1024,这个值是字节数吗
是截断的字符长度,就是超过这个长度之后的会截断。
了解了,谢谢!