Hive metastore 元数据库使用TiDB

背景: 目前hive的元数据库使用的是mysql,总行数已经快达到4亿,单表数据行数超过1亿,而且数据量还在不断增加,hive metastore查询mysql延迟很高,性能达到瓶颈。

问题

  • 在这种场景下,TiDB是否合适做hive的元数据存储? 之前有没有这样的使用案例?
  • 我使用docker compose做了下简单测试,hive metastore连接TiDB是没有问题的,建立库也是OK的,但是建表的时候发生这样的错误:
 2019-11-18T10:47:15,520 ERROR [pool-9-thread-4] thrift.ProcessFunction: Internal error processing lock
org.apache.hadoop.hive.metastore.api.MetaException: Unable to update transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your TiDB version for the right syntax to use line 1 column 9 near "SAVEPOINT `__7f378d91_16e7c5aa880__7ffa`"
  • hive建表测试语句如下,该建表语句在元数据使用mysql下是能够建表成功的。
CREATE TABLE `dam.inn_part_table`(
  `time` bigint,
  `cmd` string,
  `ugi` string,
  `src` string,
  `ip` string COMMENT 'IP Address of the User',
  `dst` string,
  `cluster` string)
PARTITIONED BY (
  `day` string,
  `hour` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'='	',
  'serialization.format'='	')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop-test/dam/warehouse/inn_part_table';

TiDB 适用于大数据量,高并发,对可用性要求较高的场景。hive 的元数据使用 mysql 保存就足够了。

现在我们的mysql使用的是阿里的RDS,机器配置是32C128G,前段时间出现过好几次查询阻塞延时,数据库CPU飙升的近100%,在并发大查询下RDS已经扛不住了。大佬有什么解决方案吗,TiDB适不适合我们的这种case?

mysql 撑不住的话,可以考虑用 tidb,上面 ddl 中有 TiDB 不支持的语法,看上去也不是 MySQL 语法。 能抓到建表时 MySQL 所接收到的语句吗?

上面粘贴的是HIve 的建表语句,hive建新表时metastore server会将建表信息解析并插入元数据库中,这是应该是往TiDB插入元数据时发生了语法错误。 TiDB有审计功能吗,我现在是docker-compose部署的,进入TiDB容器内部哪里可以看到SQL执行历史日志?

我把hive的元数据库改回到mysql,捕捉到的执行log如下:

| 2019-11-18 13:33:59.265732 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TBLS` (`TBL_ID`,`CREATE_TIME`,`DB_ID`,`LAST_ACCESS_TIME`,`OWNER`,`RETENTION`,`IS_REWRITE_ENABLED`,`SD_ID`,`TBL_NAME`,`TBL_TYPE`,`VIEW_EXPANDED_TEXT`,`VIEW_ORIGINAL_TEXT`) VALUES (7,1574055239,11,0,'root',0,0,7,'inn_part_table','MANAGED_TABLE',null,null)                                                                                                     |
| 2019-11-18 13:33:59.265301 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `SDS` (`SD_ID`,`CD_ID`,`INPUT_FORMAT`,`IS_COMPRESSED`,`IS_STOREDASSUBDIRECTORIES`,`LOCATION`,`NUM_BUCKETS`,`OUTPUT_FORMAT`,`SERDE_ID`) VALUES (7,7,'org.apache.hadoop.mapred.TextInputFormat',0,0,'hdfs://hadoop-test/dam/warehouse/inn_part_table',-1,'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',7)                                             |
| 2019-11-18 13:33:59.215706 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTable' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`OWNER`,`A0`.`RETENTION`,`A0`.`IS_REWRITE_ENABLED`,`A0`.`TBL_NAME`,`A0`.`TBL_TYPE`,`A0`.`TBL_ID` FROM `TBLS` `A0` LEFT OUTER JOIN `DBS` `B0` ON `A0`.`DB_ID` = `B0`.`DB_ID` WHERE `A0`.`TBL_NAME` = 'inn_part_table' AND `B0`.`NAME` = 'dam' |
| 2019-11-18 13:33:59.271821 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TBL_PRIVS` (`TBL_GRANT_ID`,`CREATE_TIME`,`GRANT_OPTION`,`GRANTOR`,`GRANTOR_TYPE`,`PRINCIPAL_NAME`,`PRINCIPAL_TYPE`,`TBL_PRIV`,`TBL_ID`) VALUES (18,1574055239,1,'root','USER','root','USER','DELETE',7)                                                                                                                           |
| 2019-11-18 13:33:59.271336 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TBL_PRIVS` (`TBL_GRANT_ID`,`CREATE_TIME`,`GRANT_OPTION`,`GRANTOR`,`GRANTOR_TYPE`,`PRINCIPAL_NAME`,`PRINCIPAL_TYPE`,`TBL_PRIV`,`TBL_ID`) VALUES (17,1574055239,1,'root','USER','root','USER','UPDATE',7)                                                                                                                           |
| 2019-11-18 13:33:59.270776 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TBL_PRIVS` (`TBL_GRANT_ID`,`CREATE_TIME`,`GRANT_OPTION`,`GRANTOR`,`GRANTOR_TYPE`,`PRINCIPAL_NAME`,`PRINCIPAL_TYPE`,`TBL_PRIV`,`TBL_ID`) VALUES (16,1574055239,1,'root','USER','root','USER','SELECT',7)                                                                                                                           |
| 2019-11-18 13:33:59.267988 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TBL_PRIVS` (`TBL_GRANT_ID`,`CREATE_TIME`,`GRANT_OPTION`,`GRANTOR`,`GRANTOR_TYPE`,`PRINCIPAL_NAME`,`PRINCIPAL_TYPE`,`TBL_PRIV`,`TBL_ID`) VALUES (15,1574055239,1,'root','USER','root','USER','INSERT',7)                                                                                                                           |
| 2019-11-18 13:33:59.267256 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `PARTITION_KEYS` (`TBL_ID`,`PKEY_COMMENT`,`PKEY_NAME`,`PKEY_TYPE`,`INTEGER_IDX`) VALUES (7,null,'hour','string',1)                                                                                                                                                                                                                 |
| 2019-11-18 13:33:59.266913 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `PARTITION_KEYS` (`TBL_ID`,`PKEY_COMMENT`,`PKEY_NAME`,`PKEY_TYPE`,`INTEGER_IDX`) VALUES (7,null,'day','string',0)                                                                                                                                                                                                                  |
| 2019-11-18 13:33:59.266469 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TABLE_PARAMS` (`PARAM_VALUE`,`TBL_ID`,`PARAM_KEY`) VALUES ('1574055239',7,'transient_lastDdlTime')                                                                                                                                                                                                                                |
| 2019-11-18 13:33:59.265732 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `TBLS` (`TBL_ID`,`CREATE_TIME`,`DB_ID`,`LAST_ACCESS_TIME`,`OWNER`,`RETENTION`,`IS_REWRITE_ENABLED`,`SD_ID`,`TBL_NAME`,`TBL_TYPE`,`VIEW_EXPANDED_TEXT`,`VIEW_ORIGINAL_TEXT`) VALUES (7,1574055239,11,0,'root',0,0,7,'inn_part_table','MANAGED_TABLE',null,null)                                                                     |
| 2019-11-18 13:33:59.265301 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `SDS` (`SD_ID`,`CD_ID`,`INPUT_FORMAT`,`IS_COMPRESSED`,`IS_STOREDASSUBDIRECTORIES`,`LOCATION`,`NUM_BUCKETS`,`OUTPUT_FORMAT`,`SERDE_ID`) VALUES (7,7,'org.apache.hadoop.mapred.TextInputFormat',0,0,'hdfs://hadoop-test/dam/warehouse/inn_part_table',-1,'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',7)             |
| 2019-11-18 13:33:59.265042 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,null,'cluster','string',6)                                                                                                                                                                                                                      |
| 2019-11-18 13:33:59.264720 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,null,'dst','string',5)                                                                                                                                                                                                                          |
| 2019-11-18 13:33:59.264459 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,'IP Address of the User','ip','string',4)                                                                                                                                                                                                       |
| 2019-11-18 13:33:59.264144 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,null,'src','string',3)                                                                                                                                                                                                                          |
| 2019-11-18 13:33:59.263791 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,null,'ugi','string',2)                                                                                                                                                                                                                          |
| 2019-11-18 13:33:59.263424 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,null,'cmd','string',1)                                                                                                                                                                                                                          |
| 2019-11-18 13:33:59.263058 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (7,null,'time','bigint',0)                                                                                                                                                                                                                         |
| 2019-11-18 13:33:59.262606 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `CDS` (`CD_ID`) VALUES (7)                                                                                                                                                                                                                                                                                                         |
| 2019-11-18 13:33:59.262032 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES ('	',7,'field.delim')                                                                                                                                                                                                                                                 |
| 2019-11-18 13:33:59.261775 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `SERDE_PARAMS` (`PARAM_VALUE`,`SERDE_ID`,`PARAM_KEY`) VALUES ('	',7,'serialization.format')                                                                                                                                                                                                                                        |
| 2019-11-18 13:33:59.260703 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       772 |         0 | Query        | INSERT INTO `SERDES` (`SERDE_ID`,`NAME`,`SLIB`) VALUES (7,null,'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe')                                                                                                                                                                                                                           |
| 2019-11-18 13:33:59.201805 | root[root] @ qdc-qtt-bigdata-emrtest-02 [10.0.2.46] |       776 |         0 | Query        | insert into HIVE_LOCKS (hl_lock_ext_id, hl_lock_int_id, hl_txnid, hl_db, hl_table, hl_partition,hl_lock_state, hl_lock_type, hl_last_heartbeat, hl_user, hl_host, hl_agent_info) values(11, 1,0, 'dam', null, null, 'w', 'r', 1574055239000, 'root', 'qdc-qtt-bigdata-emrtest-02', 'root_20191118133359_3bfb0046-1378-4e2c-af66-beff3cd6a56d') |

可以打开 tidb_general_log 来收集完整的 sql 请求

设置了之后,并没有发现/usr/local/mysql/data/localhost.log文件,这个日志在哪里找到?

Hive建表mysql执行记录.log (148.0 KB) 这是以上hive建新表时,mysql的操作log,帮忙看下其中有哪些语句是TiDB不支持而导致报错的?

图上另外两个参数是假的,为了兼容 mysql 语法。 general log 被记录在 tidb.log 中,具有 [GENERAL_LOG] 关键字

好的,mysql完整的执行记录我已经上传了,大佬帮忙看下。 另外,tidb.log在什么位置,我使用的docker-compose,这个日志是在容器内还是容器外,我估计TiDB里的sql请求应该和我上传的是一致的,这些个请求是相同hive建表语句从Hive metastore发出的

tidb.log的日志我在docker容器内部找到了, 目前现象是hive客户端中执行create database xxx; show databases 都是可以的,但是执行create table 或者 show tables 都会报错,报错截图如下:

[2019/11/18 07:27:35.223 +00:00] [INFO] [printer.go:40] ["Welcome to TiDB."] ["Release Version"=v4.0.0-alpha-863-gb274eb207] ["Git Commit Hash"=b274eb2079dcbe9166825e677d4496046c494856] ["Git Branch"=master] ["UTC Build Time"="2019-11-15 02:17:55"] [GoVersion="go version go1.13 linux/amd64"] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=v3.0.0-60965b006877ca7234adaced7890d7b029ed1306]
[2019/11/18 07:27:35.224 +00:00] [INFO] [printer.go:53] ["loaded config"] [config="{"host":"0.0.0.0","advertise-address":"","port":4000,"cors":"","store":"tikv","path":"pd0:2379,pd1:2379,pd2:2379","socket":"","lease":"0","run-ddl":true,"split-table":true,"token-limit":1000,"oom-use-tmp-storage":true,"oom-action":"log","mem-quota-query":34359738368,"enable-streaming":false,"enable-batch-dml":false,"txn-local-latches":{"enabled":false,"capacity":2048000},"lower-case-table-names":2,"log":{"level":"error","format":"text","disable-timestamp":false,"enable-timestamp":null,"disable-error-stack":null,"enable-error-stack":null,"file":{"filename":"/logs/tidb.log","max-size":300,"max-days":0,"max-backups":0},"slow-query-file":"","slow-threshold":300,"expensive-threshold":10000,"query-log-max-len":2048,"record-plan-in-slow-log":1},"security":{"skip-grant-table":false,"ssl-ca":"","ssl-cert":"","ssl-key":"","cluster-ssl-ca":"","cluster-ssl-cert":"","cluster-ssl-key":""},"status":{"status-host":"0.0.0.0","metrics-addr":"pushgateway:9091","status-port":10080,"metrics-interval":15,"report-status":true,"record-db-qps":false},"performance":{"max-procs":0,"max-memory":0,"stats-lease":"3s","stmt-count-limit":5000,"feedback-probability":0,"query-feedback-limit":1024,"pseudo-estimate-ratio":0.7,"force-priority":"NO_PRIORITY","bind-info-lease":"3s","txn-total-size-limit":104857600,"tcp-keep-alive":true,"cross-join":true,"run-auto-analyze":true},"prepared-plan-cache":{"enabled":false,"capacity":100,"memory-guard-ratio":0.1},"opentracing":{"enable":false,"rpc-metrics":false,"sampler":{"type":"const","param":1,"sampling-server-url":"","max-operations":0,"sampling-refresh-interval":0},"reporter":{"queue-size":0,"buffer-flush-interval":0,"log-spans":false,"local-agent-host-port":""}},"proxy-protocol":{"networks":"","header-timeout":5},"tikv-client":{"grpc-connection-count":16,"grpc-keepalive-time":10,"grpc-keepalive-timeout":3,"commit-timeout":"41s","max-batch-size":128,"overload-threshold":200,"max-batch-wait-time":0,"batch-wait-size":8,"enable-chunk-rpc":true,"region-cache-ttl":600,"store-limit":0},"binlog":{"enable":false,"ignore-error":false,"write-timeout":"15s","binlog-socket":"","strategy":"range"},"compatible-kill-query":false,"plugin":{"dir":"","load":""},"pessimistic-txn":{"enable":true,"max-retry-count":256},"check-mb4-value-in-utf8":true,"alter-primary-key":false,"treat-old-version-utf8-as-utf8mb4":true,"enable-table-lock":false,"delay-clean-table-lock":0,"split-region-max-num":1000,"stmt-summary":{"max-stmt-count":100,"max-sql-length":4096}}"]
[2019/11/18 07:52:11.635 +00:00] [ERROR] [misc.go:114] ["syntax error"] [error="line 1 column 9 near "SAVEPOINT `_4c374b78_16e7d3424ab__7ffc`" "]
[2019/11/18 07:53:17.502 +00:00] [ERROR] [misc.go:114] ["syntax error"] [error="line 1 column 9 near "SAVEPOINT `_4c374b78_16e7d3424ab__7ffb`" "]
[2019/11/18 07:59:43.205 +00:00] [ERROR] [misc.go:114] ["syntax error"] [error="line 1 column 9 near "SAVEPOINT `_4c374b78_16e7d3424ab__7ffa`" "]

麻烦提供一下建表语句

我已经开启了审计,tidb_general_log查到是1,但是容器内/logs/tidb.log中并没有看到有建表或者查询的sql语句 tidb.log里面内容很少: tidb.log (3.4 KB)

可以逐条执行生成的 sql,也可以制作成脚本,在 tidb 中执行,观察具体是哪里报错。

连接tidb并执行了很多条SQL,在容器内执行: tail -f tidb.log, tidb.log没有出现报错、也没有出现执行成功的SQL记录,感觉这个docker的审计功能完全没有开启,执行成功什么也看不到,执行错误就一个syntax error

我是指执行你用 mysql 导出的 sql,制作成脚本,在 tidb 中执行,观察具体是哪里报错。

问题找到了,是hive metastore的开启了事务机制,将hive-default.xml中hive.txn.manager选项的配置由org.apache.hadoop.hive.ql.lockmgr.DbTxnManager修改org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager,或者直接将该选项移除,就能够正常执行表的各种操作,应该是hive metastore的事务机制与TiDB有冲突导致的。谢谢你的热心回复,点赞:+1:

1赞

TiDB 存储 hive metastore 应该是没啥问题的,也能弹性扩展

嗯,之前遇到两个问题都解决了,目前测试下来还没有遇到新的问题,后面再进行各种case的兼容性和性能测试,要是全部OK的话,我们就准备将TiDB作为metastore的元数据库了。