如何获得 TiDB 中事务的开始时间戳与提交时间戳?

问题:

如何获得 TiDB 中事务的开始时间戳与提交时间戳?

我们的方案:
在 TiCDC 中添加了 log 语句,然后将 TiDB 作为上游,MySQL 作为下游,TiCDC 中的日志保存了事务的开始时间戳与提交时间戳。

我们的疑问
1、上面获取事务的开始时间戳与提交时间戳的方法是否正确?

2、如何获取只读事务的开始时间戳与提交时间戳?

目前我们无法获取只读事务的开始时间戳与提交时间戳。根据 TiCDC 文档,只有发生数据变更时,上下游才会进行同步。

更详细的英文描述如下:


Our Problem

How to obtain the start-timestamp and commit-timestamp of transactions in TiDB?

Our Solution

Environment

  • DB version: TiDB v7.3
  • Driver: go-sql-driver/mysql

To obtain the start and commit timestamps of transactions

We run the optimistic transaction model of TiDB.
We use TiCDC to connect TiDB as the upstream and MySQL as the downstream. We modify TiCDC slightly to log the start and commit timestamps into a file.

Modification code: lines 214-220 of cdc/sinkv2/eventsink/txn/mysql/mysql.go

func convert2RowChanges(
  ...
) ... {
  log.Info("CDC:RowChangeEvent",
  	zap.Int64("row-id", row.RowID),
  	zap.String("table-name", row.Table.Table),
  	zap.Uint64("start-ts", row.StartTs),
  	zap.Uint64("commit-ts", row.CommitTs),
  	zap.Any("columns", row.Columns),
  	zap.Any("pre-columns", row.PreColumns))
...

The log is as follows from which it is easy to extract the timestamps start-ts and commit-ts:

[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=4] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":4,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3607,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=7] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":7,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3605,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=11] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":11,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3629,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=12] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":12,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3626,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=21] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":21,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3556,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=22] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":22,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3645,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=23] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":23,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3656,\"default\":null}]"] [pre-columns=null]
[2023/10/07 21:07:22.983 +08:00] [INFO] [mysql.go:214] [CDC:RowChangeEvent] [row-id=31] [table-name=txn0] [start-ts=444775541403025430] [commit-ts=444775541494775834] [columns="[{\"name\":\"k\",\"type\":3,\"charset\":\"binary\",\"flag\":27,\"value\":31,\"default\":null},{\"name\":\"v\",\"type\":3,\"charset\":\"binary\",\"flag\":1,\"value\":3586,\"default\":null}]"] [pre-columns=null]

Our Questions

  • Have we obtain the start and commit timestamps of TiDB successfully?

  • What about read-only transactions?
    Currently we are not able to obtain the start and commit timestamps of read-only transactoins.
    According to the docs on TiCDC (in Chinese),
    it seems that the synchronization between the upstream and the downstream
    is performed only when data is changed.

Thanks.


Motivation

We are working on a lightweight white-box checking algorithm of snapshot isolation (SI): given an execution of a database, to check whether it satisfies SI.

The SI checking problem is NP-hard for general executions.
So it is desirable to make use of the knowledge of how SI is actually implemented in databases.
The insight is that most databases, especially distributed databases, implement SI following the generic protocol using start-timestamps and commit-timestamps.
With these timestamps of transactions in an execution, the SI checking problem becomes solvable in polynomial time. Therefore, we want to obtain these timestamps when generating executions.

It is crucial for us to really understand the meaning and roles of the start-timestamps and commit-timestamps in the database under testing.
We must be very sure that we have obtained the right timestamps in the right way.

That is why we ask for help here.

Background

We are digging into the implementation of snapshot isolation of TiDB, especially into the use of timestamps in transactions.

Consider the classic description of start-timestamp and commit-timestamp in implementing Snapshot Isolation,
quoted from the paper; Section 4.2:

For start-timestamp: A transaction executing with Snapshot Isolation always reads data from a snapshot of the (committed) data as of the time the transaction started, called its Start-Timestamp. This time may be any time before the transaction’s first Read.

For commit-timestamp: When the transaction T1 is ready to commit, it gets a
Commit-Timestamp, which is larger than any existing Start-Timestamp or Commit-Timestamp.
When T1 commits, its changes become visible to all
transactions whose Start-Timestamps are larger than T1’s Commit-Timestamp.

For conflict detection:
The transaction T1 successfully commits only if no other transaction T2 with a
Commit-Timestamp in T1’s execution interval [Start-Timestamp, Commit-Timestamp] wrote data that T1 also wrote. Otherwise, T1 will abort.
This feature, called First-committer-wins prevents lost updates.

1 个赞

方便用中文描述一下你的问题吗?

他的问题是:如何获得TIDB中事务的开始时间与事务的结束时间?
他的方案是:改了下TiCDC,然后将TIDB作为上游,Mysql作为下游,这样TiCDC中的日志就存了事务的开始时间与结束时间。
他的疑问是:1、这样获取事务的开始时间与事务的结束时间对吗?
2、对于只读事务该怎么获取?
他们没法获取只读事务的开始时间与结束时间。根据文档,只有数据变更时,上下游才会同步。

1 个赞

是的,谢谢。因为考虑到后续与工作的合作者交流,使用了英文。我添加一下中文描述。

1 个赞
  1. 从 ticdc 中获取的应该是准确的。
  2. 只读事务,只有开始时间戳,没有提交时间戳
  3. 好像是没有查询接口直接获取事务时间戳,只读事务的更不可能。但是有个骚操作看着有可行性,就是 tidb 的慢日志会打印出来 Txn_start_ts,这个应该就是你要的事务开始时间戳。

那就想办法把慢日志打印出来,比如开启慢日志后,把设置慢日志的阈值足够小,让其输出慢日志,或者就改下源码,让它不管什么情况都打印出来,能改 ticdc,稍微改下 tidb 的应该也不难。

2 个赞

可以看看 tidb_last_txn_info https://docs.pingcap.com/zh/tidb/stable/system-variables#tidb_last_txn_info-从-v409-版本开始引入tidb_current_ts https://docs.pingcap.com/zh/tidb/stable/system-variables#tidb_current_ts

2 个赞

好的 感谢您的回复,我们会去实验一下 :heartpulse:

您好!感谢您的回复,关于tidb_current_ts应该怎么定义其取值,这个变量会输出在tidb.log嘛? :heartpulse:

TiDB 中事务的开始时间戳与提交时间戳,这两个时间是tidb-server从tipd-server处申请得到的,tipd 是整个分布式数据库的授时系统。在数据写入储存层的过程中通过2PC机制把数据和时间信息都持久化了。

TiCDC是一个实时数据同步工具,用于捕获储存层的实时数据变更,将KV键值对变更值转为SQL或者其他协议的数据,再同步到下游去。由于有MVCC机制保证,所以每个键值对自身都带有时间戳信息,也就是每个TiCDC同步变更的数据都自带有事务开始时间戳和提交时间戳。

只读事务虽然有开始时间戳(用于判断读取具体版本的数据),由于没有变更行为,它相关的数据不会进入TiCDC。

2 个赞

有帮助

我试了一下,把满日志的阈值改小就能打印出来了,tidb_slow_log_threshold 单位是 ms,设为1,那么所有的 sql 都能打印出来。

上面 wd0517 提到的 tidb_current_ts 也很好用,只要显式的开启事务就行:

···
begin;
show variables like “tidb_current_ts”;
exec other_sql;
···

TIDB 里,同一个事务快照读,都是用的同一个开始时间戳。

2 个赞

好嘟! 非常感谢!! :heart_eyes:

嗯嗯 所以只读事务无法通过ticdc获取到时间戳了 我们在尝试用其他办法获取 目前的方法有1、调整慢日志阈值,理论上只有阈值够小,慢日志可以打印出所有事物的时间戳信息 2、使用tidb_current_ts ,显示开启事务

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。