[TiDB 4.0 PCTA Study Notes] - 1.3 A Brief History About the TiDB database platform (A Brief History of TiDB) @Class 2 + 陈俊聪

fanderchan (Hacker X Si Icvue) December 16, 2020, 13:17

Course name: 1.3 A Brief History About the TiDB database platform (A Brief History of TiDB)

Study time:

1 hour

Takeaways:

Learned the brief history of TiDB

Course content:

Before we begin

Goal: Introduce a brief history of TiDB
Outline:
- Ancient days of TiDB
- TiDB with TiSpark
- TiDB with TiFlash

Ancient days of TiDB

Inspired by Google Spanner, we made TiDB
In the 1.0.0 GA version, TiDB is
- A freely scalable (computing, storage) database
- Compatible with MySQL syntax and protocol
- Transparent Data Splitting Policy - Range Splitting
- Strongly consistent, distributed transaction support
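
To make "Range Splitting" a bit more concrete for myself, here is a minimal sketch of how a sorted key space can be cut into ranges and how a key finds its range. This is only a toy model under my own assumptions: the class `RangeRouter`, its methods, and the sample keys are hypothetical and are not TiDB or TiKV APIs.

```python
import bisect


class RangeRouter:
    """Toy range partitioner: maps each key to the index of the range covering it."""

    def __init__(self, split_keys):
        # split_keys are the boundaries between adjacent ranges;
        # N boundaries produce N + 1 ranges.
        self.split_keys = sorted(split_keys)

    def region_for(self, key):
        # bisect_right counts the boundaries that are <= key,
        # which is exactly the index of the range containing the key.
        return bisect.bisect_right(self.split_keys, key)

    def split(self, key):
        # Splitting a range only adds one more boundary. Callers keep
        # using region_for() unchanged, which is the "transparent" part.
        if key not in self.split_keys:
            bisect.insort(self.split_keys, key)


# Hypothetical boundaries: ranges are [.., "g"), ["g", "p"), ["p", ..)
router = RangeRouter(["g", "p"])
print(router.region_for("apple"), router.region_for("grape"), router.region_for("zebra"))
```

Because the keys stay ordered, a range that grows too large can be split at any key inside it without rehashing the rest of the data.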

TiDB Architecture - Original

In short: different sizes of the same model

Datahub Capability - Syncer

Datahub Capability - Coprocessor

Datahub Capability

TiDB is ideal for Datahub scenarios
Protocol-compatible, easy synchronization of MySQL production libraries
Transparent and accessible cross-segmentation queries
Data landing in real time
Massive storage allows multiple data sources to converge
Standby - Datahub Analysis 2-in-1

One year later

TP Scenario
- CUSTOMER: There are still some problems though… Smells good!
AP Scenario
- Client 1: Complex statements are so slow!
- Client 2: Always OOM!
- Client 3: Can’t integrate with a big data platform!

Choice

Either combine TiDB and TiKV together
- Complete refactoring of optimizers and executors to build MPP Engine
- High risk and long duration
OR,
- The need for an open source distributed computing framework
- High maturity and wide user base

TiSpark (2/3)

Spark helps us do distributed computing
- A mature distributed computing platform
- Faster(?), more stable(?).
Complete succession to the Apache Spark ecosystem
- Painlessly integrating into the big data ecosystem
- Scripting, Python, R, Apache Zeppelin, Hadoop…

TiSpark (3/3)

Apache Spark can only provide low concurrency computation
- Heavy computational model and high resource consumption
- Better for Reports and Heavyweight Adhoc Queries
Users still need high concurrency, small to medium-sized AP capacity in many situations
- Complex query capability with low consumption
- TiDB is far simpler to maintain than Spark clusters.

Meanwhile…

We were also working on various optimizations around stand-alone TiDB
- Smarter, more efficient and faster in small to medium scale scenarios
Optimizer
- Basic optimizer? → RBO + CBO Optimizer → Cascades Optimizer (WIP)
Executor
- Classic Volcano Model → Batch Execution → Vectorized Execution
- Better Concurrency and Pipeline
Partition tables, Index Merge, etc.
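
The executor evolution (Volcano Model → Batch Execution) was easier for me to follow with a small sketch. The code below is my own simplified illustration, not TiDB's executor: the operator names (`scan_rows`, `filter_rows`, `scan_batches`, `filter_batches`) and the sample data are hypothetical. The Volcano-style operators hand over one row per call, while the batch-style operators hand over a list of rows per call, which spreads the per-call overhead across many rows.

```python
def scan_rows(rows):
    # Volcano-style scan: the parent pulls one row at a time.
    for row in rows:
        yield row


def filter_rows(child, predicate):
    # Volcano-style filter: one predicate call and one hand-off per row.
    for row in child:
        if predicate(row):
            yield row


def scan_batches(rows, batch_size):
    # Batch-style scan: the parent pulls a chunk of rows per call.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_batches(child, predicate):
    # Batch-style filter: processes a whole chunk before handing it on.
    for batch in child:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept


def is_even(value):
    return value % 2 == 0


rows = list(range(10))  # hypothetical single-column table
row_result = list(filter_rows(scan_rows(rows), is_even))
batch_result = [r for b in filter_batches(scan_batches(rows, 4), is_even) for r in b]
print(row_result == batch_result)
```

Both pipelines return the same rows; only the unit of work passed between operators differs. Vectorized execution goes one step further by evaluating expressions over whole columns of a chunk, which is not shown here.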

Core conflict

At this point, we were still left with 2 core contradictions.
- Row storage is not friendly to analysis scenarios
  - “How dare you call yourselves HTAP without column store?”
- Workload isolation is not possible
  - “I ran a query and the CPU usage was 1000%”
  - TiSpark scenarios would be worse.
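
Why row storage is unfriendly to analysis can be seen in a small sketch. This is my own toy example with made-up data and hypothetical helper names (`to_columns`, `sum_row_store`, `sum_column_store`); it says nothing about how TiKV or TiFlash actually lay out data on disk.

```python
# Hypothetical table stored row by row.
rows = [
    {"id": 1, "city": "Beijing", "amount": 10},
    {"id": 2, "city": "Shanghai", "amount": 25},
    {"id": 3, "city": "Beijing", "amount": 7},
]


def to_columns(rows):
    # Re-arrange the same data column by column.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, field):
    # Every full row is visited, even though only one field is needed.
    return sum(row[field] for row in rows)


def sum_column_store(columns, field):
    # Only the one column involved in the aggregate is touched.
    return sum(columns[field])


columns = to_columns(rows)
print(sum_row_store(rows, "amount"), sum_column_store(columns, "amount"))
```

Both functions return the same total, but the column layout reads only the `amount` values, which is the access pattern that analytical queries over wide tables benefit from.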

TiFlash

Synchronize a set of column storage independently via Raft Learner
- Raft Learner provides extremely low consumption copy synchronization
- Raft Learner read protocol works with MVCC to provide strongly consistent reads
Physical isolation via Label
- AP / TP workloads do not affect each other
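
To picture how a learner read combined with MVCC could stay consistent, I wrote the rough model below. It rests on my own assumptions and is not TiFlash code: the class `LearnerReplica`, its fields, and the `leader_commit_index` argument are hypothetical. The idea in the sketch is that the replica only answers once it has applied the log up to the leader's commit index, and then returns the newest version whose commit timestamp is not later than the read timestamp.

```python
class LearnerReplica:
    """Toy learner: applies replicated writes and serves timestamp-based reads."""

    def __init__(self):
        self.applied_index = 0
        # key -> list of (commit_ts, value); this sketch assumes writes
        # for a key are applied in increasing commit_ts order.
        self.versions = {}

    def apply(self, index, key, commit_ts, value):
        # Called as replicated log entries arrive from the leader.
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.applied_index = index

    def read(self, key, read_ts, leader_commit_index):
        # Refuse to answer while the replica is behind the leader.
        if self.applied_index < leader_commit_index:
            raise RuntimeError("replica is behind; wait for replication and retry")
        # MVCC visibility: newest version committed at or before read_ts.
        visible = [value for ts, value in self.versions.get(key, []) if ts <= read_ts]
        return visible[-1] if visible else None


replica = LearnerReplica()
replica.apply(index=1, key="k", commit_ts=10, value="v1")
replica.apply(index=2, key="k", commit_ts=20, value="v2")
print(replica.read("k", read_ts=15, leader_commit_index=2))
```

A real system would wait for the missing log entries instead of raising an error; the exception here only marks the point where the read has to block.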

Till now

TiDB = ~~X% TP + Y% AP~~ = HTAP
- TiDB doesn’t require you to choose TP or AP, it’s HTAP.
One Platform, compatible with row and column storage
- Painless data synchronization
Easy to analyze on columns when the main TiDB cluster runs TP services

TiDB Today
