What does the optimized version of TiDB hash join mean?

有猫万事足 · January 18, 2025, 11:21

pingcap/tidb/blob/master/docs/design/2024-05-11-hash-join-v2.md

# Proposal: Hash Join V2

- Author(s): [windtalker](https://github.com/windtalker)
- Tracking Issue: https://github.com/pingcap/tidb/issues/53127

## Introduction 
Hash join is a widely used join algorithm. TiDB has supported hash join since version 1.0; however, the current implementation
of hash join has the following shortcomings:
* At the build stage, it does not support concurrent build, which may lead to performance issues when the build side is large.
* At the probe stage, the interface is not well designed, which may cause redundant calculations in some cases: https://github.com/pingcap/tidb/issues/47424
* The current implementation uses some concepts that are not found in other databases' implementations. For example, besides left side/right side and build side/probe side, there is also inner side/outer side, and `useOuterToBuild`, `outerIsRight`, `tryToMatchInners`, and `tryToMatchOuters` are introduced to handle the extra complexity. This makes the current code hard to understand and error-prone when fixing bugs.

Taking these factors into account, we decided to completely refactor hash join.

## Problems

The basic idea of hash join is to split the join into a build stage and a probe stage. In the build stage, a hash table is built
on the join key; in the probe stage, the hash table is looked up with the join key, and the join result is generated
from the lookup result according to the join type. The problems in the build stage mainly concern the design of
the hash table, the organization of build-side data, and concurrent construction of the hash table. The problems faced in …

This file has been truncated.
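The two stages described in the design document can be illustrated with a minimal single-threaded Go sketch. It is not TiDB's implementation: `Row`, `Pair`, `JoinType`, and `hashJoin` are hypothetical names, a Go map stands in for the hash table, and only inner join and left outer join (with the probe side as the outer side) are covered, to show how the join type changes the output when a lookup finds no match.

```go
package main

import "fmt"

// JoinType is a hypothetical enum covering two join types for illustration.
type JoinType int

const (
	InnerJoin     JoinType = iota
	LeftOuterJoin          // the probe side is the left (outer) side
)

// Row is a simplified row: a join key and a value.
type Row struct {
	Key string
	Val string
}

// Pair is one output row; Build is nil when an outer row had no match
// (standing in for NULL-padded build-side columns).
type Pair struct {
	Probe Row
	Build *Row
}

// hashJoin builds a hash table on the build side's join key, then probes it
// with each probe-side row and emits results according to joinType.
func hashJoin(build, probe []Row, joinType JoinType) []Pair {
	// Build stage: join key -> all build rows with that key.
	table := make(map[string][]Row)
	for _, b := range build {
		table[b.Key] = append(table[b.Key], b)
	}

	// Probe stage: look up each probe row's join key.
	var out []Pair
	for _, p := range probe {
		matches := table[p.Key]
		for i := range matches {
			out = append(out, Pair{Probe: p, Build: &matches[i]})
		}
		// The join type decides what happens when the lookup finds nothing.
		if len(matches) == 0 && joinType == LeftOuterJoin {
			out = append(out, Pair{Probe: p, Build: nil})
		}
	}
	return out
}

func main() {
	build := []Row{{"1", "x"}, {"2", "y"}}
	probe := []Row{{"1", "a"}, {"3", "b"}}
	fmt.Println(len(hashJoin(build, probe, InnerJoin)))     // 1
	fmt.Println(len(hashJoin(build, probe, LeftOuterJoin))) // 2
}
```

Supporting an outer join whose outer side is the build side would additionally require remembering which build rows matched and emitting the unmatched ones after probing; this appears to be the kind of case that flags such as `useOuterToBuild` are meant to handle.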

The design document is here.