# Proposal: Hash Join V2
- Author(s): [windtalker](https://github.com/windtalker)
- Tracking Issue: https://github.com/pingcap/tidb/issues/53127
## Introduction
Hash join is a widely used join algorithm. TiDB has supported hash join since version 1.0; however, the current implementation
of hash join has some shortcomings:
* At the build stage, it does not support concurrent build, which may lead to performance issues when the build side is large.
* At the probe stage, the interface is not well designed, which may cause redundant calculations in some cases: https://github.com/pingcap/tidb/issues/47424
* The current implementation contains some concepts that are not found in other databases' implementations. For example, besides left side/right side and build side/probe side, there is also inner side/outer side, and `useOuterToBuild`, `outerIsRight`, `tryToMatchInners`, and `tryToMatchOuters` are introduced to handle the extra complexity. This makes the current code hard to understand and more error-prone when we try to fix bugs.
Taking into account the above factors, we decided to do a complete refactoring of hash join.
## Problems
The basic idea of hash join is to divide the join into a build stage and a probe stage. In the build stage, a hash table is built
based on the join key. In the probe stage, a lookup is made in the hash table using the join key, and the join result is
generated based on the lookup result and the join type. The problems faced in the build stage mainly include the design of
the hash table, the data organization on the build side, and the concurrent build of the hash table. The problems faced in
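
To make the build and probe stages described in this section concrete, here is a minimal Go sketch of an in-memory inner hash join. It is only an illustration and not TiDB's implementation: the `Row` type and the `buildHashTable` and `probeInner` functions are hypothetical names, the hash table is a plain Go map, and the build is single-threaded.

```go
package main

import "fmt"

// Row is a hypothetical two-column row: a join key and a payload.
type Row struct {
	Key   int
	Value string
}

// buildHashTable is the build stage: it indexes every build-side row
// by its join key. A plain map stands in for the real hash table.
func buildHashTable(buildRows []Row) map[int][]Row {
	table := make(map[int][]Row, len(buildRows))
	for _, r := range buildRows {
		table[r.Key] = append(table[r.Key], r)
	}
	return table
}

// probeInner is the probe stage for an inner join: each probe-side row
// looks up its join key and emits one (build, probe) pair per match.
// Other join types would differ in how unmatched rows are handled.
func probeInner(table map[int][]Row, probeRows []Row) [][2]Row {
	var out [][2]Row
	for _, p := range probeRows {
		for _, b := range table[p.Key] {
			out = append(out, [2]Row{b, p})
		}
	}
	return out
}

func main() {
	build := []Row{{1, "a"}, {2, "b"}, {2, "c"}}
	probe := []Row{{2, "x"}, {3, "y"}}
	for _, pair := range probeInner(buildHashTable(build), probe) {
		fmt.Println(pair[0].Value, pair[1].Value)
	}
}
```

In this sketch the probe row with key 3 produces no output because the join type is inner. The sketch stores whole rows in the map and builds on a single goroutine, so it sidesteps exactly the build-stage problems listed above: hash table design, build-side data organization, and concurrent build.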