Description:
In the function WalkTablesUnderAccessPath(), the policy WalkAccessPathPolicy::STOP_AT_MATERIALIZATION is used. However, there is no judgment or handling of this policy in the basic method WalkAccessPaths(). Since all specific logical controls are determined by the passed-in lambda function, if it returns true, the current path search will stop. Therefore, the lambda function needs to describe and explain the logic of STOP_AT_MATERIALIZATION.
However, both the WalkAccessPaths() function and the lambda function do not judge the STOP_AT_MATERIALIZATION policy, making this policy seem like a false concept. From the calling hierarchy, the parameter STOP_AT_MATERIALIZATION passed to the WalkAccessPaths() function should be identified and processed by this function.
This confusing logic will have unexpected effects, such as GetUsedTables searching tables under the materialized table. This directly affects the build input tables and probe input tables of the hash join, and the tables under the materialized table are added to the tables. Although the repeated tables are added, it will not affect the correctness due to the influence of the table bitmap. However, it is worth mentioning that there are several places in the hash join where the table list traversal is processed, and the table objects that are repeatedly added to the list will also execute some instructions repeatedly.
Here is a specific example:
```
create table probe_table(
id int primary key,
col int);
insert into probe_table values(1,1);
create table build_table(
id int primary key,
col int);
insert into build_table values(1,1);
explain format=tree select
subq_0.c0 as c0
from
(
select
ref_1.col as c0
from
probe_table as ref_1
where
ref_1.id is NOT NULL
group by
1
) as subq_0
left join build_table as ref_0 on (true) order by 1;
```
The plan is:
```
| -> Sort: subq_0.c0
-> Stream results (cost=0.53 rows=1)
-> Left hash join (no condition) (cost=0.53 rows=1)
-> Table scan on subq_0 (cost=2.51..2.51 rows=1)
-> Materialize (cost=5.58..5.58 rows=1)
-> Table scan on <temporary> (cost=2.51..2.51 rows=1)
-> Temporary table with deduplication (cost=2.96..2.96 rows=1)
-> Table scan on ref_1 (cost=0.35 rows=1)
-> Hash
-> Index scan on ref_0 using PRIMARY (cost=0.18 rows=1)
```
GDB:
```
(gdb) b HashJoinIterator::ReadRowFromProbeIterator
Breakpoint 1 at 0x16fec77: file sql/iterators/hash_join_iterator.cc
(gdb)
Continuing.
Breakpoint 2, HashJoinIterator::ReadRowFromProbeIterator (this=0x7f7dfbc953a0)
sql/iterators/hash_join_iterator.cc
(gdb) p m_probe_input_tables.m_tables.m_buff[0].table
$1 = (TABLE *) 0x7f88a95b9500
(gdb) p m_probe_input_tables.m_tables.m_buff[1].table
$2 = (TABLE *) 0x7f88a95b9500
```
In this scenario, two identical table pointers are added to m_probe_input_tables, causing handler->position(table->record[0]) to be executed twice in RequestRowId(). At this time, each record on the Probe side will be executed one more time, which may have some performance impact. In the implementation of HashJoin, the repeated TABLE pointers may introduce redundant call execution in the INIT, BUILD, and PROBE stages.
Therefore, there are two issues that need to be discussed:
1. In the implementation of HashJoin, repeated TABLE pointers may have performance impact.
2. All code logic that uses STOP_AT_MATERIALIZATION may need to be evaluated to determine whether it really needs to stop at MATERIALIZATION.
How to repeat:
As above.