Description:
In the from clause mentioned below a virtual table is used.
It is executed first and then the other statements are joined.
V_adres_kvk_b and v_adres_ms_b are views, underlying table is table adres which is 1 Gigabyte in size.
The intermediate resultset is 2G big.
The optimizer doesn't seem to know that only the two columsn mentioned in the select are needed in the intermediate resultset!! It could be much smaller.
What happens in our production environment is that the statement cannot be executed succesfully despite a /tmp of 5G.
...
join (
select bkwi_id, bkwi_volgnr
from sbrquery.v_adres_kvk_b
where code_gemeente in ('0390','1320','0517', ... ,'1245')
UNION -- distinct
select bkwi_id, bkwi_volgnr
from sbrquery.v_adres_ms_b
where code_gemeente in ('0390','1320','0517', ... ,'1245')
) uu on (uu.bkwi_id = ee.bkwi_id and uu.bkwi_volgnr = ee.bkwi_volgnr)
...
How to repeat:
For the complete statement see attachement.
Suggested fix:
Select in the intermediate resultset only the columns needed.
Also when a virtual table and views are used.
Remark:
- executing virtual table as a separate statement doesn't seem to have this problem
select bkwi_id, bkwi_volgnr
from sbrquery.v_adres_kvk_b
where code_gemeente in ('0390','1320','0517', ... ,'1245')
UNION -- distinct
select bkwi_id, bkwi_volgnr
from sbrquery.v_adres_ms_b
where code_gemeente in ('0390','1320','0517', ... ,'1245')
If you need more info please inform me.
Cannot supply the data (too much).
If you solve this MySQL is one step further on the road of handling large amounts of data succesfully.