Bug #88300 subselect across two schemas returns incorrect data with very small dataset
Submitted: 30 Oct 2017 23:14 Modified: 7 Nov 2017 11:58
Reporter: Mike McPherson Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Information schema Severity:S3 (Non-critical)
Version:MySQL 5.7.18 OS:Linux
Assigned to: CPU Architecture:Any
Tags: subselect two schemas very small dataset

[30 Oct 2017 23:14] Mike McPherson
Description:
I have a Data Dictionary style table in my schema
I'm trying to compare its opinion about a table with what the database knows
If I have 3 or fewer rows in my Data Dictionary table, the query returns the rows as if they are not in the table
If I increase the row count to 4 or more, it works fine and returns no rows

How to repeat:
my Data Dictionary table has two varchar columns ColumnTame and ColumnType
it has two entries for TableName='Zones'
TableName='Zone' ColumnName='Zone'
TableName='Zone' ColumnName='ZoneName'
the Zones table has two columns (that match my Data Dictionary)

it has four entries for TableName='Users'
TableName='Users' ColumnName='UserName'
TableName='Users' ColumnName='StartDate'
TableName='Users' ColumnName='EndDate'
TableName='Users' ColumnName='GroupName'
the Users table has four columns (that match my Data Dictionary)

SQL:
SELECT ctDD.ColumnName, ctDD.ColumnType
FROM ctnova_com_celltracker.ctDataDictionary ctDD 
WHERE ctDD.TableName='Zones' 
AND CONCAT(ctDD.TableName, '.', ctDD.ColumnName) NOT IN (SELECT CONCAT(Table_Name, '.', Column_Name) FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name='Zones') 
ORDER BY ColumnName

returns the two column names in my Data Dictionary

the same query, using TableName 'Users'

SQL:
SELECT ctDD.ColumnName, ctDD.ColumnType
FROM ctnova_com_celltracker.ctDataDictionary ctDD 
WHERE ctDD.TableName='Users' 
AND CONCAT(ctDD.TableName, '.', ctDD.ColumnName) NOT IN (SELECT CONCAT(Table_Name, '.', Column_Name) FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name='Users') 
ORDER BY ColumnName

returns no rows

if I then add two columns to the Zones table (and matching rows to Data Dictionary) it works correctly

Suggested fix:
review the code around NOT IN
[30 Oct 2017 23:40] MySQL Verification Team
Thank you for the bug report. Please provide the test case with create table and insert statements and print the queries result using the mysql client command. Thanks in advance.
[31 Oct 2017 8:11] Mike McPherson
SQL to create tables and data

Attachment: MySQLbug.sql (application/octet-stream, text), 1.31 KiB.

[31 Oct 2017 8:14] Mike McPherson
erroneous results output

Attachment: MySQLBugOutput.pdf (application/pdf, text), 116.68 KiB.

[31 Oct 2017 8:15] Mike McPherson
SQL to generate the tables and data attached
you will need to change references to my schema (but you knew that!)
also the erroneous output from the smaller query
thanks
Mike
[31 Oct 2017 17:01] MySQL Verification Team
Hi!

This could be a simple problem with the blanks. Have you tried to change the first query into:

SELECT ctDD.ColumnName, ctDD.ColumnType
FROM ctnova_com_celltracker.ctDataDictionary ctDD 
WHERE ctDD.TableName='Zones' 
AND (ctDD.TableName NOT IN (SELECT Table_Name FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name = 'Zones))
AND (ctDD.ColumnName NOT IN (SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name = 'Zones'))
ORDER BY ColumnName

Does it work better with a query like this ??
[31 Oct 2017 19:07] Mike McPherson
hmm
i tried that query and it returned no rows
but my question is why does the problem go away if I add another column to the Zones table and my Data Dictionary

also, if I remove a column from the (working) Users table and Data Dictionary, the problem re-appears???

thanks

PS is there a better way to compare two columns?
[31 Oct 2017 19:50] Mike McPherson
further update
I did more tests
it seems your proposed query is not picking up errors
that is to say where there are more columns in the table than in the Data Dictionary
OR
where there are more columns in the Data Dictionary than there are in the table

so - sorry - doesn't work :)
[1 Nov 2017 12:52] MySQL Verification Team
Hi!

Seems that what you need is a join, rather then nested query. Probably even outer join to get what you want.
[1 Nov 2017 16:04] Mike McPherson
there are two points here
a) how do I get the data that I need?
I can see a way forward with that
b) there is still a bug if the returned dataset is < 4 rows

I wait to hear from you :)
[1 Nov 2017 16:31] MySQL Verification Team
Hi!

You wrote:

"if I then add two columns to the Zones table (and matching rows to Data Dictionary) it works correctly"

Can you provide exactly the columns that you added and the matching rows ...
[3 Nov 2017 8:42] Mike McPherson
apologies for the delay
I've been off sick for a couple of days

add two rows to Data Dictionary table
INSERT INTO ctDD (TableName, ColumnName) VALUES ('Zones', 'Size');
INSERT INTO ctDD (TableName, ColumnName) VALUES ('Zones', 'Capital');

add two columns to Zones table
ALTER TABLE Zones ADD COLUMN Size INTEGER;
ALTER TABLE Zones ADD COLUMN Capital VARCHAR(25);

now run the query
SELECT ctDD.ColumnName, ctDD.ColumnType
FROM ctnova_com_celltracker.ctDD 
WHERE ctDD.TableName='Zones' 
AND CONCAT(ctDD.TableName, '.', ctDD.ColumnName) NOT IN (SELECT CONCAT(Table_Name, '.', Column_Name) FROM INFORMATION_SCHEMA.COLUMNS WHERE Table_Name='Zones') 
ORDER BY ColumnName;

and there are no rows reported (which is correct)
[3 Nov 2017 16:58] MySQL Verification Team
Hi!

LEFT JOIN queries work just fine, but nested query does not work ...... I also changed my collation to match the one from I_S.

Results:

select ctDD.TableName, information_schema.columns.Table_name, ctDD.ColumnName, information_schema.columns.Column_name from ctDD LEFT JOIN information_schema.columns ON ctDD.TableName = information_schema.columns.Table_Name AND ctDD.ColumnName = information_schema.columns.Column_Name  WHERE information_schema.columns.Column_Name='Zones';

[empty]

select ctDD.TableName, information_schema.columns.Table_name, ctDD.ColumnName, information_schema.columns.Column_name from ctDD LEFT JOIN information_schema.columns ON ctDD.TableName = information_schema.columns.Table_Name AND ctDD.ColumnName = information_schema.columns.Column_Name  WHERE ctDD.ColumnName='Zones'";

[empty]

select ctDD.ColumnName from ctDD WHERE ctDD.TableName='Zones' AND ((ctDD.TableName, ctDD.ColumnName) NOT IN (SELECT Table_Name, Column_Name FROM information_schema.columns WHERE Table_Name  = 'zones'));

+------------+
| ColumnName |
+------------+
| Zone       |
| ZoneName   |
+------------+

select ctDD.ColumnName from ctDD WHERE ctDD.TableName='Zones' AND ((ctDD.TableName, ctDD.ColumnName) NOT IN (SELECT Table_Name, Column_Name FROM information_schema.columns WHERE TableName  = 'zones'));

+------------+
| ColumnName |
+------------+
| Zone       |
| ZoneName   |
+------------+

Verified. Could be I_S problem, but also nested query .... I think it is I_S .....
[7 Nov 2017 11:58] Mike McPherson
ok so you have verified that there is a bug
is there a plan to fix it?
thank
[7 Nov 2017 12:32] MySQL Verification Team
Yes, there will be plans on fixing it. However, the entire complex is strictly internal and quite complex. When the bug is fixed, this page will be updated.