| Bug #40196 | MS Access ODBC join (three tables) returns incomplete record set | ||
|---|---|---|---|
| Submitted: | 21 Oct 2008 9:21 | Modified: | 29 Apr 2013 5:25 |
| Reporter: | Graham Wideman | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | Connector / ODBC | Severity: | S2 (Serious) |
| Version: | 5.01.05 | OS: | Windows (XP up-to-date) |
| Assigned to: | Lawrenty Novitsky | CPU Architecture: | Any |
| Tags: | join, missing rows, MS Access, ODBC | ||
[21 Oct 2008 9:21]
Graham Wideman
[21 Oct 2008 9:23]
Graham Wideman
MySQL dump script and MS Access MDB file
Attachment: wideman_odbc_join_bug.zip (application/x-zip-compressed, text), 19.32 KiB.
[21 Oct 2008 9:30]
Graham Wideman
Forgot to note: msjet40.dll 4.0.9511.0
[21 Oct 2008 21:28]
Graham Wideman
ODBC log for bad and good cases
Attachment: wideman_join_bug_ODBC.LOG (application/octet-stream, text), 174.08 KiB.
[21 Oct 2008 21:32]
Graham Wideman
I've uploaded the ODBC log captured while first running the query that produces bad result, and then again to produce good result (employing the irrelevant WHERE clause). I've manually edited in a comment "=== good query ===" and "=== bad query ===" to show where each started. Clearly the two queries follow different strategies, retrieving data in different orders. Notably the bad one includes component queries that use an ORDER BY clause, where the good one does not. Hope that helps, Graham
[27 Oct 2008 2:22]
Graham Wideman
Further investigation:
I am now fairly certain that issue is that where the ODBC log shows the connector using an intermediate query with an ORDER BY clause, whatever code follows that is making a wrong assumption about the order of rows returned.
I get to that conclusion by noting that when I change values in ta.AId to use characters that are "above" the alphabetics in the collation order, this results in the missing rows, but sticking to characters in same order as cp1252 does not. (Tested quite a bit but not exhaustively). All the MySQL collation orders have characters [\]{}_ and more above the alphas and numerics, these are in very different order from Windows code page 1252, which has them all below numerics and alphas.
Bottom line -- presumably the intermediate query employs an ORDER BY clause because subsequent code wants to employ the ordered results for further comparisons -- except those comparisons are likely the cause of missing rows if carried out under a different (locally cp1252 perhaps) collation.
-- Graham
[27 Oct 2008 2:23]
Graham Wideman
Further investigation:
I am now fairly certain that issue is that where the ODBC log shows the connector using an intermediate query with an ORDER BY clause, whatever code follows that is making a wrong assumption about the order of rows returned.
I get to that conclusion by noting that when I change values in ta.AId to use characters that are "above" the alphabetics in the collation order, this results in the missing rows, but sticking to characters in same order as cp1252 does not. (Tested quite a bit but not exhaustively). All the MySQL collation orders have characters [\]{}_ and more above the alphas and numerics, these are in very different order from Windows code page 1252, which has them all below numerics and alphas.
Bottom line -- presumably the intermediate query employs an ORDER BY clause because subsequent code wants to employ the ordered results for further comparisons -- except those comparisons are likely the cause
[27 Oct 2008 15:33]
Graham Wideman
... arg, cablemodem problem cause partial comment post last time. Completing the last comment, my conclusion is that when the MySQL ODBC connector (5.1.5, Windows XP) breaks a three-table join into intermediate queries, it: a) sometimes uses an ORDER BY clause, presumably because it's going to take advantage of the sorted results in subsequent steps, but... b) the sort order returns results in an order that doesn't match the collation expectations of the subsequent steps, and consequently... c) the subsequent steps fail to find all the possible joins. -- Graham
[27 Oct 2008 15:35]
Jess Balint
Graham, Can you provide a small sample of the data which is causing the problem? eg, the rows which are not joined but should be
[27 Oct 2008 15:39]
Tonci Grgin
Graham, besides what Jess asked, I think it's imperative that you take those queries and try running them in command line client (mysql). See if you can repeat the problem there too.
[27 Oct 2008 16:01]
Graham Wideman
Hi Jesse and Tonci: Jesse: I already uploaded sample data (2008-10-21) -- see message above, with attachment wideman_odbc_join_bug.zip It's three tables, ta, tb and tc, having 2 rows, 8 rows and 2 rows respectively. In the join case, all 8 rows should appear. Tonci: as for running in MySQL directly, yes, that works fine (all 8 rows returned). Also in the attachment is an MDB with the "good" and "bad" queries, which are as reported in my original bug-report message. (You probably know this, but in the MDB you'll need to recreate links ta, tb and tc, or use Linked Table Manager to fix them). Also on 2008-10-21 I uploaded the ODBC log for bad and good cases: wideman_join_bug_ODBC.LOG. I hope that covers what you were looking for? -- Graham
[28 Oct 2008 8:54]
Graham Wideman
More symptoms: I have noticed more to the pattern of failure.
First, I don't understand exactly what I'm looking at in the ODBC log file -- certainly there's a sequence of steps, but it's not clear to me what entries in the log file are "here's what Access asked for" versus "here's what ODBC decided to do". Hence the log file doesn't tell me where the mistake is happening. Hopefully you all are smarter.
That said -- here's an additional pattern I've noticed.
Where I set up the "bad" query in Access thusly:
SELECT ta.AId, tb.BId, tc.CId
FROM (ta LEFT JOIN tb ON ta.AId = tb.AId) LEFT JOIN tc ON tb.CId = tc.CId;
.. what appears in the ODBC log file is this:
wideman_odbc_jo fc4-96c EXIT SQLExecDirectW with return code 0 (SQL_SUCCESS)
"SELECT `tb`.`AId` ,`tb`.`BId`,`tc`.`CId` FROM {oj `tb` LEFT OUTER JOIN `tc` ON (`tb`.`CId` = `tc`.`CId` ) } ORDER BY `tb`.`AId` \ 0"
... note the ORDER BY clause that's been added. Further down in the log file there's a smaller query:
"SELECT `AId` ,`ta`.`AId` FROM `ta` ORDER BY `ta`.`AId` \ 0"
... which also has an ORDER BY clause.
Contrastingly, in the "good" query:
SELECT ta.AId, tb.BId, tc.CId
FROM (ta LEFT JOIN tb ON ta.AId = tb.AId) LEFT JOIN tc ON tb.CId = tc.CId
WHERE (((ta.AId)<>"x"));
... this results in a log showing:
"SELECT `ta`.`AId`,`tb`.`CId` ,`tb`.`BId` FROM {oj `ta` LEFT OUTER JOIN `tb` ON (`ta`.`AId` = `tb`.`AId` ) } WHERE NOT((`ta`.`AId` = 'x' ) ) \ 0"
... note no ORDER BY... and later other smaller queries but no ORDER BYs.
But, if I then *add* an ORDER BY tA.AId clause to the "good" query, I then get the bad behavior (and the log of course shows ORDER BY clauses). Ie: the WHERE clause was somehow prompting Access or ODBC to not gratuitously add an ORDER BY, but when ORDER BY is specifically included the failure occurs.
Finally, if I change the query in Access to INNER join:
SELECT ta.AId, tb.BId, tc.CId
FROM (ta INNER JOIN tb ON ta.AId = tb.AId) INNER JOIN tc ON tb.CId = tc.CId
ORDER BY ta.AId;
... even with ORDER BY this succeeds. In the log file:
"SELECT `ta`.`AId`,`tb`.`BId`,`tc`.`CId` FROM `tb`,`ta`,`tc`
WHERE ((`ta`.`AId` = `tb`.`AId` ) AND (`tb`.`CId` = `tc`.`CId` ) )
ORDER BY `ta`.`AId` \ 0"
... and though there are subsequent simpler queries, none of them have ORDER BY.
So... again this suggests that ORDER BY in the "sub queries" is a problem resulting in overall JOIN failure.
Hope that helps,
Graham
[28 Oct 2008 9:02]
Graham Wideman
Question: On the theory that the bug we're discusion relates to collation issues, I am looking for a MySQL collation that matches Windows CP1252 US English: http://www.collation-charts.org/winxp/winxp.0409.CP1252.English_United_States.html Despite this: http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html ... saying that latin1 matches CP1252 *character set*, there doesn't appear to be a MySQL *collation* that matches CP1252 US English -- am I right? -- Graham
[13 Feb 2009 15:04]
Tonci Grgin
Quote:
.. what appears in the ODBC log file is this:
wideman_odbc_jo fc4-96c EXIT SQLExecDirectW with return code 0 (SQL_SUCCESS)
"SELECT `tb`.`AId` ,`tb`.`BId`,`tc`.`CId` FROM {oj `tb` LEFT OUTER JOIN `tc` ON
(`tb`.`CId` = `tc`.`CId` ) } ORDER BY `tb`.`AId` \ 0"
EndQuote.
This is added by Access itself, not MyODBC... Analyzing again.
[13 Feb 2009 15:20]
Susanne Ebrecht
> ... saying that latin1 matches CP1252 *character set*, there doesn't appear to be a MySQL > *collation* that matches CP1252 US English -- am I right? Graham, you are wrong here. There is nothing special in US English. US don't treat MAC and MC as the same like it is in some UK sorting rules. Also English don't has letters with double dots or circle or other curlicues. US sorting rule is just like the latin alphabet; a, b, c, ..... x, y, z. You can use every latin1 collation here. Latin1 is the alphabet plus some additional letters or alphabet letters with curlicues which are used in Western European languages. All Western European languages use the latin alphabet and of course the order of the letters is the same in all languages. The difference is just where to sort curlicues or special letters like ß or å or ö or ç or whatever. English don't have sees letters so it won't show up in sorting anyway. For English you have free choice which collation you are using. When you use latin1 as character set it just has to be a collation where name starts with latin1_. Feel free to use latin1_swedish_ci or latin1_german1_ci or latin1_danish_ci or whatever else we support. I would recommend to use latin1_general_ci or if you want a case sensitive sorting latin1_general_cs. These are the collation that are mainly for English.
[13 Feb 2009 16:01]
Tonci Grgin
Verified as described on WinXP Pro with both recent versions of c/ODBC against MySQL server 5.1.30 running on remote host. I used Access2003. I have no sound explanation for this.
[13 Feb 2009 22:07]
Graham Wideman
Tonci: Thanks for verifying the issue. However, you seem unsatisfied by the explanation (27-Oct-2008 3:22 "Further investigation")? Or is the challenge in trying to get beyond that? Susanne: You are discussing the alpha characters. The issue is not with those, it is with the characters like underscore, square brackets, braces and backslash, which, in Windows 1252 collation, sort lower than the alphas and numerics, versus seemingly all MySQL collations where those characters sort above the alphanumerics. That means that when Access/ODBC asks MySQL for ORDER BY results, and then internally depends on the result order to perform joins, this leads to failure to find all matches.
[8 Dec 2010 6:26]
Bogdan Degtyariov
Graham,
I looked into the general query log on the server and found the following set of queries executed sequentially to fetch the data for JOIN (this insane mixture of queries just does not make sense to me):
------------------------------------------------------------
101208 17:18:41 429726 Query SELECT `tb`.`AId` ,`tb`.`BId`,`tc`.`CId` FROM {oj `tb` LEFT OUTER JOIN `tc` ON (`tb`.`CId` = `tc`.`CId` ) }ORDER BY `tb`.`AId`
429726 Query SELECT `AId` ,`ta`.`AId` FROM `ta` ORDER BY `ta`.`AId`
429726 Query SELECT `AId` FROM `ta` WHERE `AId` = _latin1'aa01' OR `AId` = _latin1'aa01' OR `AId` = _latin1'aa01' OR `AId` = _latin1'aa01' OR `AId` = _latin1'aa01' OR `AId` = _latin1'a_01' OR `AId` = _latin1'a_01' OR `AId` = _latin1'a_01' OR `AId` = _latin1'a_01' OR `AId` = _latin1'a_01'
429726 Query SELECT `CId` FROM `tc` WHERE `CId` = _latin1'cc01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'cc01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'c_01' OR `CId` = _latin1'c_01'
429726 Query SELECT `BId`,`AId`,`CId` FROM `tb` WHERE `BId` = _latin1'b01' OR `BId` = _latin1'b04' OR `BId` = _latin1'b05' OR `BId` = _latin1'b06' OR `BId` = _latin1'b06' OR `BId` = _latin1'b06' OR `BId` = _latin1'b06' OR `BId` = _latin1'b06' OR `BId` = _latin1'b06' OR `BId` = _latin1'b06'
------------------------------------------------------------
I am setting the status to "Not a bug" because MS Access does not interpret the result of just one query. What it actually does is decomposing LEFT JOIN in a set of independent queries that do not look equal to the initial SQL.
[8 Dec 2010 9:59]
Graham Wideman
Hi Bogan, Yes, Access (or is it ODBC?) decomposes the query into several queries. I do not believe they are incorrect -- you can verify they are sensible by carrying out the test using data that contains only alpha or number characters -- this works. The problem is with data that contains a variety of non-alphanumeric characters, such as underscore. This almost guarantees that the issue is with the order of the data returned by one or more of the subsidiary queries. I discussed this in my comments of Oct 27 2008 to Oct 28 2008. The question I posed on Oct 28 2008 is still unanswered. Does MySQL have a collation order that matches Windows CP1252 US English? I could not find one, but if it does it is almost certain that running the query with MySQL set to use CP1252 order should fix the problem. (And also note that Susanne Ebrecht already followed up on the collation issue in Feb 2009, but incorrectly, which I addressed, and did not get any reply.) If MySQL cannot be set to use a collation order that matches CP1252, then queries from Access to MySQL will break seemingly randomly, depending on which decomposition Access (or ODBC?) uses. That, I would say, is a bug. So -- if MySQL has a CP1252 US English collation order, great, please let us know which one it is, and close the case. If not, then perhaps an issue could be filed "MySQL needs a CP1252 collation order so it can work with ODBC and Access" :-) -- Graham
[9 Dec 2010 6:50]
Bogdan Degtyariov
Graham, I can guarantee you that MySQL ODBC driver does not decompose LEFT JOIN queries. Adding ORDER BY is incorrect because it modifies the original query in such way that it becomes dependent on a collation order, which is not necessary identical to the collation order used by Access. This page you sent as the reference for CP1252 actually shows that underscore symbol is sorted above alphanumerics (you said it should sort lower): http://www.collation-charts.org/winxp/winxp.0409.CP1252.English_United_States.html
[9 Dec 2010 7:26]
Graham Wideman
Hi Bogdan, Clearing up a few points: > I can guarantee you that MySQL ODBC driver does not > decompose LEFT JOIN queries. No guarantee needed, I have not assumed any decomposition on the part of the MySQL driver. The decomposition appears in the log of ODBC communication, hence before it reaches the MySQL ODBC driver. What I do not know is whether the decomposition is in Access, Jet, or whatever code is used to transform Access SQL into ODBC SQL (that's what I meant by "ODBC?", and maybe I don't even have exactly the right model here). > This page you sent as the reference for CP1252 actually shows > that underscore symbol is sorted above alphanumerics When I said "lower", I meant "sorts as though it's a lower value", like "A" sorts as though it's a lower value than "B". (Or, in programming terms, "A" < "B", and "_" < "A") The chart you are referencing shows underscore sorting before "A", so I think that agrees with my point. (Sure, the values which I would call "lower", are located at a vertical position "higher" on the chart.) But the large point was that I could not find a MySQL collation order that matches this ordering. -- Graham
[9 Dec 2010 11:20]
Bogdan Degtyariov
Hi Graham, If I run the query inside another ODBC client (odbcte32.exe) the query is not decomposed. This means that MS Access actually does it. About the comparing rules for ISO-8859-1 you are right. In windows: "_" < "a" In MySQL "_" > "a" However, it cannot be the error because every system defines collations using own rules. For instance, Oracle CP1252 Generic is different from Windows XP CP1252: http://www.collation-charts.org/oracle10g/ora10g.WE8MSWIN1252.GENERIC_M.html strings like "a~1" will be sorted in the opposite way giving the same bad output with Oracle linked tables in Access: "~" < "a" in Windows CP1252-US "~" > "a" in Oracle CP1252-GENERIC The real problem is that MS Access is completely ignoring possible consequences if collations sorting is not equal to windows standards. MySQL server runs on a much bigger variety of platforms and it cannot use the platform-specific definition for a particular collation. Otherwise MySQL server in Windows and FreeBSD would give different results for the same SQL query on the same data set. I think you could try making a custom collation and add it to MySQL: http://forge.mysql.com/wiki/How_to_Add_a_Collation http://forge.mysql.com/w/images/b/b7/HowToAddACollation.pdf
[9 Dec 2010 12:09]
Graham Wideman
Hi Bogdan, Thanks for giving this another look. > The real problem is that MS Access is completely > ignoring possible consequences if collations sorting > is not equal to windows standards. Yes, agreed. It is assuming that if it requests sorted data, it will get sorted data, which is not entirely unreasonable. There is the capability to choose which collation we want the foreign database to use, but MySQL offers no choice (that I can see) that would fix this problem. However... > I think you could try making a custom collation > and add it to MySQL: ... Ah-hah! That could indeed be the solution! I wish I had known such a capability existed when I was working on this problem. Oh well... hopefully this bug entry and the trail of comments will help someone else in the same boat. -- Graham
[9 Dec 2010 12:14]
Graham Wideman
I see from the presentation slides: • A collation is added with help of subset of "Locale Data Markup Language" (LDML) http://unicode.org/reports/tr35/ • MySQL supports LDML starting from versions: 5.0.46, 5.1.20, 5.2.6 So as it turns out, custom collations were not added until after the version of MySQL I was obliged to use... so I didn't just narrowly miss solving this problem. Oh well, water under the bridge now.
[10 Dec 2010 3:35]
Bogdan Degtyariov
Graham, Even with the old version of MySQL Server it is still possible to make your own collation map, but this would require rebuilding mysqld binary executable.
[29 Apr 2013 5:25]
Bogdan Degtyariov
This is not a bug. The problem occurs because of collation differences in MS Access (Windows) and MySQL Server. The workaround to the problem is to make a custom collation.
