Bug #25224 | Comments in front of SELECT expression appear as part of column alias | ||
---|---|---|---|
Submitted: | 20 Dec 2006 20:31 | Modified: | 11 Dec 2007 0:15 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Parser | Severity: | S2 (Serious) |
Version: | 5.0.34-BK, 5.0.26-debug-log | OS: | Any |
Assigned to: | Davi Arnaut | CPU Architecture: | Any |
[20 Dec 2006 20:31]
Philip Stoev
[21 Dec 2006 16:32]
Valeriy Kravchuk
Thank you for a bug report. Verified with 5.0.34-BK on Linux: mysql> prepare bar from "select 1, /* COMMENT */ (SELECT 1)"; Query OK, 0 rows affected (0.01 sec) Statement prepared mysql> execute bar; +---+--------------------------+ | 1 | /* COMMENT */ (SELECT 1) | +---+--------------------------+ | 1 | 1 | +---+--------------------------+ 1 row in set (0.01 sec) Looks like mysql command line client just removes comment before sending it to server. Parser itself does not process inline comments correctly.
[28 Mar 2007 16:19]
Philip Stoev
It appears this has been known since January 2004 -- see the third comment at http://dev.mysql.com/doc/refman/5.0/en/comments.html
[6 Sep 2007 16:45]
Konstantin Osipov
Fix in 5.2 only.
[10 Dec 2007 16:57]
Davi Arnaut
The standard says that for select-list columns that "that do not specify an <as clause>" (i.e. are not followed by "[ AS ] <column name>"), the name is "an implementation-dependent <column name>". In other words, MySQL can form the column name using whatever means it thinks most convenient. Although it's possible to remove comments from the columns, it needs to be done at lexical analysis time because we would need to concatenate each valid token into the column name. For example, parsing the query: SELECT 1, /* COMMENT */ (SELECT length("str /* COMMENT */ str")); would yield the the tokens (, SELECT, length, (, "str /* COMMENT */ str", ), ) for the last column. Such concatenation scheme is the only viable option but would significantly increase the lexical analysis cost and requires precise tracking of the start and end position of each token -- precise tracking that the current lexer doesn't offer. Since historically the server has preserved the comments and given the burden required to properly remove comments, I think that's not worth changing.
[10 Dec 2007 17:49]
Philip Stoev
Can you please at least move this to "To be fixed later" -- there are people out there and GUI tools which rely on the column names mysql generates, and they should be reliable. It is fairly easy to cause this bug to show up -- just write a really nice comments to a normal query and your application may break. I will really not advocate any solution with an O(n^3), however the mysql client is successfully able to strip comments without analysing the query. I think this bug can be solved for most cases by simply making sure that the lexer advances its "start of token" pointer so that leading comments are excluded.
[10 Dec 2007 18:05]
Davi Arnaut
What stripping comments has to do with "reliable" column names? If one needs a "reliable" column name it should be using the AS. This is not a bug (maybe a feature request), nowhere it's stated that comments must be stripped. The msyql client stripping has caused problems in the past and is probably just a hack, the stripping can only be reliable done with the help of the lexical analyzer. For example, consider stripping the following query: SELECT 1, /* COMMENT */ (SELECT /* COMMENT */ length /* COMMENT */ (/* COMMENT */ "str /* COMMENT */ str" /*COMMENT */) /* COMMENT */);
[10 Dec 2007 21:15]
Philip Stoev
Yes I agree that a person requiring a specific column name should use the AS clause. However, at the same time, the user is entitled to expect that adding extra comments to a query will not change its behavoir (comments can even be added by third-party tools without user intervention). I agree that the standard leaves the column name to be implementation-dependent, however the implementation is still responsible to use intuitive names. If the name contains a comment, this means that it will contain / , * and spaces, which are not good characters to have in a column name. I assumed that as long as you can keep track whether you are in a string literal or not, you can process the string and skip over comments. If this is not the case, let's at least leave this bug as "will fix later" so that it is taken into consideration during a future rewrite of the lexer/parser.
[11 Dec 2007 0:04]
Davi Arnaut
Comments don't change the query behavior, it's just that historically MySQL API allowed that result sets to be accessible using the field/column name and I tend to think that intuitively the comments need to be preserved, otherwise we might complicate things too much. For example: SELECT 1, (/* COMMENT */ SELECT 1 /* COMMENT */ + 1) Stripping the comments, this column name would be "( SELECT 1 + 1)". Furthermore changing the comment position would yield a different column name, complicating things even more from a maintenance point of view. Historically, MySQL doesn't do transformations on the column name, a "SELECT 1 + 1" would yield a column name that is exactly the same. I think this is good design because it's based upon the principle of least astonishment. It also makes things harder for a programmer, if you use a variable to build the SELECT item/column and it has a comment, you would need to strip the comment to be able to use the same variable as a index for the result. Stripping the comments might make sense if you are the one making the query but not evaluating the results -- following this scenario, a naive DBA could easily break a client application by changing the position of a comment (due to spaces, etc). Implementation details aside, I fail to see any way in which this could make things easier. It's not that I'm avoiding to work on a proper "fix", in the long term I see more disadvantages than advantages.