Bug #40075 | Non-descriptive error message when restoring on Falcon with open transactions | ||
---|---|---|---|
Submitted: | 16 Oct 2008 11:28 | Modified: | 7 Nov 2008 10:23 |
Reporter: | Øystein Grøvlen | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Backup | Severity: | S3 (Non-critical) |
Version: | 6.0 | OS: | Any |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
[16 Oct 2008 11:28]
Øystein Grøvlen
[16 Oct 2008 11:39]
MySQL Verification Team
Thank you for the bug report. Verified as described.
[7 Nov 2008 10:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/58156 2727 Ingo Struewing 2008-11-07 Bug#40075 - Non-descriptive error message when restoring on Falcon with open transactions Not to be pushed. This is a demo patch. It shows, what could be changed to receive a better error message in this case. This patch reverts a couple of design decisions. It should not be pushed without approval from the architecture team. For more information see the bug report. As you can see, the changes do not always improve the error messages. In backup_views it became worse. Definitely. In ndb_single_user, it is arguable. In restore_obstacles it became better.
[7 Nov 2008 10:23]
Ingo Strüwing
This is not a bug. It works as designed. If you really want it changed, please ask our architects to revise the design decisions. The below text may help to understand which decisions are required. The storage engine, Falcon in this case, returns an error code from handler::ha_delete table(), which tells that the table cannot be deleted because of ongoing transactions on this table (ER_LOCK_OR_ACTIVE_TRANSACTION). This error code is blocked (cleared) at multiple places and/or replaced by less informative error messages. Even if one or the other blockade could be seen as a "mistake" or "bug", the mere existence of multiple blockades suggests that this is an explicit design decision. The problem with MySQL server error reporting is the diversity of error reporting facilities. Several fuctional blocks have their own facilities. Some use object oriented methods, others are plain fuctional. Some build on thers, while some are independent. This might come from the fact that there are at least two paradigms for error reporting: Paradigm #1: The first time, an error is detected, do immediately report it with as much detail as possible. This will mostly be quite low level information. But in most cases it names the real cause of the problem. Higher level functions in the call hierarchy can add information that tells the user, which operation was going on when the error happened. Paradigm #2: The highest level function reports that it cannot do its job. This should give the most easy to understand message to the user. More detailed information could be added on request. Examinig the code and talking with MySQL developers reveals that different opinions exist, which paradigm to use. No wonder that the error reporting works pretty heterogeneous in MySQL. The inital error reporting facility seems to be influenced by paradigm #1. There exists a "warning stack". The first report entered on this stack will be sent as the error message to the user, if the statement results in a non-zero return value. The other "stack" entries can be retrieved by the "SHOW WARNINGS" statement. Since this facility is implemented in the networking module, which sends results to the user, most, if not all other error reporing facilities ar based on this one. Developers, who prefer paradigm #2, need to block the error messages that have been reported by lower layers, and replace them with their own message. This blocking happens by clearing the "warning stack" (thd->clear_error(), or somewhat less "clean" thd->main_da.reset_diagnostics_area()). After this, a new error message becomes first in the stack and will thus be reported to the user. Unfortunately, this approach wipes out all stacked error reports. Friends of paradigm #2 probably would like to be able to enter new messages as number one on the "stack". But we don't have this ability. Back to the reported problem of RESTORE. These are the obstacles that prevent the engine's error code to come to the user's attention: 1. The global function ha_delete_table() is a wrapper for handler::ha_delete_table(). Its purpose is to provide handler::ha_delete_table() with a TABLE object and to suppress error reports from the engine. ha_delete_table() has a parameter "generate_warning". If this is true, an error code from the engine is pushed as a warning on the warning stack. If it is false, no error handling happens at all. Since the engine does not push to the warning stack itself, no report is done at this stage. A comment in ha_delete_table() suggests that it is explicitly not wanted that an error is reported if the deletion of a table fails in the engine. IMHO this is very well justified, if the error code from the engine says that the table does not exist in the engine. But every other error means that the table could not be deleted and continues to exist. IMHO this is vital to be reported. But since the code is very explicit, this should not be changed without approval from the architects. 2. In the DROP TABLE code (mysql_rm_table_part2()), we call ha_delete_table() with generate_warning=FALSE. So we do not even want to see warnings if a table could not be deleted. However, we collect tables with problems (other than table does not exist) in a list and later report this list together with the error code ER_BAD_TABLE_ERROR, which translates to "Unknown table ...". While generate_warning=FALSE is a design decision, the error text could be fixed by remembering if any error other than non-existence happened. If so a new message like "Cannot drop ..." could supersede the other message. However, this would be unnecessary if we decide to fix 1. In that case all other errors would have been reported already. 3. In the code that tries to restore a database (DatabaseObj::do_execute()), we do first try to drop the same database, but ignore its return code. We just try to create it anew. This wipes out the above error and replaces it with something like "Canot create ...". I guess the implementor expected nothing but a "does not exist" error, and felt safe to ignore it. But since non-existence is masked out on the lower layer already, this is probably just a mistake and can be fixed. 4. In the backup/restore kernel (bcat_create_item()), we check the return code from the [re-]creation of the database object. If it failed, we report an error like "Could not restore database". Since DROP DATABASE and CREATE DATABASE are executed in sub-statements, the query_id becomes incremented. And now comes a design detail of the original error reporting facility, that I do not understand. THD remembers the query_id of the first error in the warning stack (thd->warn_id). If a new error is reported with a different query_id, the stack is cleared, wiping out all former error reports. This looks like a lapse into paradigm #2. If the architecs decide that this shall stay as it is, we could still fix the obstacle by reporting a new error only if no error exists or query_id is unchanged (if (thd->is_error() && (thd->query_id != thd->warn_id)) in bcat_create_item(). 5. In the backup/restore kernel (Backup_restore_ctx::do_restore()), after handling/[re-]creating all meta data, we explicitly clear the diagnostics area (wipe out all error reports). This has been added recently with revision oystein.grovlen@sun.com-20081014120856-0k6zhjkekl2xcybb for BUG#39089/WL#4384 (Fix missing error handling in backup code). So this is a design decision by the backup team and could be revised by them. The above patch shows how an implementation could look like if these design descisions are changed.