Bug #37514 | Insert of Blob data, sometimes the BLOB is corrupt. | ||
---|---|---|---|
Submitted: | 19 Jun 2008 5:36 | Modified: | 25 Jun 2008 2:35 |
Reporter: | Paul Rolfe | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Query Browser | Severity: | S2 (Serious) |
Version: | 5.0.51a-community edition | OS: | Linux (Fedora Core 6) |
Assigned to: | Heikki Tuuri | CPU Architecture: | Any |
[19 Jun 2008 5:36]
Paul Rolfe
[19 Jun 2008 5:37]
Paul Rolfe
File that was read back after insert
Attachment: MYSQL_FAULTY_FILE.BIN (application/octet-stream, text), 101 bytes.
[19 Jun 2008 5:38]
Paul Rolfe
File I inserted. This is what I originally inserted, and comapred to
Attachment: MYSQL_CORRECT_FILE.BIN (application/octet-stream, text), 101 bytes.
[19 Jun 2008 5:39]
Paul Rolfe
Binary Log from the server. Note : The relevant insert is only the last statement.
Attachment: MYSQL_BINLOG_BUG.SQL (text/plain), 6.45 KiB.
[23 Jun 2008 13:41]
Heikki Tuuri
Paul, please check with the mysql.exe interactive SQL client what it returns as column values. It would be surprising if simple BLOB inserts and SELECTs in InnoDB-5.0 would cause corruption inside InnoDB. More likely is that some setting in your client does not understand the character set. If you use MyISAM, do you see 'corrupted' BLOB values? Regards, Heikki
[23 Jun 2008 16:02]
Susanne Ebrecht
Many thanks for reporting a bug. I think your problem isn't a bug. It looks more like an encoding issue. http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html "If the column has a binary data type (BINARY, VARBINARY, BLOB), all the values that it contains must be encoded using a single character set (the character set you're converting the column to). If you use a binary column to store information in multiple character sets, MySQL has no way to know which values use which character set and cannot convert the data properly."
[23 Jun 2008 16:58]
Heikki Tuuri
I agree with Susanne that probably this is not a bug.
[24 Jun 2008 0:23]
Paul Rolfe
The data which I am inserting is pure binary, and no character set should be applicable. For Instance I was inserting an .EXE file or a .JPG. I do not see how a character set should apply to pure binary data. Additionally, I would like to add is that this is intermittent. I inserted around 80 files, and about 30 of them were affected. I deleted the rows, and reinserted them, and they were OK then. Something like a character set problem, I would expect to be consistant. I intially thought, and were leaning towards the problem being in mysql query browser until I saw the binary log. How would a colomn be configured to store files such as EXE files?
[24 Jun 2008 6:31]
Susanne Ebrecht
Paul, at the moment your default character set is latin1. For your blob column you want to have the character set binary. Also you should make sure that you insert the data as binary and will get them out as binary. You can try to make: CREATE TABLE t(b BLOB CHARACTER SET binary); Also you have to take care that you use the same encoding for inserting the data and getting them out (select). SHOW VARIABLES LIKE '%char&'; will show you the character set settings. Interesting for you are character_set_client, character_set_connection and character_set_results. You have to set these values to the client encoding what you are using. Means on Linux for example: You are using the CLI in a KDE Terminal. Then you first have to look which encoding your terminal is using. On newer Linux it is default utf8. Let's say all character sets in MySQL are latin1. If it is utf8 then you have to make first in CLI: SET NAMES UTF8; This will change the important character set variables to UTF8. If you don't use a terminal usually other software is taking the encoding from the locale of your system. You can ask your system here by using: $ locale Windows is a little bit more tricky here. To figure out, which encoding your Windows system is using you have to open a command window and type: CHCP When the result is for example: code page 850 then you have to do first on your database session: SET NAMES CP850; Let me try to explain the background: The variable character_set_client will tell the MySQL server which client encoding (means the encoding from your system/software that you are using) the user is using. So if you are using another client encoding then the value of the variable then you should change this variable. If you do this, the MySQL server knows: "ahhhh, my client is sending i.e. koi8u data". With this knowledge the server will look which character set the column in the table will expect. Let's say it's utf8. And now it will convert all input data from koi8u (Ukraine encoding) to utf8 and will store the data right in your database. When you select data of course the system also needs to know in which encoding you want to get out the data. There for it needs the variable: character_set_results. Here the MySQL server just looks: which character set is used for storing the data? which character set wants the user back? and then it will convert the data into the character set that you want back and will display the data by using that character set. More about the blob topic you can find here: http://dev.mysql.com/doc/refman/5.0/en/blob.html http://dev.mysql.com/doc/refman/5.0/en/string-functions.html
[24 Jun 2008 6:35]
Susanne Ebrecht
Paul, sorry, I made a typo (the '%' and the '&' key are next to each other on my keyboard): Here is the right command to get the character set variables: SHOW VARIABLES LIKE '%char%';
[24 Jun 2008 12:36]
Heikki Tuuri
Classifying this as a MySQL Query Browser issue and unassigning myself from the bug report.
[25 Jun 2008 1:07]
Paul Rolfe
Susanne, Thankyou for your replies. Unfortunately, I do not agree. Quoting the Mysql 5.0 manual "BLOB columns have no character set" from section 10.4.3. The BLOB and TEXT Types I can not apply a character set to this column. This does not seem to be the problem. If I were using the TEXT derivitive of columns I would expect such trouble. Additionally, quoting section 12.1.5 of the same manual, under the create table syntax, they list the options for each data type, and TEXT has character set, but not BLOB. The only mention of a character set is the default at the end of the table definition. ( ENGINE=InnoDB AUTO_INCREMENT=209 DEFAULT CHARSET=latin1;) I wish to have my varchar encoded in latin1, to be uniform with other tables in my dbase. Executing your sample command on mysql and 5.0.48 5.0.51 both produce errors. CREATE TABLE t(b BLOB CHARACTER SET binary); produces the following error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CHARACTER SET binary)' at line 1 This still smells like a bug somewhere. Using Mysql's off the shelf database, and Mysql's off the shelf grahical client, some part is breaking the storage of files. My comment also about it being inconsistant, may need to be considered, as it does not always fail, and the same data will fail one time, then be OK the next. I would expect consistant results all the time, given the same input parameters. I will try to make a fresh database on another server, start a new binary log, log the entire process from start to end, and demonstrate this problem. Based upon my Binary logs, I have discounted a query browser bug in my mind. Regards and thankyou for your help, Paul Rolfe
[25 Jun 2008 2:35]
Paul Rolfe
Susanne, Time for me to eat some of my words. I was partially mistaken. I made a script to try and replicate the issue I found. It seemed to work all the time without problems. Then I looked back at my binary log, earlier posted, and noted the character sets being changed. I created a new dbase and mucked around with various combinations of CLI scripts, making query broswer insert etc... What I found was that query browser when working correctly, adds this into the binary log. SET @@session.character_set_client=33, @@session.collation_connection=33, @@session.collation_server=8/*!*/; When Query browser decides was not working it was doing this. SET @@session.character_set_client=33, @@session.collation_connection=8, @@session.collation_server=8/*!*/; The function show character set does not give the "index" number, so I have no idea what 8 or 33 means. I could not make my query browser do this on my test server/client. However the server which produced the original post is very busy, doing up to 300 inserts/updates per second. Many of which are transactions. The database which showed to problem is not involved in transactions, also not used by clients doing transactions. MySQL server seems to "Preen" the character set when putting it in the binary log. ie if I set only one of the variables, the binlog shows all 3. With that in mind, I can not say whether Query browser is actually setting the collation_connection variable. Looking into the faulty file, seeing all of those question marks, tells me now that the server was trying to apply a character set. I think the char set was being applied to the SQL statement itself, not the field. A Blob does not have a character set as I found in the manual. I have attached my playing script for your reading pleasure. You can execute it directly, as there are no dependancy issues.
[25 Jun 2008 2:36]
Paul Rolfe
SQL script I was using to test char sets
Attachment: anttest.sql (text/plain), 1.24 KiB.