Bug #50552 | ndb_size.pl fails to calculate utf8 fields correctly | ||
---|---|---|---|
Submitted: | 22 Jan 2010 17:18 | Modified: | 16 Apr 2012 11:52 |
Reporter: | Patrick Mulvany (OCA) | Email Updates: | |
Status: | In review | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1-telco-7.0, 5.5.20-ndb-7.2.5 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | 7.0.9, Contribution |
[22 Jan 2010 17:18]
Patrick Mulvany
[26 Jan 2010 15:21]
Patrick Mulvany
Patch to add --utf8 switch and warn of rows too long for NDB engine
Attachment: ndb_size.pl.patch (application/octet-stream, text), 5.46 KiB.
[1 Feb 2010 10:22]
Sveta Smirnova
Thank you for the report. Verified as described.
[5 Feb 2010 15:42]
Lenz Grimmer
Hi Patrick! Thank you very much for your patch contribution. In order for us to accept your patch, we have to ask you for one small favour: could you please send us a signed and filled-out copy of the Sun Contributor Agreement (SCA) as outlined on this page? http://forge.mysql.com/wiki/Contributing_Code#Paperwork You will only have to do this once, and it's valid for all other Sun-governed Open Source projects as well. Please let me know if you have any questions or concerns about this. Thanks!
[10 Feb 2010 12:09]
Patrick Mulvany
SCA sent on the 8th; no idea how long these take to process ;) When I get time (isn't it always) I will do a more complete patch covering double-byte characters and per table/field character sets as well. Probably using something like:
select TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,CHARACTER_SET_NAME,COLLATION_NAME from information_schema.columns where table_schema not in ('mysql','information_schema') and CHARACTER_SET_NAME is not null;
[11 Feb 2010 11:41]
Patrick Mulvany
Updated patch to add --default-character-set=utf8|ucs2|utf16 option; now detects column charsets; removed defunct --utf8 option
Attachment: ndb_size.pl.patch (application/octet-stream, text), 6.68 KiB.
[11 Feb 2010 14:12]
Patrick Mulvany
Full fix: handles any character set by pulling the character length from information_schema, falling back to fixed values for utf8, ucs2 & utf16 when it is unavailable
Attachment: ndb_size.pl.patch (application/octet-stream, text), 6.97 KiB.
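For readers following the approach described in the comment above, here is a minimal sketch of the lookup-with-fallback idea. It is written in Python for illustration and is not taken from the attached patch. The names `FALLBACK_MAXLEN` and `bytes_per_char` are hypothetical, and the fallback values are assumptions based on the MAXLEN values MySQL reports in information_schema.CHARACTER_SETS for these character sets.

```python
# Fallback bytes-per-character values, used only when information_schema
# could not be queried. These are assumed values (MySQL's reported MAXLEN
# for utf8, ucs2 and utf16), not figures taken from the patch.
FALLBACK_MAXLEN = {"utf8": 3, "ucs2": 2, "utf16": 4}


def bytes_per_char(charset, infoschema_maxlen=None):
    """Return the maximum number of bytes one character may occupy.

    infoschema_maxlen is a dict mapping charset name -> MAXLEN as read from
    information_schema.CHARACTER_SETS, or None if that lookup was unavailable.
    """
    if infoschema_maxlen is not None and charset in infoschema_maxlen:
        return infoschema_maxlen[charset]
    # Character sets missing from both sources are treated as single-byte.
    return FALLBACK_MAXLEN.get(charset, 1)
```

Preferring the server-reported value means a character set added in a later server version is sized correctly without touching the script; the fixed table only matters when information_schema cannot be read.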
[24 Mar 2010 11:38]
Patrick Mulvany
SCA accepted ;) https://sca.dev.java.net/CA_signatories.htm#m Phew that took a little while to get sorted. Paddy
[25 Mar 2010 11:20]
Frazer Clement
Hi Patrick, Thanks for the bug report and patch. I'll have a look at ndb_size.pl and your patch today. Thanks, Frazer Clement
[26 Mar 2010 18:08]
Frazer Clement
Hi Patrick,
Sorry for the delay in responding, I was not that familiar with Perl or ndb_size.pl previously! I've reviewed your patch and it looks good. I've suggested a few changes in a new patch, which I will attach to this bug report. Please let me know what you think, any mistakes etc. Once we've got a version we are both happy with then I will commit it to mysql-5.1-telco-6.2 and upwards. Please get in touch with any questions / issues. My email contact is frazer@mysql.com.
Thanks,
Frazer
Changes suggested in new patch:
- Added other 'bugs' to the comments list
- Removed change of shortcut -h from help to hostname
- Renamed $infoschema to $using_infoschema
- Removed capitalisation of information_schema
- Added warnings when default character set mappings are being used (Note that no per-column charset info is available if the information_schema is unavailable)
- Moved charlen * size calculation out of type-specific switch
- Added $lenbytes variable
- Added separate 'Max bytes/row' sum, which is used for rowlength checks. (Per version, for potential future scenario of different limits / version)
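The "charlen * size" calculation, the length-bytes variable, and the "Max bytes/row" check from the list above can be sketched roughly as follows. This is a Python illustration, not the code in the patch. The function names are hypothetical, the rule of 1 length byte up to 255 data bytes and 2 beyond is an assumption about variable-length column storage, and the row limit is passed in rather than hard-coded because it can differ between versions.

```python
def column_max_bytes(char_length, bytes_per_char, variable_length):
    """Worst-case storage for one character column, in bytes."""
    data_bytes = char_length * bytes_per_char
    if not variable_length:
        return data_bytes
    # Assumed rule: 1 length byte for up to 255 data bytes, otherwise 2.
    lenbytes = 1 if data_bytes <= 255 else 2
    return data_bytes + lenbytes


def check_row(columns, row_limit):
    """Sum worst-case column sizes and report whether the row is too long.

    columns is a list of (char_length, bytes_per_char, variable_length).
    Returns (max_bytes_per_row, exceeds_limit).
    """
    total = sum(column_max_bytes(*col) for col in columns)
    return total, total > row_limit
```

For example, `check_row([(255, 3, True), (10, 1, False)], 8052)` gives a worst case of 777 bytes and no warning, where 8052 is only an illustrative limit. Keeping the multiplication outside any per-type branching means every character type shares the same sizing rule.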
[26 Mar 2010 18:09]
Frazer Clement
Modified version of latest patch
Attachment: bug#50522.patch (text/x-patch), 9.23 KiB.
[6 Mar 2012 17:41]
Chris Miller
When will this patch be committed? It's been two years, just sayin... I'm fortunate enough to have found this searching the interwebs, and utf8 is more the rule than the exception these days.
[16 Apr 2012 11:52]
Patrick Mulvany
Currently working on an updated patch that fixes a few more issues and brings this more up to date, as it has been a bit stagnant. The currently proposed patch does not handle set and per-column character sets correctly.