MySQL Bugs: #4541: "Specified key was too long; max key length is 1000 bytes" with utf8

Bug #4541	"Specified key was too long; max key length is 1000 bytes" with utf8
Submitted:	14 Jul 2004 1:13	Modified:	3 Oct 2008 20:46
Reporter:	Björn Dieding	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: General	Severity:	S4 (Feature request)
Version:	4.1.2	OS:	Any (Any)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Message:
#1071 - Specified key was too long; max key length is 1000 bytes

I think this error should be handled different. eZ publish and Mambo server seems to have troubles with this error message, because users may switch from e.g. latin1 to utf8. I bet some other known application will run through this issue too.

And when they do they can't use the recommend db layout anymore, because mysql doesn't allow to store more then 1000 bytes as a key. 

latin1 = 1 byte = 1 chararcter
uft8  = 3 byte = 1 chararcter

How to repeat:
http://ez.no/community/bug_reports/mysql_error_1071_with_utf8_db_on_kernel_injection

Suggested fix:
Please make the calculation not based on bytes. Please make it be based on the current charset used. So that we can actually store make then 333 chars when using utf8.

No, what you really ask for is to allow UNIQUE keys longer than 1000 bytes - and this is in our TODO.

"making the calculation not based on bytes" will only result on error message
"Specified key was too long; max key length is 333 characters" which won't help eZ.

I'm changing the status to "deferred" as there's nothing more we can done, until this TODO item gets implemented

This is still a problem in 4.1.9, 6 months later.

Also MythTV has problems with that. I think many more applications will follow.
A solution must be found quickly! For example increasing the limit to 3000bytes for UTF-8. Otherwise you only can use somehow short key names IMHO, which remembers me on times where variables got very short names and nobody except the programmer can understand what he does... :-(.

SQL-Query was:
CREATE TABLE IF NOT EXIST jumppoints ( destination varchar(128) NOT NULL default ' ', description varchar(255) default NULL, keylist varchar(32) default NULL, hostname varchar(255) NOT NULL defualt ' ', PRIMARY KEY (destination, hostname));

Yes we also have the same error with the Jahia application (Content Management System)
- with Mysql NT 4.1.9 nt
- caracter set UTF 8

while executing :

CREATE TABLE jahia_grp_prop (
id_jahia_grp mediumint(9) NOT NULL,
name_jahia_grp_prop varchar(255) BINARY NOT NULL,
value_jahia_grp_prop varchar(255),
provider_jahia_grp_prop varchar(50) NOT NULL,
grpkey_jahia_grp_prop varchar(50) NOT NULL,
PRIMARY KEY (id_jahia_grp,  name_jahia_grp_prop,
provider_jahia_grp_prop,  grpkey_jahia_grp_prop)
);

Francis

Is there any short term fix suggestions ? I'm trying to install the cvs version of mambo..... thanks?

Not that I know of. I even doubt that it is an efficient database design, which uses giant key sizes. Anyway, did you try to use a different storage engine, e.g. InnoDB?

I have encountered this problem as well when I wanted to support utf-8 data. At first the error message wasn't clear, but a little digging made me realize that it was stating "bytes" instead of characters. 

Turns out that I really didn't require the primary key columns to be utf-8, so I used "CHARACTER SET latin1" on those those columns that would be part of primary key and no longer worry about it. However, I needed to submit a memo to our development team to make them aware of this issue if they have multiple column primary keys that are in a table with a default character set of 'utf8'.

My take (obviously an en-US point of view) is that I would not sacrifice performance to support utf-8 keys exceeding 333 bytes. I usually try to keep key size down so that indexes perform optimally, so for me, primary key values can be latin1 and less than 1000 bytes so I'm not too concerned about this. I do think that the error message could be a bit more informative to make the relation between character set and byte-count more clear.

Since this bug was submitted on July 14 of 2004 and it has been 10 months that it has not been resolved, is there any way we can assist the person responsible to maybe try and fix it? 

We are trying to install Mambo 4.5.2.1 or 5.0 with utf-8 (we need to use 3 languages simultaneously, En,De,Gr), but it is impossible, without messing up the keys on the tables, and of course without knowing the further consequences this will have when adding modules or components. Also in my humble opinion, the severity of this bug should not have been marked as a 'Feature request', but as serious.

Also compare http://bugzilla.wikipedia.org/show_bug.cgi?id=1322

The problem with UTF-8 is that every character reserves 3 bytes for the worst case. The key cl_sortkey(cl_to,cl_sortkey(128)) uses 255+128=383 characters == 1149 bytes.

In this case one could define KEY cl_sortkey(cl_to(213),cl_sortkey(128)) or KEY cl_sortkey(cl_to,cl_sortkey(86)) or something.

Still a problem in 4.1.14

Well for eZ publish we have fixed this issue now by lowering the amount of chars stored in the key....

Still this is a limit that other Databases(mssql, oracle) do not have

Still a problem in 5.0.16

mysql> CREATE TABLE IF NOT EXISTS jumppoints (  destination varchar(128) NOT NULL default '',  description varchar(255) default NULL,  keylist varchar(32) default NULL,  hostname varchar(255) NOT NULL default '',  PRIMARY KEY  (destination,hostname));
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

I use Mysql server version: 5.0.15-nt-log, still has this problem.

This is still a problem in 5.0.21 macosx x86.

unrelated related info: SCORM specification has Minimum Container Permisable (MCP) of 255  for many ID strings, some are larger, I won't add boring details of implementation realities with those constraints. This bug does not help, nor are assumptions that such keys are 'bad' .... 

I vote for a fix.

This bug occurs on 4.1.13 with InnoDB tables.  Causes MediaWiki v.1.6 to fail.

Will someone please tell us if MySql are going to fix this bug?

I agree that it reflects fairly dim design in a number of web applications, but it's a real showstopper for easy deployment of them on several systems.

If you do plan to change this please tell us. Particularly please tell us if you don't plan to and we can deploy Postgres instead...

We surely want to fix this. But it still has to wait. We are currently fixing a lot of bugs.

We think that a limited key length is not a bug, but a limitation. So we take this bug report as a feature request. Hence it has lower priority than real bugs.

At least your complaints make this limitation raise in the list of features to implement.

You might be able to work aroud this limitation if you convert the character set of the table to latin1. I know, this is not always possible. :(

(My specific error is "765 bytes", using Win32 MySQL 5.0.22-community-max-nt)

If I understand this limitation (``bug'') correctly, the problem is that storing the indexing data is limit to ``x'' (255 in my case) chars. In UTF-8 a ``char'' can be up to 3 bytes long, and the limit is to avoid having too much wasted space/indexing data too big to be efficient, is this correct?

I definitely vote to bump this higher on the priority list.

For now, I'm breaking the data into chunks of 255 chars each, and storing it in multiple columns. Obviously, this is not very efficient, and it significantly complicates locating a specific record (although I've added an MD5 hash of the whole string for fast location *IF* I know the exact value I'm looking for..).

Has anyone figured out a better temporary workaround in the meantime?

PS: If this helps anyone... I haven't tried it myself, yet, but it has also been suggested that I store the data in an encoded form similar to that used on the web for URL encoding, modified to handle UTF-8: Chars with ordinals 0-127 are plain ASCII/Latin1 (except the escape char, ``%'' in URL encoding, and other ``special'' chars must be escaped), and any higher ordinals are UTF-8 encoded using two or three hex code sequences. IE: the code for a dagger symbol (``†'') is (in Unicode hex) 0x002020, which would be encoded as plain ASCII text string of ``%E280A0''. Then you can store the data in MySQL using encoded plain ASCII/Latin1 charsets. (Look up ``UTF-8 encoding'' on the web if you don't understand how to do UTF-8 encoding. Perl and PHP (IIRC) also have functions for handling strings encoded in this, or at least similar, ways.)

Hi.
You write "IE: the code for a dagger symbol (``†'') is (in Unicode hex) 0x002020, which would be encoded as plain ASCII text string of ``%E280A0''."

If you put this in a CHAR(255) CHARACTER SET LATIN1 column, you can get a maximum of 255/7=36 dagger symbols into the column. This seems less efficiently encoded than UTF-8.

UTF-8 stores ASCII characters as single bytes too. It use two-byte codes for many other characters and three-byte codes for the rest. The dagger symbol would be encoded as three bytes and you could store 255/3=85 of them in a CHAR(255) CHARACTER SET LATIN1 column. But you can store 255 of them in a CHAR(255) CHARACTER SET UTF-8 column.

The problem that we have is that we must *guarantee* that the selected number of characters of the selected character set can be stored in a column. That is, a CHAR(255) CHARACTER SET UTF-8 column *must* be able to store 255 characters where every single one can be three byte encoded. This means that we must provide 255*3=765 bytes of storage.

If one uses VARCHAR, this is less of a problem for the data file. The records can expand up to 64K bytes in length. But the keys in the index must be expanded to the full size. Otherwise comparisons would not work efficiently.

Regards

Hi Ingo,

You wrote: ``If you put this in a CHAR(255) CHARACTER SET LATIN1 column, you can get a maximum of 255/7=36 dagger symbols into the column. This seems less efficiently encoded than UTF-8.''

That was my error -- The guy I was talking to was explaining how it works for URL encoding. For data in a database, it should work by encoding the data in binary rather than text. So ``%E280A0'' would actually work out to be ``â€ '' (ASCII codes 0xE2, 0x80 and 0xA0, respectively) -- Using three chars (max) instead of seven. This results in up to 333 dagger symbols maximum -- the same max enforced by MySQL. Nothing is lost or gained. However, how many people will be putting 333 daggers into a database? (Well, some people might want to put a *real* dagger into it in frustration! ;-)) More likely, the majority of text will be plain ASCII/Latin1, which requires only one byte per char, packing up to 1000 chars into the index -- Although it cannot be *guaranteed*.

Worded a bit differently.. By doing my own UTF-8 encoding, I can reliably index strings up to 333 chars, and *unreliably* push the limit in many cases up to an absolute max of 1000. This is something MySQL cannot do, since it has to go with the guaranteed/reliable value of 333 (the lowest common denominator). My app/client/sql code, on the other hand, isn't bound by such constraints; In the data I was using when I encountered this error/bug/problem, around 15% (out of 34,000+ records) of the data exceeded the 333 char limit. Of that 15%, seven entries were not unique within the first 333 chars, but were unique at some point after 333 chars, making it impossible to use a UNIQUE index on the data.

In contrast, by ``packing'' the UTF-8 data in my app/client and using Latin1 in the database, all the data fit within the 1000 byte index limit, with an average size around 300-400 bytes, and allowed me to make a UNIQUE index on the column.

This isn't an ideal solution, it's just a temporary fix to get more out of how MySQL is implimented. Nor will it work for everyone: An app/client that uses a lot of Unicode may have no choice but to stay within the 333 char limit. Likewise, it's not very useful to pack the UTF-8 data in a column that's not being indexed, since the limit is only present in the index, not the data storage.

But every byte that can be saved when possible, is a byte that can be used elsewhere..

I think one solution may be HASH indexes.  There are many situations where BTREE indexes are far too expensive for large values.

Note: For those of you having troubles with Mambo server, note that Mambo will create not only inefficient indexes but will duplicate them too, causing double overhead.  You should report this as a bug to Mambo, not MySQL... I have done so but apparently they don't fix the problem with their poor schema design.

I have run up against this limitation trying to port an application that currently runs against - Oracle, SQL Server, PointBase, DB2 or Sybase to MySQL 5.0.  I would like to understand MySQL's current position on this Feature Request.  Since I am trying to port an existing application that already works across several other databases to MySQL having to change the application/database design because of a MySQL limitation is not very feasible. 

For a database defined with the default CHARACTER SET of latin 1:
12:06:29  [ALTER - 0 row(s), 0.000 secs]  [Error Code: 1071, SQL State: 42000]  Specified key was too long; max key length is 767 bytes

For a database defined with the default CHARACTER SET of UTF8:
drop table t1
;
create table t1 (c1 varchar( 256)  not null)
engine=InnoDB;

ALTER TABLE t1
       ADD CONSTRAINT PK1 PRIMARY KEY (c1)
;
12:16:55  [ALTER - 0 row(s), 0.000 secs]  [Error Code: 1071, SQL State: 42000]  Specified key was too long; max key length is 765 bytes

I was trying to store unique URLs, which have a theoretical maximum length of 2048 characters in a MySQL field. Since this didn't work, I now have to restrict URL length to 255 characters, which is an older standard.

As to Mambo users, most open source CMS are full of hacks and try to be usable on as many DB-s, as possible. Poor by design, unfortunately.

Hola,

als trying to install Mediawiki for a multi-language platform. The problem really sucks.

Change MI_MAX_KEY_LENGTH in myisam.h  to value  required can solve 
"Error: 1071 Specified key was too long; max key length is 1000 bytes".
Solved mine try it and let me know any complications involved or new problems evolved.

> In UTF-8 a ``char'' can be up to 3 bytes long

Wrong. In utf-8 character can take up to six bytes.

I've come across this problem when trying to IMPORT data backed-up on an existing MySQL database.

I'm trying to shift data from one MySQL DB by exporting it to a .sql file, then importing it on another PC into it's new home.

Only to be stopped by this damn 'error'.

You'd think that if this was a 'limitation' then you wouldn't have it EXPORTING data in a format you can't IMPORT.

Any ideas?

I understand why long string keys would be considered bad database designs, but there are some cases where I don't really find a better solution.

For instance, Amarok uses MySQL (and mysqlite) to build its database, and one of the thing it holds there are metadata about the music files. Among other things, it makes sense to have a UNIQUE constraint on the column with the file path. It also needs sorted indexes for metadata, and that can often be very long (ie, most song and album names are short, but I have examples in the database of titles as large as 133 characters, and imagine what the file URL could look like when combining several such strings in a deep path.)

In practice I've just tried importing a database (UTF8, varchar(1000)) and for some reason the truncating came around the 64 character mark. (That might be something stupid on my part, please enlighten me if that's the case.)

I don't really see a better solution for this case. The application can drop the database constraint and do the "unique" checking by itself (sorting too), which is as bad design as I know, or it can keep a pair of truncated/full columns, which is quite nasty, or it can split a column in several and use a complex index, which is even nastier.

Is there something I'm missing?

On the other hand, does the following suggestion make sense?

1000 utf8 characters take up at worst 3kB, but some strings (most, for some applications) will fit in about 1kB.

So, for a "VARCHAR(1000) CHARSET UTF8 UNIQUE" some of the strings won't fit in the index, but many will. Would it be possible to refuse to INSERT such strings, rather than the creation of the constraint?

This way, strings with 400 Hangul characters that don't differ between the first 300 will be refused as not unique, but Latin strings will be accepted. This seems to make more sense than just refusing everything, unless I'm missing something critical.

Again. I'd like to work on a solution for this, but higher priority tasks prevent it.

If I were you, I probably would try to have the "path" column un-indexed and add a new UNIQUE column, which I would fill with MD5 checksums from the "path" column. I have heared from cases where MD5 checksums showed collisions with path names, but never saw a proof. But even if one is sceptic one can find ways to get around this problem.

Refusing a string of 999 characters into a VARCHAR(1000) is a critical bug IMHO. For this we should have something to tell at table creation that one accepts shorter limits in some cases (e.g. VARCHAR(<1000)), but this syntax doesn't exist yet (AFAIK. And I don't believe it will come).

What you want is similar to putting UTF8 path names into a LATIN1 column. It would work as long as you just store and retrieve the whole column or compare the whole column for (non-)equality. Most paths would fit as you suggested, but paths with too many multi-byte characters would be rejected. The whole thing would fail though, if you try to compare less or greater, ask for length in characters (not bytes), retrieve parts of the paths, or similar operations, which need to know exactly where a characters starts and stops or what its value is (for comparisons other than (non-)equality).

Regards
Ingo

Hi,

on current release on Debian Testing, this issue still happen. Regarding performance issue of long indexes, why not start by a configurable key length (if possible), with warning database designer about efficiency in such case, so if we accept lower performance for a "lazy" design of table(s)/index(es), it makes everyone happy.

Regards.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40433

ChangeSet@1.2587, 2007-12-27 20:42:32+01:00, istruewing@stella.local +9 -0
  Bug#4541 - "Specified key was too long;
              max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1335 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 5.0.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40434

ChangeSet@1.2650, 2007-12-27 20:43:47+01:00, istruewing@stella.local +10 -0
  Bug#4541 - "Specified key was too long;
              max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1335 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 5.1.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40435

ChangeSet@1.2777, 2007-12-27 20:44:46+01:00, istruewing@stella.local +13 -0
  Bug#4541 - "Specified key was too long;
              max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1001 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 6.0.
  
  Note that since 6.0 UTF-8 characters can take up to 4 bytes.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40436

ChangeSet@1.2703, 2007-12-27 20:49:27+01:00, istruewing@stella.local +8 -0
  Bug#4541 - "Specified key was too long;
              max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1335 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 4.1.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40438

ChangeSet@1.2703, 2007-12-27 20:30:26+01:00, istruewing@stella.local +8 -0
  Bug#4541 - "Specified key was too long;
              max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1335 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 4.1.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

Thanks Ingo for the patch, but using the patch for MySQL 5.0 with 5.0.51a results in warnings about MI_MAX_POSSIBLE_KEY defined twice in include/myisam.h, lines 44 and 72.

You are right. My fault. I didn't watch the build log carefully enough. The problem might have propagated to the higher versions too. I'll check and supply new patches.

For now you can fix it by removing the later definition (here line 72).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41449

ChangeSet@1.2569, 2008-01-30 18:26:58+01:00, istruewing@stella.local +9 -0
  Bug#4541 - "Specified key was too long;
            max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1335 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 5.0.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41450

ChangeSet@1.2515, 2008-01-30 18:29:21+01:00, istruewing@stella.local +10 -0
  Bug#4541 - "Specified key was too long;
            max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1335 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 5.1.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41451

ChangeSet@1.2525, 2008-01-30 18:31:51+01:00, istruewing@stella.local +13 -0
  Bug#4541 - "Specified key was too long;
            max key length is 1000 bytes" with utf8
  
  Not to be pushed. This patch is meant as a courtesy to the
  community. It raises the maximum key length to 4005 bytes
  or 1001 UTF-8 characters. This is the maximum raise possible
  without changing other parameters. Patch for MySQL 6.0.
  
  Note that since 6.0 UTF-8 characters can take up to 4 bytes.
  
  This patch will not go into the source repositories because
  it results in bigger buffers and as such increases memory
  footprint and decreases performance. I do not know if the
  performance hit is measurable, but it exists in theory. This
  is enough to keep it out of the standard source.
  
  The planned solution for the problem is to add hash indexes
  to MyISAM. This would allow for arbitrary long keys. For a
  couple of reasons, I cannot expand on at the moment, I guess
  that this will never be implemented for MyISAM. This is my
  personal estimate. Do not take it as official statement.
  
  Anyway, this patch does now allow projects or users, who build
  from source, to work around the problem reported in bug #4541.

Does this fetch affect InnoDB limitation of 767 characters as well? If not, can one be introduced (for the same reasons)?

Sorry, I meant to say "patch".

Sorry, I am not going to modify InnoDB, let alone in 4.1.

However in 5.0 we raised the overall MAX_KEY_LENGTH to 3k. Since it was not for MyISAM, it might be for InnoDB. So your wishes migth be satisfied in >= 5.0 already.

I wonder if the bug has been fixed. If it was fixed, which mysql version is it? 

We are using 
Server version: 5.0.52-enterprise-log MySQL Enterprise Server (Commercial)

Here is my test case test.sql:

DROP DATABASE IF EXISTS logdb1_0;

CREATE DATABASE logdb1_0;

GRANT ALL on logdb1_0.* to logdb1_0 identified by "passwd";

USE logdb1_0;

CREATE TABLE access_daily (
  row_id bigint(20) unsigned NOT NULL auto_increment,
  time datetime default NULL,
  host varchar(255) default NULL,
  instance varchar(32) default NULL,
  servicename varchar(32) default NULL,
  duration float(12,6) unsigned default 0,
  rule varchar(32) default NULL,
  msg_count int(10) unsigned default NULL,
  route varchar(64) default '',
  protocol varchar(32) default '',
  PRIMARY KEY  (row_id),
  UNIQUE KEY c_key1 (time,route,protocol,host,instance,servicename,rule)
) TYPE=InnoDB DEFAULT CHARSET=utf8,MAX_ROWS=1000000000,AVG_ROW_LENGTH =150;

When I changed my tables' charset to utf8, my compound index generated errors:

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 9651

Server version: 5.0.52-enterprise-log MySQL Enterprise Server (Commercial)

 

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

 

mysql> \. test.sql

Query OK, 1 row affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

Database changed
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

For MYISAM:
Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 9515

Server version: 5.0.52-enterprise-log MySQL Enterprise Server (Commercial)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> \. test.sql
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Database changed
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

This is really bothering us. I have to reduce some columns (in the compound index) which is not practical or I have to restructure the tables to normalize some columns to other tables.

Thanks

Michael Meng

The bug is not fixed. I published a couple of patches that can be used by those who build from source. These patches will not make it into the mainstream version because they increase the memory footprint of MyISAM even when short keys are used only. The plan for fixing it finally is to implement HASH indexes one day.

Hello !

I really do not understand how this one can take 4 years. I tried to setup sqlgrey while meeting this - yes, bug ! You can not restrict keys by size bitwise as keys are supposed to be a set of CHARACTERS. Why not even consider letting the user decide what is the best solution for him/her ?

We are in 2008 and i really not want to be forced to use ASC II just to get primary keys of existing apps to work.

You've got my sorry for this english.

Bye and thanks for your attention.

I think this is solved in 
5.0.51b-community-nt MySQL Community Edition (GPL)

My Tables are InnoDB as well as MyISIAM. No problems on creating manually as well as importing dumps.

5.0.51b does not appear to have the fix; it produced the same error.

When is this going to become more than just a patch? Will it be incorporated into the existing 5.0 or 5.1 codebase anytime soon? 

The two choices that we saw were:
- resize our field, truncating down to 255  OR
- change the unique index to just a regular index, and add logic to the app code to enforce uniqueness. 

We opted for the 2nd option.

No. The patch will not be incorporated into any MySQL release.

In 5.1 you could have UTF-8 index length of 333 (3 bytes per character).
In 6.0 you could have UTF-8 index length of 250 (4 bytes per character).

Bug#12756 was marked a duplicate of this bug.

Bug #17042 Index limits too small to be useful in multibyte environments was marked a duplicate of this bug.

Bug#41125 was marked as a duplicate of this bug.

----
 ALTER TABLE `sites` ADD UNIQUE (`Url`); 
------
mysql  Ver 14.12 Distrib 5.0.45

Please let us know if unique key length was posted yet in any later version?

Yep, this is a must! This is a really annoying bug!
You've got my vote :)

OMG, 5 years bug?

Here is:
* 5.1.25-rc-log (CoolStack package)
* Solaris 10 x86

ENGINE=InnoDB CHARSET=utf8

I have to set a key prefix with the same length of the field, which doesn't make sense.

Yesterday I made a dump of my 100 Go database (which took all day). Then I ran the import of the dump on some other computer before the night, and this morning I can see "ERROR 1071 (42000) at line 47947: Specified key was too long; max key length is 1000 bytes".

Feature request ? Oh yeah I would like to have this feature:
 $ mysqldump > dump.sql
 $ mysql < dump.sql
That would be a great feature ! I can wait 5 years of course because it's not really some important feature. I can't believe that the severity of this bug is S4 only.

This bug is not a feature request, it's just a severe bug.

I have no idea of what I can do to restore my database. I deleted the original database, all I have is the dump file.

Which MySQL version created the dump file?
Which MySQL version are you loading the data into?

Thank you, Alexander Barkov, for replying.

The version is 5.0.24a-Debian_3-log

I get the error even if I try to restore the database on the same server that dumped it.

Can you please paste "SHOW CREATE dbname" output here,
where dbname is the name of the database you load into?

Also, do CREATE TABLE statements in the dump file have 
the CHARSET clause? If yes, which character set is given
in these statements?

mysql> SHOW CREATE DATABASE RM;
+----------+-------------------------------------------------------------+
| Database | Create Database                                             |
+----------+-------------------------------------------------------------+
| RM       | CREATE DATABASE `RM` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-------------------------------------------------------------+
1 row in set (0.00 sec)

I found the statement that gives the error:

DROP TABLE IF EXISTS `s_moteur`;
CREATE TABLE `s_moteur` (
  `moteur_id` smallint(6) unsigned NOT NULL auto_increment,
  `nom` varchar(30) collate utf8_unicode_ci NOT NULL default '',
  `nom_id` varchar(30) collate utf8_unicode_ci NOT NULL default '?',
  `description` varchar(255) collate utf8_unicode_ci NOT NULL default '',
  `host` varchar(40) collate utf8_unicode_ci NOT NULL default '',
  `host_header` varchar(40) collate utf8_unicode_ci NOT NULL default '',
  `options` varchar(255) collate utf8_unicode_ci default NULL,
  `cache_id` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`moteur_id`),
  UNIQUE KEY `nom_id` (`nom_id`),
  UNIQUE KEY `host` (`host`,`host_header`,`options`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='';

and I found out that replacing the definition of the "options" field by

  `options` varchar(200) collate utf8_unicode_ci default NULL,

solves the problem.

Anyway, thank you for your help.

As I understand, this is actually not a bug, but a known limitation with MyISAM tables. To avoid it, you can use InnoDB instead.

Ronny, from what I remember about this bug, you understand incorrectly. Both InnoDB and MyISAM have this problem.

Has this bug actually been fixed?  I am having the same problem creating a table after making some modifications with regards to keys:

7889 [main] ERROR org.hibernate.tool.hbm2ddl.SchemaExport - Unsuccessful:
create table.....

7889 [main] ERROR org.hibernate.tool.hbm2ddl.SchemaExport - Specified key was
too long; max key length is 1000 bytes

Is there anyway around this?  I'm considering changes my database to a characterset of latin1, but if i'm not mistaken, i lose the ability to store certain characters, e.g. the EURO symbol (among others)

------------------------------------
TO ANYONE WHO RUNS INTO THIS PROBLEM
------------------------------------

Before trying to work around this problem, consider what this error is telling you: you are trying to set up a key (i.e., a tree searching index) that needs more than a certain number of bytes. If you are getting this error for something that you've marked as a primary key and your keylength is over 20 bytes (latin1) or 60 bytes (utf8), then stop right and go back to your design - this is not a suitable primary key.

If you are running into this problem for a non-primary key or an indexed column, the problem comes down to the same thing: you're trying to use an enormously long string to do tree index lookup, which means you've probably not understood how keys are used in database management systems.

----------------------------------
HOW TO PROPERLY SOLVE THIS PROBLEM
----------------------------------

- if this is for a primary key, redesign your table. It really is this simple: if you're trying to do lookups based on very long keys, your table design is bad. Consider this an attack on your database designing skills if you must, but search keys should be short.

- if this is for a non-primary key, redesign your table. Either this is not a key, and you should not term it as such, or it is, and it should be a hell of a lot shorter.

- if this is for an index (INDEX or FULL TEXT), consider what an "index" means: you are saying that your entire column should act as a search key. While you may want to take advantage of this functionality because it lets you do natural text searches in your databases, this is *not* what MySQL is for. If you need full text indexing for vast data (say your columns represent full article texts) then you should not be using MySQL for this in the first place, but text indexers like Apache's "Lucene" or the like.

-------------------------------
WHAT MAKES THIS WARNING SO BAD?
-------------------------------

Tree indices let MySQL speed up the process of finding data in your table. Keys and index/full text indices speed up the search process by building a separate data structure, with the keys ordered in a way that allows quick searching.

These structures are sorted structures, which means that whatever you called a key or an index is stored sorted (typically) alphanumerically, regardless of whether that's how you've put it in your own table. Being guaranteed sorted helps mysql to very quickly find these keys. However, this quickly finding keys is only possible if the keys are short.

Think of it this way - when you look for a word "cat" in an ordered list of three letter words, you're going to first find the place where words start with "c", then after that, words with second letter "a", and then words with third letter "t".

This is fast. MySQL will be able to find you the right row(s) of data in your original table lightning fast... unless you told it to build an index or to key on something huge.

If you're using a key or index of type varchar(255), for instance, you're telling MySQL that it may have to do 255 byte comparisons before it can even get to the list of matching rows in your original table. This completely destroys mysql's -or any database management system for that matter- ability to quickly get you the data you need. In fact, it will probably be slower than doing a full table scan of your original data, because when you're telling mysql that it needs to do this:

select * from mytable where mykeycolum like 'my huge selection criterium';

where "mykeycolumn" is keyed/indexed, what you're actually telling MySQL to do is this:

rowids = collapse (select rowids from mytable_mykeycolumn_index where mykeycolumn like 'my huge selection criterium');
select * form mytable where mykeycolumn in rowids;

-----------------------------------------------
IF YOU THINK YOU NEED MORE BYTES IN KEYS ANYWAY
-----------------------------------------------

It is possible that you're just using MySQL for something simple, and either don't have the time or the skill to work out a good table layout to get MySQL to swiftly do what you would like. This is entirely possible, and it is understandable that you might run into this problem, but consider that this error is not thrown just because the size limit was arbitrarily chosen by the developers, and you "just need a few bytes more".

This error is an indicator that you're trying to make MySQL do something that you really shouldn't want to make it do. Now is a great time to sit down for half a day to a day and learn a bit about how to properly design tables, and what you should and shouldn't expect a database management system to do.

-------------
IN CONCLUSION
-------------

Q - Why am I getting this warning?
A - Because you're using a construction intended to speed up the query process in a way that will likely slow things down considerably, instead.

Q - How do solve this?
A - Optimise your tables so that you key on short keys, and do full text searching using full text indexers (do not rely on a dababase management system to do full text searching for you. In order to be efficent, dbms don't know what "languages" are, they just know byte sequences and if you're lucky, collation rules).

Q - Will this bug be fixed?
A - Pray it doesn't; if anything, it should be tightened to not allow keys over 128 bytes for latin1, or 384 for utf8.

The reason for this is that the "specified key was too long" warning is the best indicator that the database designer does not understand the purpose of table keys and indices, and needs to smarten up, because he or she's trying to use speed-up techniques in a way that'll slow mysql down, instead.

@Jason Urrich

> "... if anything, it should be tightened to not allow keys over 128 bytes for latin1, or 384 for utf8."

It's true that long keys are inefficient computer-resource-wise compared to short keys.

Consider:
CPU is fast.
RAM is big.
Hardware is cheap.

Having long keys can be more efficient, in a wider sense, than forcing programmers to use a short key application design, when programmer development or porting time is considered.

For many purposes, supporting larger keys might add a fraction of a CPU second to key lookups, but save (cumulatively) hours or days of application development or porting effort.

An example is a URL columns.
URLs out in the wild can be very long - 4KB+.
Defining a column "url varchar(20000) unique" is clear and maintainable.
Not the most efficient design for use of server resources - absolutely.

Adding a url fingerprint column (e.g. an MD5 hash) to lookup URLs and enforce uniqueness is definitely a superior design for performance, but it adds time to implement and complexity to maintain.

Most likely, the extra key lookup time for this big key would even be noticed. Especially in batch applications.

So, I'd argue it would be useful and "efficient" (in a wider sense) for MySQL to offer large keys. I guess that's why other DB vendors do offer larger keys.

Another possibility (noted earlier by another commenter) is to offer hash indexes to implement unique indexes for on-disk tables. This would be sufficient for some cases and would be both computer-efficient and programmer-efficient.

Just want to add in response to the last couple of comments that speed is not the only reason indexes are used. In the case of unique indexes, we want to guarantee that no two rows will have the same value for the indexed column(s). The unique constraint is typically more important than performance.

Although checking for uniqueness can be done on the application side, you would have to rely on EVERY application performing this check, and doing so without any bugs or race conditions in each implementation. I would not have as much confidence in such a system as one in which the constraint check is done centrally by the database.

I like the idea of using an MD5 hash over all the indexed columns for the internal implementation of a long unique index, with a few caveats:

The hash itself must be non-unique, because an MD5 collision can theoretically be caused by different key values. If an index lookup on insert finds a matching MD5 hash, the database should then do a row lookup to check whether the full values of the column(s) match. This should be a relatively inexpensive and infrequent operation.

The key value(s) must be case-folded prior to generating the hash if the column collation is case-insensitive, so that the database will be able to find identical values that differ only in case.

Range lookups are not possible on an index of hashed values, so the database should create a partial key index in parallel with the hash key index for that purpose.

For non-unique indexes, using an MD5 hash for the index is not necessary. Such indexes can be restrained to using a partial key. I would add that any length restriction on non-unique indexes can be handled transparently -- i.e. don't report an error if a user creates a index on a VARCHAR(4000) column, just implicitly index the first 1000 (or 128, or whatever) characters. The application won't know the difference.

I suspect this bug is going to receive a lot more traffic as people upgrade their servers to MySQL 6.x (or 5.5?) where the maximum length of an indexed UTF-8 column is suddenly reduced from 333 to 250.

Well, I'd like to use an application white a database to import.
The database can be loaded because of this limitation.
I am a user and not a developper.

What solution could you suggest me to make my mysql database accept the schema creation requests I have to load? :-/

I believe good practice is being confused here with a MySQL bug.

Good practice: keep your keys under 1k bytes. All fine by me.

BUT.

There *is* a bug in MySQL. VARCHAR(255) means a variable fields of up to 255 CHARACTERS length, not 255 bytes. A char may be multibyte and a lot of programs break when converting from latin1 to utf8.

This is a serious bug in a world of internationalization not a "feature request". All the pragmatic good practice help yourself preaching above can go somewhere else, no offense.

It worked when changed setting on my.ini file to default to default-storage-engine=INNODB

See bug #58187

Seven years later, it's pretty clear MySQL AB or Oracle or whoever is simply not going to reasonably address this issue.  I've come into the fray with thousands of tables across dozens of servers, with an already big cleanup due in converting stuff from MySQL to MongoDB (which enforces UTF8 input).  The state of things is pretty sad and doesn't look good going forward, but, seriously, what did we all expect?

One bandaid is that the contributed length of string keys can be limited in the index definition.  That's what the "size" field is for on the index field names - you know, the field you always left blank (and that always defaulted to longest string value of the indexed field, as long as it was under 1k chars).

Long keys are no doubt awful for performance and increase size of everything.

However, the reality is many, many people index on, say, ARTICLE TITLE, or SUBJECT, and these indices, when converted from latin1_swedish_ci to utf8, all break, leading a lot of hurried people here to this bug / issue report.  And what we all find here is an arrogant answer, an arrogant stance, and really no real solution except cutting back what worked with lots more manual labor.  ...kinda sorta expected, so many years ago, that there would eventually be some big problem as soon as I read (and was totally confused by) "latin1_swedish_ci"...funny how in so many things there is only time to learn when we absolutely must.

I tend to agree about the arrogance of programmers who work on database engines, considering themselves DBAs. In fact the posting almost a year ago bout this topic by someone who clearly has never designed real world databases would never argue that performance must triumph over data integrity or rational design. It is not rational for multiple clients to arbitrate over the uniqueness of a key. Keys are never about performance, unless all you do is program a database engine. I don't care how many failsafes are in the code. Code is NOT the answer to ensuring unique keys.

Keys are about data integrity. I realize that MySql is not other databases, but the list of professional databases which do not have this bug (yes bug) is long. Why? Because these companies realized that professional DBAs live in the real world.

Since Oracle has purchases this product, I don't anticipate much movement here BUT you will NEVER see a limitation like this from the flagship OracleDB and a limitation of 1000 bytes is arbitrary and ridiculous.

It is the job of the database management system to hash, sort, blend and group data for fast retrieval, but that is what a b-tree, hash or other algorithms are designed to speed up. Table scan in any proper index is a simple bug, especially on a write with a unique constraint. For shame.

As for programmers who can only envision one solution to > 1000 byte indexes, they need to be fired. Lack of creativity is no excuse for allowing a bug like this to exist in a platform as old as this.

If you need this you should really look at MySQL 5.5 and the innodb_large_prefix option that is available from 5.5.14 (July 2011) onwards because it probably does what you are looking for:

"Enable this option to allow index key prefixes longer than 767 bytes (up to 3072 bytes), for InnoDB tables that use the DYNAMIC  and COMPRESSED  row formats. (Creating such tables also requires the option values innodb_file_format=barracuda  and innodb_file_per_table=true.) See Section 13.3.15, “Limits on InnoDB Tables” for the relevant maximums associated with index key prefixes under various settings.

For tables using the REDUNDANT and COMPACT row formats, this option does not affect the allowed key prefix length. It does introduce a new error possibility. When this setting is enabled, attempting to create an index prefix with a key length greater than 3072 for a REDUNDANT or COMPACT table causes an error ER_INDEX_COLUMN_TOO_LONG (1727)."

http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_large_prefix

The people who have commented so far who were MySQL, Sun and now Oracle employees at the time they posted their comments are:

Sergei Golubchik
Ingo Struewing
Konstantin Osipov
Alexander Barkov
James Day

Comments from others do not represent the view of Oracle or any of its employees. Just other people sharing their opinions, as people are free to do in a feature request.

Oracle does recognise that there are sound design reasons sometimes for the tradeoff between large keys and efficiency of key storage. They are often not optimal for performance but they can still be appropriate and even the best choice for performance sometimes, perhaps in a high cardinality covering index.

James Day, MySQL Senior Principal Support Engineer, Oracle