Bug #57358 | Add option to not cluster the table around NOT NULL UNIQUE indexes automatically | ||
---|---|---|---|
Submitted: | 10 Oct 2010 22:26 | Modified: | 13 Oct 2010 16:28 |
Reporter: | Justin Swanhart | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S4 (Feature request) |
Version: | Any | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | clustered index, innodb, null, primary key, unique key |
[10 Oct 2010 22:26]
Justin Swanhart
[11 Oct 2010 10:10]
Susanne Ebrecht
Hello Justin, 3NF says when you have a relationship between two columns then the referenced column needs to be defined unique. Which means that referenced column needs to have UNIQUE and NOT NULL (for defined) constrained. PRIMARY KEY constraint just is the combination of UNIQUE constraint + NOT NULL constraint. MySQL implemented PRIMARY KEY and UNIQUE KEY constraint as unique index and for PK constraints additonally with NOT NULL constraint. InnoDB just is SQL standard conform here and our constraints and index implementation is done similar way as you will find it in other DBMS too. The rule says that every table should have an identifier. The identifier should be in best way a natural identifier. If natural identifier don't exist then an artificial unique identifier should be created (e.g. personal number for employees). This artificial identifier should be as short as possible. I doubt that we will change the behaviour because it is totally sql standard and conform.
[11 Oct 2010 14:54]
MySQL Verification Team
should read http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html I think reporter is saying that sometimes it is physically more efficient to store the table with the generated rowid 6-byte clustered index, than automatically cluster on the potentially lengthy user data..
[11 Oct 2010 15:19]
Justin Swanhart
What does any of that have to do with the clustered index? BNF does not dictate the structure of tables or indexes, only there relationships. InnoDB can enforce identifying constraints without organizing the table around them. On an InnoDB table, when there is a long clustered key, all of the secondary keys have to include the long clustered key. While it may be faster for unique key checks to use a clustered index, this is not necessary for functionality. Please don't mark this as won't fix as this is a valid feature request.
[11 Oct 2010 16:08]
Devananda van der Veen
Chiming in here, I agree with Justin that there is nothing in SQL specification which dictates that the physical structure of a table must be clustered by the PRIMARY KEY. The fact that InnoDB does this is engine-specific and can be very beneficial in some situations, but there are also cases where a long UNIQUE KEY makes for a terrible PRIMARY KEY. Here are two such reasons: * it makes secondary indexes much larger than necessary, which is less than optimal * it may cluster the records very poorly, for instance a random distribution, which can lead to frequent page splits and low page fill factor. It would be helpful to make this option at the table level, not globally.
[13 Oct 2010 16:28]
Justin Swanhart
InnoDB always organizes data (on disk) around the PRIMARY KEY, and failing that, around the first NOT NULL UNIQUE index on the table. There are times when this organization is undesirable, particularly when there is the need for an identifying constraint (PRIMARY KEY or UNIQUE NOT NULL) but this constraint is long (in terms of bytes). This long clustered key results in long secondary keys. In these cases I would like to be able to force an internal 6 byte key. I suggest: CREATE TABLE t1 ( c1 char(64) not null, c2 char(64) not null, c3 bigint not null, primary key (c1,c2), unique(c3) ) CLUSTERED INTERNALLY; and CREATE TABLE t1 ( c1 char(64) not null, c2 char(64) not null, c3 bigint not null, primary key (c1,c2), unique(c3) ) CLUSTERED USING c3; The default should be to cluster using the primary key, but the cluster index should be able to be chosen manually, as long as it meets the requirements for a clustered key.
[26 Nov 2011 1:18]
Andrew A
Hi, I've been developing an application that uses GUID.Comb (semi-sequential GUIDs) as the primary key, inserting into InnoDB tables. I can definitely see a valid use case for being able to tell InnoDB to please cluster internally, and not on the GUID PK. Insert performance into a multi-million row drops dramatically (see http://kccoder.com/mysql/uuid-vs-int-insert-performance/ - note this is not my research, I conducted my own to confirm myself and it agrees with these figures). The workarounds that I have available are: 1) Don't use GUIDs as PKs. I'd like to, because it lets prod/development data always be separate etc. 2) Drop the PRIMARY KEY, and instead use a UNIQUE NOT NULL index on my 'id' column. The minor downside is you can now insert nullable values into this field. 3) Add a special 'rowid' column that is simply a BIGINT AUTO_INCREMENT and mark this as the PK. This has a side effect of preventing data merging if it should ever be needed (since separate databases, such as prod/development, while having separate 'id' GUID rows, would have clashing 'rowid' values). Using option #2, I can see insert performance appears to be far better (similar to the 'uuid no key, indexed' value on the above page). There are also various articles such as on StackOverflow saying you shouldn't use GUIDs as PKs for InnoDB tables because of the clustering. Would you perhaps reconsider adding this feature request? Thanks, Andrew