Bug #78290 | Docs: MySQL 5.7 virtual columns and relation with auto_increment columns | ||
---|---|---|---|
Submitted: | 31 Aug 2015 13:28 | Modified: | 8 Sep 2015 12:51 |
Reporter: | Simon Mudd (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
Version: | 5.7.8 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | 5.7.8, auto_increment, virtual |
[31 Aug 2015 13:28]
Simon Mudd
[31 Aug 2015 13:30]
Simon Mudd
Clarification: this is marked as a docs bug: I would really prefer that the limitation in 5.7.8 does not exist, but it's not currently 100% clear if this is intended behaviour and if so why.
[31 Aug 2015 14:23]
MySQL Verification Team
Thank you for the bug report. C:\dbs>c:\dbs\5.7\bin\mysql -uroot -p --port=3570 --prompt="mysql 5.7 > " Enter password: ****** Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 4 Server version: 5.7.9 Source distribution PULL 2015/08/14 Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql 5.7 > create database v; Query OK, 1 row affected (0.05 sec) mysql 5.7 > use v Database changed mysql 5.7 > create table v ( id int not null auto_increment primary key ) engine = innodb; Query OK, 0 rows affected (0.27 sec) mysql 5.7 > alter table v add v_col int as (id) virtual ; ERROR 3109 (HY000): Generated column 'v_col' cannot refer to auto-increment column. mysql 5.7 > insert into v ( id) values (null), (null), (null); Query OK, 3 rows affected (0.09 sec) Records: 3 Duplicates: 0 Warnings: 0 c:\mysql-5.7.7-rc-winx64>bin\mysql -uroot Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.7-rc MySQL Community Server (GPL) Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database v; Query OK, 1 row affected (0.02 sec) mysql> use v Database changed mysql> create table v ( id int not null auto_increment primary key ) engine = innodb; Query OK, 0 rows affected (0.23 sec) mysql> alter table v add v_col int as (id) virtual ; Query OK, 0 rows affected (0.59 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> insert into v ( id) values (null), (null), (null); Query OK, 3 rows affected (0.08 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> show create table v; +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | v | CREATE TABLE `v` ( `id` int(11) NOT NULL AUTO_INCREMENT, `v_col` int(11) GENERATED ALWAYS AS (id) VIRTUAL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1 | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
[31 Aug 2015 14:37]
Stefan Hinz
Posted by developer: It's not clear whether it's intended behavior or a bug. If this is a bug, please fix. :-) If it's intended behavior, let us know, and we'll document it accordingly.
[3 Sep 2015 12:16]
Guilhem Bichot
This limitation is going to be documented in the manual. More info: REASON FOR THE LIMITATION: It's a necessary limitation; the order of ops is: fill_record() (sql layer) write_row() (engine-specific call). the generated column is calculated by update_generated_write_fields() called from fill_record(). Autoinc value is determined by storage engine in write_row() (because different engines have different ways to store autoinc global counter, and to retrieve it, with special concurrency measures). So, if generated column depends on autoinc value, it cannot be calculated as autoinc value isn't yet known. HISTORY OF HOW WE ADDED THIS LIMITATION: In 5.7.7 this was already forbidden: alter table v add s_col int as (id) stored; # note: STORED ERROR 3109 (HY000): Stored generated column 's_col' cannot refer auto-increment column. for the reasons explained above. But adding a virtual column was ok, because that column's value didn't have to be computed when inserting the row, as: a) the column's value wasn't physically stored in the row (it's virtual) b) it isn't physically stored in the index But in 5.7.8 (b) changes: the virtual column value, if indexed, is physically stored in the index, so its value is needed when inserting the row. So we hit the same problem as with STORED, and thus apply the same limitation. In the end, the limitation applies to stored and virtual. Last precision: in 5.7.7 virtual columns could not be indexed.
[3 Sep 2015 12:17]
Guilhem Bichot
@Simon: hello! could you please share some concrete use cases which need a generated column to depend on an autoincrement column?
[3 Sep 2015 18:15]
Simon Mudd
Hi Guilhem, I don't have any real use cases at the moment, so really if you want to keep this current behaviour making sure the documentation is clearer is fine. Just say that a virtual column can not depend on another column that has the auto_increment attribute. You don't say that now and it's a trivial documentation change. The only cases I can think of which may be to keep things simpler for the developer or DBA might be to want to generate a virtual column based on say the primary key, but format it specially and consistently without having to rely on the application to do that everywhere. e.g. transaction id 1234 is shown as 'TRX0001234'. The app doesn't need to format this specially and it will be shown consistently everywhere. That saves time. MySQL traditionally has not handled multi-column join conditions very well, though I think this has been largely solved now in 5.7, and maybe it might be convenient to have a virtual column generated based on 2 different columns and join based on this as the logic is then "simpler". It may not be quite as efficient but in the end virtual columns are there to "help out". This may be a reason as these auto-increment values may happen to be part of a larger key. Multi-source replication may trigger people to run several "clusters" of similar servers, each with their own id generation, but then replicate that to a central server where the data is consolidated. Here a cluster number may be included to the "unique on the cluster" ids so that these columns become unique on all clusters. You don't need to do this with virtual columns but any queries on systems built this way require multi-column join conditions whereas previously this worked in a much simpler way. The virtual column built this way would keep things simpler. If the complexity of enabling this now is too high, then maybe it's best to wait for 5.7 to go GA, and let this new functionality get some use. If others can give examples of where referencing the auto-increment column may be interesting then follow it up then. The behaviour has changed from 5.7.7 to 5.7.8 and the documentation does not fully reflect that. Fixing that is my main concern.
[4 Sep 2015 7:53]
Guilhem Bichot
Hello Simon, thanks for the detailed explanation. So we'll document the limitation ASAP and leave it as is in 5.7. Let's discuss the use cases you kindly provided. I'm just writing potential design ideas for the future. For the first example (formatted autoinc), if the format column TRX0001234 is not indexed and is virtual, its value isn't needed at insertion time, so it should be possible in theory to allow such column. MySQL could have a rule: "a non-indexed virtual column can reference autoinc"; it would need to handle CREATE INDEX: if creating an index on a non-indexed virtual column referencing autoinc then throw error. For the second example (multi-column join), I have no idea; such generated column will likely be indexed, so its value is needed at insertion time. For the third example (master-master), if I understood correctly, we have a central server's column replicating the autoinc columns of clusters; but the former shouldn't need to be autoinc as it gets its values from the clusters (the values are generated on the clusters); it would even be dangerous to have it autoinc as an accidental direct insertion of NULL could disturb replication. If that column is not autoinc, then it can be in the generated column (which would concatenate it with the cluster's number). This said, if we wanted to lift the limitation entirely, we would probably have to switch the order of calls; first generate the autoinc, then evaluate generated columns, then write in the index and the row. But as "generate the autoinc and write in the index and the row" is done in the storage engine (ha_innobase::write_row, for example), "evaluate generated columns" should be done as a callback, called by the storage engine's write_row, to get values from the server. And as "generate the autoinc" may take long-lasting locks, depending on: https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-configurable.html we must be cautious of anything we add to write_row() after "generate the autoinc". Food for thought for the future... :-)
[4 Sep 2015 9:39]
Simon Mudd
Thanks. I think we agree. * Things could be more flexible but that requires some thought and actual use cases will show that. * I understand the fact the storage engine generates the ids and this may be needed outside of the storage engine to complete the virtual column setup causes problems and needs to be well thought out. Some sort of callback I guess once the value is available. So needs to be done carefully. * For the auto-increment case on a slave if you use RBR this shouldn't be an issue as the row details are already pushed through but indeed if you "fan in" data from several sources into the same table then avoiding collisions is important. The traditional solution in MySQL to use autoincrement_increment and auto_increment_offset also seems rather weak and therefore using a different column to store the server/cluster reference helps but does require the primary key to be used at least on a central server to be on 2 columns (the auto-inc column + the server/cell column). Hence the potential attractiveness of a "single index" on a generated version of the 2 columns. Either way you've answered my question. What's not clear is whether I should open a separate FR for this even if it's not something that is likely to be looked at immediately? That would allow you to close this once bug report once the documentation is updated.
[8 Sep 2015 12:51]
Daniel Price
Posted by developer: The following limitation was added to the "CREATE TABLE and Generated Columns" documentation: https://dev.mysql.com/doc/refman/5.7/en/create-table.html "An AUTO_INCREMENT column cannot be used as a base column in a generated column definition." Thank you for the bug report.
[11 Sep 2015 13:45]
Guilhem Bichot
Hello Simon. Yes, a feature request to track this would be best. Thanks! Regarding "using a different column to store the server/cluster reference helps but does require the primary key to be used at least on a central server to be on 2 columns (the auto-inc column + the server/cell column)": when the data is consolidated on the central server: is it the central server which generates an autoinc value? - If yes, I understand the problem - if no, the column doesn't have to be autoinc (on the central server; it just replicates values generated earlier) and so can be in the generated column? Or is it that statement-based replication is used, so the column on the central server has to be autoinc even though values are provided by the replication stream?