Bug #46972 Procedure analyse does not work as doucmented with (var)char columns
Submitted: 28 Aug 2009 11:56 Modified: 23 Oct 2009 13:21
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: General Severity:S2 (Serious)
Version:5.0, 5.1 - probably any OS:Any
Assigned to: CPU Architecture:Any
Tags: qc

[28 Aug 2009 11:56] Peter Laursen
Description:
http://dev.mysql.com/doc/refman/5.0/en/procedure-analyse.html

"if there are more than max_elements distinct values, then ENUM is not a suggested type."

However with PROCEDURE ANALYSE(n) ENUM is still suggested even if there are more distinct values for char/varchar types.

How to repeat:
CREATE TABLE `t2` (
  `utf8_general_ci` INT(11) DEFAULT NULL,
  `c` CHAR(10) DEFAULT NULL,
  `v` VARCHAR(20) DEFAULT NULL,
  KEY `c` (`c`),
  KEY `v` (`v`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;

-- next populate table so that

SELECT * FROM t2;
/* RETURNS 

    ID  c       v     
------  ------  ------
     1  abc     def   
     2  abcd    def   
     3  abcde   defg  
     4  aaaa    bbbb  
     1  abc     def   
     2  abcd    def   
     3  abcde   defg  
     4  aaaa    bbbb  
     5  asas    asas  
     6  (NULL)  (NULL)
     7  (NULL)  (NULL)
*/

SELECT * FROM t2 PROCEDURE ANALYSE(1);
/* (partial output). This is incorrect as the (1)-filter should prevent proposal of ENUMs for both `c`and `v` columns

Field_name  Optimal_fieldtype                       
-----------------------------------------------------
test.t2.ID  TINYINT(1) UNSIGNED NOT NULL            
test.t2.c   ENUM('aaaa','abc','abcd','abcde','asas')
test.t2.v   ENUM('asas','bbbb','def','defg') 
*/

SELECT * FROM t2 PROCEDURE ANALYSE(10);
/* (partial output). Exactly same as above. This is correct as both `c`and `v` columns have less than 10 distinct values.

Field_name  Optimal_fieldtype                       
-----------------------------------------------------
test.t2.ID  TINYINT(1) UNSIGNED NOT NULL            
test.t2.c   ENUM('aaaa','abc','abcd','abcde','asas')
test.t2.v   ENUM('asas','bbbb','def','defg') 
*/

Suggested fix:
Make it work for (var)chars as documented too. 

I mark as 'S2' - not because PROCEDURE ANALYSE is a vital feature, but PROCEDURE ANALYSE is rather useless with this bug.
[28 Aug 2009 12:06] Peter Laursen
However I find when using 2nd parameter

SELECT * FROM t2 PROCEDURE ANALYSE(1,10); -- ENUMs not proposed
SELECT * FROM t2 PROCEDURE ANALYSE(1,10000); -- ENUMS proposed

The 2nd parameter is described like this ".. is the maximum amount of memory that ANALYSE() should allocate per column while trying to find all distinct values." 

This does not explain the different behaviour of the two statements above.
[28 Aug 2009 12:09] Peter Laursen
Sorry for one copy/paste mistake in the create statement!

First coloumn of the table is not named `utf8_general_ci` but `ID`
[28 Aug 2009 14:57] Valeriy Kravchuk
Thank you for the bug report. Verified just as described with recent 5.1.39 from bzr on Mac OS X. Workaround is to try different values for parameters of PROCEDURE ANALYZE(), as suggested in the manual:

"You may need to try different settings for the arguments so that PROCEDURE ANALYSE()  does not suggest the ENUM data type when it is not appropriate."
[28 Aug 2009 15:16] Peter Laursen
The two statements:
1: "if there are more than max_elements distinct values, then ENUM is not a suggested type."
2: "You may need to try different settings for the arguments so that PROCEDURE ANALYSE() 
does not suggest the ENUM data type when it is not appropriate."

.. are simply in conflict! If 1) is true then 2) is not necessary.  It really looks like this was reproted before (internally or not) and somebody thought that just adding such stupid note (2) to documentation solves anything.

This is completely crazy!
[23 Oct 2009 12:03] Peter Laursen
A colleague pointed my attention to:
http://bugs.mysql.com/bug.php?id=44060

.. duplicates?
[23 Oct 2009 13:21] Valeriy Kravchuk
Duplicate of Bug #44060 it seems.