Bug #54924 make innodb startup when compiled with 8kb pages
Submitted: 30 Jun 2010 21:51 Modified: 2 Jul 2010 17:44
Reporter: Mark Callaghan
Status: Not a Bug
Category: MySQL Server: InnoDB Plugin storage engine
Severity: S3 (Non-critical)
Version: 5.1.47
OS: Any
Assigned to: Inaam Rana
CPU Architecture: Any
Tags: 8kb, Contribution, innodb, page

[30 Jun 2010 21:51] Mark Callaghan
Description:
InnoDB can be compiled for 8kb pages, but a few assumptions in the code require 16kb pages, or InnoDB will not start.

Without this change, when buf_pool_init calls ha_create, the allocation
done by ha_create is attempted from buffer pool pages, and that path
attempts to recursively lock the buffer pool mutex. The buffer pool is
used because len is not less than 1/2 of a page in
mem_heap_create_block. For 16kb pages, this allocation was less than
1/2 of a page, so it was not taken from the buffer pool (the same path
MEM_HEAP_DYNAMIC heaps use).

Also change the check for the maximum combined log file size so that it works for any page size.

How to repeat:
Build with 8kb pages (modify storage/innodb_plugin/include/univ.i) and try to run a test.

Suggested fix:

diff --git a/storage/innodb_plugin/buf/buf0buf.c b/storage/innodb_plugin/buf/buf0buf.c
index c32681d..c321d7e 100644
--- a/storage/innodb_plugin/buf/buf0buf.c
+++ b/storage/innodb_plugin/buf/buf0buf.c
@@ -1029,7 +1029,7 @@ buf_pool_init(void)
        buf_pool->curr_size = chunk->size;
        srv_buf_pool_curr_size = buf_pool->curr_size * UNIV_PAGE_SIZE;

-       buf_pool->page_hash = ha_create(2 * buf_pool->curr_size,
+       buf_pool->page_hash = ha_create_dynamic(2 * buf_pool->curr_size,
                                        256, SYNC_BUF_PAGE_HASH);
        buf_pool->zip_hash = hash_create(2 * buf_pool->curr_size);

diff --git a/storage/innodb_plugin/ha/ha0ha.c b/storage/innodb_plugin/ha/ha0ha.c
index f049fca..9b02c2c 100644
--- a/storage/innodb_plugin/ha/ha0ha.c
+++ b/storage/innodb_plugin/ha/ha0ha.c
@@ -49,8 +49,9 @@ ha_create_func(
        ulint   mutex_level,    /*!< in: level of the mutexes in the latching
                                order: this is used in the debug version */
 #endif /* UNIV_SYNC_DEBUG */
-       ulint   n_mutexes)      /*!< in: number of mutexes to protect the
+       ulint   n_mutexes,      /*!< in: number of mutexes to protect the
                                hash table: must be a power of 2, or 0 */
+       ibool   dynamic)        /*!< in: use MEM_HEAP_DYNAMIC */
 {
        hash_table_t*   table;
 #ifndef UNIV_HOTBACKUP
@@ -64,8 +65,13 @@ ha_create_func(
        but in practise it never should in this case, hence the asserts. */

        if (n_mutexes == 0) {
-               table->heap = mem_heap_create_in_btr_search(
-                       ut_min(4096, MEM_MAX_ALLOC_IN_BUF));
+               if (dynamic) {
+                       table->heap = mem_heap_create(
+                               ut_min(4096, MEM_MAX_ALLOC_IN_BUF));
+               } else {
+                       table->heap = mem_heap_create_in_btr_search(
+                               ut_min(4096, MEM_MAX_ALLOC_IN_BUF));
+               }
                ut_a(table->heap);

                return(table);
@@ -77,7 +83,11 @@ ha_create_func(
        table->heaps = mem_alloc(n_mutexes * sizeof(void*));

        for (i = 0; i < n_mutexes; i++) {
-               table->heaps[i] = mem_heap_create_in_btr_search(4096);
+               if (dynamic) {
+                       table->heaps[i] = mem_heap_create(4096);
+               } else {
+                       table->heaps[i] = mem_heap_create_in_btr_search(4096);
+               }
                ut_a(table->heaps[i]);
        }
 #endif /* !UNIV_HOTBACKUP */

diff --git a/storage/innodb_plugin/include/ha0ha.h b/storage/innodb_plugin/include/ha0ha.h
index 78dca89..75ba4ce 100644
--- a/storage/innodb_plugin/include/ha0ha.h
+++ b/storage/innodb_plugin/include/ha0ha.h
@@ -92,8 +92,9 @@ ha_create_func(
        ulint   mutex_level,    /*!< in: level of the mutexes in the latching
                                order: this is used in the debug version */
 #endif /* UNIV_SYNC_DEBUG */
-       ulint   n_mutexes);     /*!< in: number of mutexes to protect the
+       ulint   n_mutexes,      /*!< in: number of mutexes to protect the
                                hash table: must be a power of 2, or 0 */
+       ibool   dynamic);       /*!< in: use MEM_HEAP_DYNAMIC */
 #ifdef UNIV_SYNC_DEBUG
 /** Creates a hash table.
 @return                own: created table
@@ -102,7 +103,8 @@ chosen to be a slightly bigger prime number.
 @param level   in: level of the mutexes in the latching order
 @param n_m     in: number of mutexes to protect the hash table;
                must be a power of 2, or 0 */
-# define ha_create(n_c,n_m,level) ha_create_func(n_c,level,n_m)
+# define ha_create(n_c,n_m,level) ha_create_func(n_c,level,n_m,FALSE)
+# define ha_create_dynamic(n_c,n_m,level) ha_create_func(n_c,level,n_m,TRUE)
 #else /* UNIV_SYNC_DEBUG */
 /** Creates a hash table.
 @return                own: created table
@@ -111,7 +113,8 @@ chosen to be a slightly bigger prime number.
 @param level   in: level of the mutexes in the latching order
 @param n_m     in: number of mutexes to protect the hash table;
                must be a power of 2, or 0 */
-# define ha_create(n_c,n_m,level) ha_create_func(n_c,n_m)
+# define ha_create(n_c,n_m,level) ha_create_func(n_c,n_m,FALSE)
+# define ha_create_dynamic(n_c,n_m,level) ha_create_func(n_c,n_m,TRUE)
 #endif /* UNIV_SYNC_DEBUG */

 /*************************************************************//**

diff --git a/storage/innodb_plugin/srv/srv0start.c b/storage/innodb_plugin/srv/srv0start.c
index 9ce5d04..355fd38 100644
--- a/storage/innodb_plugin/srv/srv0start.c
+++ b/storage/innodb_plugin/srv/srv0start.c
@@ -1337,7 +1337,7 @@ innobase_start_or_create_for_mysql(void)
        }
 #endif /* UNIV_LOG_ARCHIVE */

-       if (srv_n_log_files * srv_log_file_size >= 262144) {
+       if (srv_n_log_files * srv_log_file_size >= ((1LL <<32) / UNIV_PAGE_SIZE)) {
                fprintf(stderr,
                        "InnoDB: Error: combined size of log files"
                        " must be < 4 GB\n");
[30 Jun 2010 22:40] James Day
Mark, you already know this, but for others:

http://dev.mysql.com/doc/refman/5.1/en/innodb-restrictions.html

---

The default database page size in InnoDB is 16KB. By recompiling the code, you can set it to values ranging from 8KB to 64KB. You must update the values of UNIV_PAGE_SIZE and UNIV_PAGE_SIZE_SHIFT in the univ.i source file.
Note

Changing the page size is not a supported operation and there is no guarantee that InnoDB will function normally with a page size other than 16KB. Problems compiling or running InnoDB may occur. In particular, ROW_FORMAT=COMPRESSED in the InnoDB Plugin assumes that the page size is at most 16KB and uses 14-bit pointers.

A version of InnoDB built for one page size cannot use data files or log files from a version built for a different page size.

---

Fixes for issues found when trying other page sizes may well be welcomed by the developers.

James Day, MySQL Senior Support Engineer, Oracle
[1 Jul 2010 6:42] Valeriy Kravchuk
Thank you for the bug report and patch contributed.

I've set UNIV_PAGE_SIZE_SHIFT to 13 instead of 14 in current mysql-5.1 from bzr:

...
/* The 2-logarithm of UNIV_PAGE_SIZE: */
#define UNIV_PAGE_SIZE_SHIFT    13
/* The universal page size of the database */
#define UNIV_PAGE_SIZE          (1 << UNIV_PAGE_SIZE_SHIFT)
...

I built it on Mac OS X using BUILD/compile-pentium-debug-max (successfully) and ran the tests from the innodb suite with the plugin:

valeriy-kravchuks-macbook-pro:mysql-test openxs$ ./mtr --mysqld=--ignore-builtin-innodb --mysqld=--plugin-load=innodb=ha_innodb_plugin.so --suite=innodb --force

I've got several failures:

The servers were restarted 13 times
Spent 47.691 of 128 seconds executing testcases

Check of testcase failed for: innodb.innodb-autoinc

Completed: Failed 5/26 tests, 80.77% were successful.

Failing test(s): innodb.innodb innodb.innodb_misc1 innodb.innodb_bug44369 innodb.innodb_bug46000 innodb.innodb_trx_weight

It is not obvious to me how/if these failures are related to 8k pages. 

Do you have any specific test in mind? Did I miss anything in my actions above?
[1 Jul 2010 22:19] Mark Callaghan
You tested more than I did. I confirmed that some of the tests passed and then used it for sysbench benchmarks on a server that had PCI-based flash. I know there were several failures. For now I would be happy if they changed the code as I suggested so that InnoDB is able to initialize.
[2 Jul 2010 15:34] Mikhail Izioumtchenko
FYI, 'to really support' other page sizes is a serious feature request, as the mtr results show.
It would probably be more testing than coding. It would also
create a continuing load on further development and testing as different
page sizes will have to be considered and retested.
The current level of support is at best 'works for some internal testing'.
I used smaller page sizes for some stress testing some time ago;
UNIV_PAGE_SIZE_SHIFT was one result of that activity. Strangely,
I don't remember the startup problem Mark reports, so this could be
a recently introduced problem. The overall impression was that it sort of worked
for the task at hand, testing the delete buffering in what is now 5.5.
But I never went as far as testing subtle things like the redo size limit.
BTW, if the limit is indeed in pages and the 262K constant should depend
on the page size, then the 4G constant in the error message becomes variable
as well.
However, 'just taking in' the proposed patch after careful review,
without changing the documented support level for different page sizes,
may be OK provided the usual considerations for GA releases are met.
[2 Jul 2010 15:50] Mark Callaghan
I changed the title to match my request. I want InnoDB to initialize when 8kb pages are used.
[2 Jul 2010 16:33] Mikhail Izioumtchenko
I can start mysqld with 8k pages, current 5.1 innodb code.
The mtr results in the bug also mean the server is able to start.
Valeriy, could you figure out exactly what my.cnf settings would
prevent the startup?
[2 Jul 2010 16:34] Mark Callaghan
Are you using the plugin or builtin?
My changes are required for the plugin.
[2 Jul 2010 16:38] Mikhail Izioumtchenko
it was the plugin

InnoDB: !!!!!!!! UNIV_LOG_LSN_DEBUG switched on !!!!!!!!!
InnoDB: !!!!!!!! UNIV_MEM_DEBUG switched on !!!!!!!!!
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3 with validation
100702  9:37:19  InnoDB: highest supported file format is Barracuda.
100702  9:37:23 InnoDB Plugin 1.0.10 started; log sequence number 49183
[2 Jul 2010 16:41] Mark Callaghan
64 bit or 32 bit?
Plugin on 64-bit failed for me with 8kb pages.
[2 Jul 2010 16:49] Mikhail Izioumtchenko
64 bit Linux.
[2 Jul 2010 16:54] Valeriy Kravchuk
In my case it was InnoDB Plugin in current 5.1.49 from bzr, 32-bit build on Mac OS X:

valeriy-kravchuks-macbook-pro:5.1 openxs$ file ../5.1-8k/libexec/mysqld 
../5.1-8k/libexec/mysqld: Mach-O executable i386

I did not use any explicit my.cnf, and the only mtr options used are the ones listed above. Based on the tests that executed successfully (and the test failures in the main suite, where I just got different results because of the different page size), I am sure that the InnoDB Plugin was able to start up, many times :)

Maybe 64-bit matters...
[2 Jul 2010 17:26] Mark Callaghan
Now I know why it fails for me. We applied the split hash mutex change from InnoDB. This is not in official 5.1 and it initializes a hash table early in InnoDB startup -- which is why I had to change the code.

So, when Innodb eventually includes that change they will need my patch.
[2 Jul 2010 17:32] Valeriy Kravchuk
Confirmed that 64-bit does not matter:

valeriy-kravchuks-macbook-pro:5.1-8k openxs$ bin/mysqld_safe --ignore-builtin-innodb --plugin-load=innodb=ha_innodb_plugin.so &
[1] 34135
valeriy-kravchuks-macbook-pro:5.1-8k openxs$ 100702 20:28:14 mysqld_safe Logging to '/Users/openxs/dbs/5.1-8k/var/macbook-pro.err'.
chown: /Users/openxs/dbs/5.1-8k/var/macbook-pro.err: Operation not permitted
100702 20:28:14 mysqld_safe Starting mysqld daemon with databases from /Users/openxs/dbs/5.1-8k/var

valeriy-kravchuks-macbook-pro:5.1-8k openxs$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.49-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show plugins;
+------------+----------+----------------+---------------------+---------+
| Name       | Status   | Type           | Library             | License |
+------------+----------+----------------+---------------------+---------+
| binlog     | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| partition  | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| ARCHIVE    | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| BLACKHOLE  | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| CSV        | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| FEDERATED  | DISABLED | STORAGE ENGINE | NULL                | GPL     |
| MEMORY     | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| MyISAM     | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| MRG_MYISAM | ACTIVE   | STORAGE ENGINE | NULL                | GPL     |
| ndbcluster | DISABLED | STORAGE ENGINE | NULL                | GPL     |
| InnoDB     | ACTIVE   | STORAGE ENGINE | ha_innodb_plugin.so | GPL     |
+------------+----------+----------------+---------------------+---------+
11 rows in set (0.00 sec)

mysql> create table t1(c1 int primary key) engine=InnoDB;
Query OK, 0 rows affected (0.41 sec)

mysql> insert into t1 values(1),(2);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> show table status like 't1';
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| t1   | InnoDB |      10 | Compact    |    2 |           4096 |        8192 |               0 |            0 |   4194304 |           NULL | 2010-07-02 20:28:50 | NULL        | NULL       | latin1_swedish_ci |     NULL |                |         |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
1 row in set (0.00 sec)

mysql> exit
Bye
valeriy-kravchuks-macbook-pro:5.1-8k openxs$ tail var/macbook-pro.err 
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
100702 20:28:15 InnoDB Plugin 1.0.9 started; log sequence number 0
100702 20:28:15 [Note] Event Scheduler: Loaded 0 events
100702 20:28:15 [Note] /Users/openxs/dbs/5.1-8k/libexec/mysqld: ready for connections.
Version: '5.1.49-debug'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
valeriy-kravchuks-macbook-pro:5.1-8k openxs$ file libexec/mysqld 
libexec/mysqld: Mach-O 64-bit executable x86_64

I wonder what we shall do next: verify this as a feature request or just leave it open...
[2 Jul 2010 17:33] Mark Callaghan
Let the InnoDB team know about the issue with the page hash mutex change and then close this.
[2 Jul 2010 17:44] Mikhail Izioumtchenko
Per Mark's latest comment, closing as not a bug.
Assigning to Inaam, who is the page hash patch owner; I'll also let him know
by other means.
Inaam, the log file size fix is independent and should be considered.
If 4g is indeed N pages and not 4g bytes, then we could fix it,
including the message text. Also, x/UNIV_PAGE_SIZE is better written
as x>>UNIV_PAGE_SIZE_SHIFT.