Critique of 8.0 documentation on Partitioning

This is in response to http://forums.mysql.com/read.php?106,647595,650604
and is aimed at Jon Stephens and Dan Price.

-- Background.
I am reviewing 8.0 docs on partitioning with two things in mind:
* As a Technical editor - looking for ways to improve the phrasing, etc.;
* As a frequent Forum answerer - A common question (though not phrased this way) is "why is partitioning not solving my performance problems".  Of necessity, the docs are a "reference manual".  However, I encourage the inclusion of tidbits of info that border on "best practice".

-- Overall impressions

The overall structure, inter-page links, etc, seems to be adequate.  PARTITIONing is spread across a lot of pages, but there is a lot of information to present.

The information, explanations, and other verbiage is reasonably good.  (And have probably been incrementally improved since partitioning was introduced.)  The examples are somewhat "non-real-world", but the message seems to need such.  There are a few things that have not yet been updated for 8.0 -- especially MyISAM references (except for saying 'not supported').

I'm worried that "native partitions" has more impact than is visible in the pages.  Alas, I am not an expert on that new feature, so I can't be more specific.

-- Specific comments, by page...

-- Chapter 20 Partitioning (8.0) -  http://dev.mysql.com/doc/refman/8.0/en/partitioning.html

"but NDB is not included in MySQL 8.0" -- I understand the technical distinction, but this sounds like abandonment of NDB.  Suggest rephrasing.  (Some of the later pages say it better.)

It would be good to jump to a "quickstart" wherein a few _practical_ uses of Partitioning are presented, complete with examples.  Personally, I would give only one example - time series data that drops and repartitions daily to maintain 30 days' worth of data.  (Other examples are either obscure or don't provide any performance benefit.)  Perhaps this first page could include the example.

Since users adopt partitioning under the illusion that it is a panacea for performance, this page, and the pages that follow, should have more explicit information on when performance is, or is not, improved by partitioning.  Of necessity, all examples have very few rows.  But partitioning is even less likely to benefit performance on tiny tables -- The advice of "use only on large tables (or tables that become large)" should be given.

-- 20.1 Overview - http://dev.mysql.com/doc/refman/8.0/en/partitioning-overview.html

"different portions of a table are stored as separate tables in different locations" -- Why encourage an old fashioned concept when Raid striping is almost universally better.

"This function must be nonconstant and nonrandom." -- Perhaps use "deterministic" here?  Also, this paragraph is vague on whether a Stored Function can be used to compute the 'number of the partition'.  Please clarify on this page.  (I see it on 20.6, but that is not where the user might think to look.)

In 2's complement arithmetic, "-MAXVALUE" is not the correct value.  Nor is it correct for UNSIGNED.  Suggest rewriting the discussion of the range for 'intval'.  Perhaps change "-MAXVALUE" to "MINVALUE" and then explain MINVALUE.

Is this MAXVALUE consistent with its usage in PARTITION BY RANGE?  I think not, since the latter says "LESS THAN MAXVALUE", while the former says "<=".  Start over; the formula for intavl cannot be salvaged.

Thanks for the paragraph on horizontal vs vertical partitioning.

The list of storage engines that do not support partitioning is missing MEMORY.

-- 20.3 Management - http://dev.mysql.com/doc/refman/8.0/en/partitioning-management.html

20.3.1 (perhaps elsewhere) Examples show "ENGINE = MyISAM" -- needs updating for 8.0.  9.2.1.6-ICP is another page.

20.3.1 Please clarify that doing ADD PARTITION on a BY RANGE table is not possible if there is a LESS THAN MAXVALUE partition.

Discuss the pros and cons of including a "LESS THAN MAXVALUE" partition.  Pro: covers all data.  Con: disallows ADD PARTITION, but so what - use REORGANIZE PARTITION.

I would especially appreciate the DROP and REORGANIZE to handle the sliding time BY RANGE I mentioned before.

(LIST, KEY, HASH, SUBPARTITION - not reviewed; I feel they are virtually worthless.)

-- 20.3.3 Exchanging partitions - http://dev.mysql.com/doc/refman/8.0/en/partitioning-management-exchange.html

Please start with case(s) where EXCHANGE PARTITION might be useful.

The restrictions on `p` and `nt` should explictitly say "same engine".  (That admonition does show up, but only near the end of the long page.)

I have yet to get a clear picture of the performance of EXCHANGE when using "native partitioning".  EXCHANGE used to be a very fast operation because no data needed to be shoveled around.  I am worried that 'native' and 'tablespaces' may mess with that.  Please comment on whether/when EXCHANGE is a 'meta' operation versus a data-shoveling operation.  (This also comes up on page 20.6, again without satisfactory verbiage.)

"Any AUTO_INCREMENT columns in the exchanged table are reset."  Please clarify.  Does this imply modifying every row?  I can envision an implementation where `p` preserves the ids when it becomes a 'table', but `nt` must be renumbered when it joins the partitioned table.  Corollary:  If `nt` starts empty, no massive update is needed.  This seems like the main "use case" for EXCHANGE.

REPAIR TABLE is mentioned (at least twice) as a way to fix "non-matching rows" after an EXCHANGE.  I suspect this was wrong with MyISAM, and useless for InnoDB.

MyISAM in the examples on this page, too.  Suggest you check all 8.0 Partition pages.

-- 20.3.4 Maintenance - http://dev.mysql.com/doc/refman/8.0/en/partitioning-maintenance.html

In "Repairing partitions", it mentions "all rows that cannot be moved due" but does not explain what "move" is occuring.  Please rephrase.  Perhaps something like "Rows with duplicated keys are deleted, but you can't predict which of a dup pair will be deleted."

-- 20.6 Restrictions - http://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations.html

"buffer uses 130 KB memory per partition" -- Is there a my.cnf setting for this?  (If so, please add it.)  Is this a silent way for Partitioning to blow out RAM?  (If so, I would appreciate to know more.)

"so using more partitions does not automatically lead to better results" -- Thank you for including this.

REPAIR TABLE is mentioned twice on this page.  (See also page 14.1.7.1)

-- 20.6.1 Unique - http://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations-partitioning-keys-unique-keys.html

"every unique key on the table must use every column..." -- descriptive, but it fails to point out the ramifications.  To elaborate...

It should be noted that when you have to add a column to a UNIQUE key, you don't lose "uniqueness", but you do lose the "uniqueness constraint" on the original column(s).  At that point, you may as well make it a plain INDEX.

It might be worth noting that "global indexes" are in the pipeline; they would provide a true implementation of UNIQUE and probably FOREIGN KEY.  I bring this up partially because of DBAs coming from competing products may need to have this deficiency pointed out.  Even if you don't say "in the pipeline", please say "not available".

Users tend to assume that an AUTO_INCREMENT `id` can only be used alone in PRIMARY KEY(id).  In reality, the only limitation is that "`id` be the _first_ column in _some_ index"; it need not be PRIMARY or UNIQUE.  Hence, it does not hurt to have to tack on the partition key to PRIMARY KEY(id).  (One small exception occurs if are explicitly setting `id` in an INSERT.  See Trey Raymond's comment)

Another corollary...  If you have an auto-inc `id`, plus want to cluster on some `other` key for efficient access, it is quite OK to
PRIMARY KEY(other, partition_key, id)
INDEX(id, partition_key)

Does the new 8.0 InnoDB feature about persistent auto_increment values affect this discussion of auto_increment in partitioning?  http://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html#mysql-nutshell-additions and more details in http://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html#innodb-auto-increment-initialization

Another thing to add is that InnoDB _really_ needs an _explicit_ PRIMARY KEY, especially in a replication environment, and especially with binlog_format=ROW.  This probably belongs in the middle of the page where it discusses "if a table has no unique keys".

(This may not fit.)  A "best practice" I have found is to put the partition key column(s) at the _end_ of any unique key.  After all, you are pruning on the partition key, why not then use some other column at the start of any index.  Corollary:  It is usually useless to partition by `id` -- Note that the last example on the page suggests such.

The comment by David Wilkins is somewhere between 'bad' and 'incorrect', as pointed out by Anne Mouse.  Can/should they be removed?

-- 22.14 I_S.Partitions - http://dev.mysql.com/doc/refman/8.0/en/partitions-table.html

Check the correctness of the default values for NODEGROUP and TABLESPACE_NAME.

-- My Partition blog

http://mysql.rjweb.org/doc.php/partitionmaint
It points out that time-series and Exchange partition are two of only four partition use cases that, according to me, provide any performance benefit.