MyISAM RTREE indexes ==================== For an explanation of the RTREE algorithms, see http://www.fmi.uni-passau.de/~neumann/proseminar/proseminar.pdf To be specific, we used Guttman's algorithm. RTREE index block layout ======================== There are two types of index blocks, branches and leaves. The highest bit of the first byte in an index block determines the block type. Branch: block[0] & 0x80 is set. Leave: block[0] & 0x80 is clear. The first two bytes of a block determine the number of bytes with valid data in this block. It is stored big-endian (high byte first). The highest bit needs to be masked out because it determines the block type as described above. The block length includes the first two bytes. The following macros are used to extract this information: mi_uint2korr(block) block[0] << 8 + block[1] Big-endian 2-byte value mi_getint(block) mi_uint2korr(block) & 0x7f Mask high bit rt_PAGE_END(block) block + mi_getint(block) End of valid data mi_test_if_nod(block) block[0] & 0x80 ? krlen : 0 If branch or leaf (*) (*) mi_test_if_nod() returns zero for a leaf block and the non-zero 'key_reflength' for a branch block. The 'key_reflength' is the number of bytes used for the index block positions. This could also be referred to as 'myisam_index_pointer_size'. Compare the description of 'myisam_data_pointer_size' in the manual: http://dev.mysql.com/doc/refman/5.1/en/full-table.html . RTREE indexes have a fixed key_length. A typical code fragment for stepping through an rtree index block is like so: nod_flag = mi_test_if_nod(page_buf); k = rt_PAGE_FIRST_KEY(page_buf, nod_flag); last = rt_PAGE_END(page_buf); for (; k < last; k = rt_PAGE_NEXT_KEY(k, key_length, nod_flag)) {} rt_PAGE_FIRST_KEY(page, nod_flag) is defined as: (page + 2 + nod_flag) rt_PAGE_NEXT_KEY(key, key_length, nod_flag) is defined like: (key + key_length + (nod_flag ? nod_flag : rec_reflength)) 'rec_reflength' is the 'myisam_data_pointer_size'. Branch blocks ============= The keys of branch blocks determine multi-dimensional minimum boundary rectangles (MBR) that cover all rectangles of the child nodes. Branch blocks contain 2-byte block length a sequence of (position{key_reflength}, key{key_length}) Position is a value that has to be multiplied with MI_MIN_KEY_BLOCK_LENGTH to get the offset in the index file. All index blocks start at a multiple of MI_MIN_KEY_BLOCK_LENGTH. This is the minimum index block size. Index blocks can be 1, 2, 4, 8, or 16 times that size. Each index has a fixed index block size, but different indexes of the same table can have different block sizes. _mi_kpos(key_reflength, after_position) extracts the position from a key and converts it into a file position (it multiplies with MI_MIN_KEY_BLOCK_LENGTH). 'after_position' is the byte in the block buffer behind the position value. For MyISAM rtrees this is the first byte of the key value. [Note: This differs from MyISAM btree indexes, where the position follows the key value.] Leaf blocks =========== The keys of leaf blocks determine multi-dimensional minimum boundary rectangles (MBR) that cover the object of the data row. Leaf blocks contain 2-byte block length a sequence of (key{key_length}, position{rec_reflength}) Note: In contrast to branch blocks, the record positions follow the key value. Level ===== Root is assigned level 0. The level below root is 1. The lower the level, the higher its number. This determination is a problem. When the tree height changes, the level of all blocks changes. One could see this as an academic problem. We do not store the level with the blocks. So they are of temporary meaning only. They support some algorithms. However the level numbers are stored with block locations in the re-insertion list during key deletion. When A block underflows, it is cut from the tree and its keys are re-inserted later. The underflow can happen with leaf blocks as well as with branch blocks. For leaf keys it would be ok to inserts them like keys are normally entered into the index. We wouldn't need to keep track of the level for them. They would travel the tree down to the leaf level anyway. When a branch node is cut from the tree, all its referenced blocks or even sub-trees are implicitly cut from the tree. Since these sub-trees are ok, we can add them back as they are. This means that we just add back all keys from the cut branch block. But in this case we must add them back at the same level as before. All leaf blocks shall be on the same level. The implemented algorithm of re-insertion is so that we re-insert all keys at their original level. We do not explicitly distinct between branch keys and leaf keys. Now the problem. If the height of the tree changes during deletion and re-insertion, the stored level numbers become invalid (Bug#25673 - spatial index corruption, error 126 incorrect key file for table). If we would have counted from the leaf level, this problem would not exist. However we would either need to store the current tree height of every index or evaluate it at open. Fix for Bug#25673 ================= (spatial index corruption, error 126 incorrect key file for table) If a split of the root tree happens during re-insertion we need to increment the levels of all pages in the reinsert list that have not yet been used up. Tree height does not shrink =========================== Note that we do not reduce the height of an rtree when there is only one key left in the root block. With every key removed that causes underflow of the level below root, the block below root will be cut and re-inserted. Re-insertion implementation =========================== We take the blocks from the re-insertion list in the order in which they were entered. Block underflow starts on leaf level, and possibly escalates up the tree. On every higher level, we cut a bigger piece from the tree. When we start re-inserting the leaf block, the tree may have become much smaller. Many leaves might be cut away and wait for re-insertion. Even worse, the cut away sub-trees are "nearer" to the cut away leaf in terms of spatial coordinates than the remaining part of the tree. When we re-insert the keys from the underflown leaf first, it is more unlikely that good keys are available on higher levels. More probable is that we need to extend existing keys to cover areas that are covered by the cut away subtrees already. So in my humble opinion it would make more sense to take blocks from the re-insertion list in reverse order.