1) The size of the open_cache is normally only decreased in open_table, just before adding another entry to it. However, if all entries are in use and the unused_tables linked list is empty, only the add happens. Thus the open_cache always grows to the size of its largest concurrent usage and never shrinks from there. Is this on purpose or by mistake?

2) unused_tables is a doubly linked list containing all unused table entries in the open_cache. It is a global variable, protected by LOCK_open.

3) The open_cache is a global hash table of table entries that are either in use or cached for future use. The hash key is database_name\0table_name\0, and the key length includes both NUL bytes. There can be several entries with the same key in the open_cache.

4) Open temporary tables are kept only on the thread object, in a doubly linked list with thd->temporary_tables as the first pointer. The key in this list is the name of the temporary table plus 4 bytes of server id and 4 bytes of pseudo_thread_id.

5) refresh_version is a global variable used to trigger metadata changes on a table. It is changed by FLUSH TABLE commands, which call close_cached_table. Each table object contains a refresh_version used to check whether the table needs to be reopened. The thd object also contains a refresh_version variable, called version. Thus, to ensure that a specific table is no longer used (after changing its definition in one way or another), it is necessary to use refresh_version. close_cached_tables does this job for you, using a TABLE_LIST and some other support variables. It can also wait until all tables have been removed from the cache, if told to do so.

6) Description of open_table

open_table first looks for:
- Temporary tables (list on thd->temporary_tables)
- Locked tables (list on thd->locked_tables)

If the thread has at least one locked table it can ONLY use locked tables.
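As a sketch of the key layout described in 3) (make_cache_key is a hypothetical helper for illustration, not a server function):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not server code) of the open_cache key layout:
   database name, '\0', table name, '\0'.  The key length counts both
   NUL bytes. */
static size_t make_cache_key(char *buf, const char *db, const char *table)
{
  size_t db_len  = strlen(db);
  size_t tbl_len = strlen(table);
  memcpy(buf, db, db_len + 1);                   /* "db\0"    */
  memcpy(buf + db_len + 1, table, tbl_len + 1);  /* "table\0" */
  return db_len + tbl_len + 2;                   /* key length */
}
```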
In 5.0 this code has been expanded to avoid deadlocks when using stored procedures, since all tables possibly used in a stored procedure must be locked before the execution of the stored procedure starts. There is also a hack to handle the opening of a view when the thread holds locked tables.

Next it acquires LOCK_open (the global lock).

Refresh check overall
---------------------
First it checks whether refresh_version has been updated since the thread started opening tables. If so, it returns and the opening of the tables is restarted (when called from open_tables, the normal case). This check is performed for every table except the first; for the first table, thd->version is set instead. It is an important check for metadata updates, but it only takes effect when a general reset of all tables has been requested by a FLUSH command. The reason this code is needed is the case where the cache holds no entries for this table: we still have to check for a general reload of all tables, and in that case we need to release the tables opened so far. Thus this check could be taken out of the normal path and moved to the else branch taken when no cached table is found. TODO RONM: Check this part of the code with Monty to see what could be changed here.

Then it flushes any tables used by the handler interface, if there are any handler tables.

Loop over all tables in the cache with the same table name
----------------------------------------------------------
It then checks for opened tables in the cache that are NOT IN USE. For each table found with the same name, it checks whether the table needs to be refreshed. This is necessary since the table can lie around in the cache for a while after it was changed, and it is important to refresh it before using it again.

The normal path of the loop
---------------------------
When it finds a table that is neither in use nor in need of a refresh, it exits the loop and unlinks the table from the unused_tables list.
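The cache scan just described can be sketched as follows (find_unused_table and the entry layout are hypothetical illustrations, not the server's actual structures; the real code triggers the refresh handling rather than merely skipping stale entries):

```c
#include <stddef.h>

/* Hypothetical sketch (not server code) of the cache-scan logic:
   walk all entries with the matching key and pick one that is
   unused and does not need a refresh. */
struct table_entry {
  struct table_entry *next_same_key; /* hash chain for one key   */
  int in_use;                        /* owned by some thread?    */
  unsigned long version;             /* entry's refresh_version  */
};

static struct table_entry *
find_unused_table(struct table_entry *chain, unsigned long refresh_version)
{
  struct table_entry *t;
  for (t = chain; t; t = t->next_same_key)
  {
    if (t->version != refresh_version) /* stale: the real code starts */
      continue;                        /* refresh handling here       */
    if (!t->in_use)
      return t;                        /* normal path: unused, fresh  */
  }
  return NULL;                         /* fall through: open new table */
}
```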
It sets the table as in use by this thread and releases the global mutex LOCK_open.

Normal exit of the subroutine
-----------------------------
When called from open_tables (refresh != NULL), it links the table into the open_tables singly linked list on the thread object. It initialises the lock type to READ, performs some activities regarding table names and aliases, and finally initialises a set of variables on the table object.

Another common path through the code
------------------------------------
Another common path is that no table with this name is found in the cache. In this case we first limit the growth of the table cache by deallocating the head of unused_tables (which unfortunately is also the Least Recently Used). I would vote for keeping both a first and a last reference here, deallocating from one end and inserting the most recently released entry at the other. Then the code goes on to open the frm file and create a new table object. There is some code to handle views, which needs further understanding of the view logic. It then sets up the shared object and inserts the new table object into the table cache. Then it rejoins the normal exit of the subroutine.

Handling refresh when discovered in the loop
--------------------------------------------
This code is of utmost importance for understanding how to perform advanced metadata updates, although it is only rarely exercised. The refresh code starts by closing all already opened tables that are in need of a refresh. Before proceeding it waits for other threads to complete their use of the found tables (if called from open_tables). Then it signals refresh to open_tables (if called from there) and returns failure to indicate that the normal path wasn't taken.
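The first-and-last scheme argued for above might look like this sketch (hypothetical helper names, not the server's list handling): released entries go in at the tail, eviction takes the head, so the head is always the least recently used entry.

```c
#include <stddef.h>

/* Hypothetical sketch of an unused list with both head and tail
   pointers: push released entries at the tail, evict from the head. */
struct unused_entry { struct unused_entry *next, *prev; int id; };

static void push_tail(struct unused_entry **head, struct unused_entry **tail,
                      struct unused_entry *e)
{
  e->next = NULL;
  e->prev = *tail;
  if (*tail)
    (*tail)->next = e;   /* append after current tail */
  else
    *head = e;           /* list was empty            */
  *tail = e;
}

static struct unused_entry *pop_head(struct unused_entry **head,
                                     struct unused_entry **tail)
{
  struct unused_entry *e = *head;  /* least recently used entry */
  if (!e)
    return NULL;
  *head = e->next;
  if (*head)
    (*head)->prev = NULL;
  else
    *tail = NULL;                  /* list became empty         */
  return e;
}
```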
7) Lock Tables processing

(We currently skip the handling of prelocking for stored procedures in this description.)

First the set of real tables is derived from the table list (derived tables, views and information schema tables are not "real" tables). The actual lock processing is performed by mysql_lock_tables. Queries that use locked tables do not use mysql_lock_tables, since they already hold locks on all involved tables before the query starts. This is explained in a separate section on the handling of prelocked statements.

Step 1 (in get_lock_data): Calculate the number of locks and the number of tables involved in the lock request. (Temporary tables are neither locked nor counted.)

Step 2 (in get_lock_data): Allocate memory for the locks and the list of table references.

Step 3 (in get_lock_data): Get references to all involved table locks by calling store_lock on each table handler involved. store_lock will also convert the lock to the appropriate level for the handler in question.

Description of how various handlers change lock mode in store_lock
------------------------------------------------------------------
BDB converts all write locks except TL_WRITE_ALLOW_READ and TL_WRITE_ONLY to TL_WRITE_ALLOW_WRITE. When tables are reopened after refresh_version has been updated, or during LOCK TABLE processing, BDB does not change the lock type.

The Federated handler makes the same changes as BDB, with two exceptions: 1) If thd->tablespace_op is set, no changes are made. (This must be an over-precaution, since the Federated handler cannot be involved in tablespace operations and thus this variable can never be set.) 2) It also converts TL_READ_NO_INSERT to TL_READ (under the same condition as the BDB conversions above), to allow concurrent inserts into t2 in the query INSERT INTO t1 SELECT ... FROM t2. (This is most likely a bug.)

InnoDB changes the locks like the Federated handler does (or rather the other way around), with the addition that it doesn't change locks for the SQL command TRUNCATE TABLE.
The reason for this is most likely that InnoDB's Consistent Read is based on transaction handling, and TRUNCATE TABLE probably does things on a lower level.

NDB converts all write locks (except TL_WRITE_ONLY) to TL_WRITE_ALLOW_READ, with the same exception as BDB for LOCK TABLES and reopened tables, and it also changes TL_READ_NO_INSERT to TL_READ (NDB allows concurrent inserts, since no SQL statement is guaranteed to be consistent anyway). It is most likely a bug that NDB also allows concurrent updates while performing ALTER TABLE (it sets TL_WRITE_ALLOW_READ), since this will create havoc in the new table after ALTER TABLE is completed. (Naturally this can still happen from another MySQL server, but that can be avoided by NDB's option of allowing only one server to be connected under a certain mode of operation.)

The MyISAM Merge handler and the partition handler are a bit special here, since they derive their store_lock lock_type from the underlying handlers. Currently both do this by calling store_lock on all underlying handlers. The partition handler could optimise here by only calling the underlying handlers that do some real work, since it is not possible to access the underlying handlers directly through SQL statements.

MyISAM and all other handlers in the tree set the lock type directly, unless they are told to ignore the request through TL_IGNORE or the lock type has already been set by a previous call (the same table can be present in a query more than once, through different handlers that all reference the same lock object).

Step 4 (in mysql_lock_tables): Check whether a global read lock has been issued, we are about to do a write operation, and we are not to ignore global read locks. ....

Step 5 (in lock_external): External locks are acquired on all tables. If the lock mode is TL_READ, TL_READ_WITH_SHARED_LOCKS, TL_READ_HIGH_PRIORITY or TL_READ_NO_INSERT, the external lock mode is F_RDLCK; for higher lock modes (>= TL_WRITE_ALLOW_WRITE) it is F_WRLCK.
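The two mappings above can be sketched together (hypothetical helper names; the enum order loosely mirrors the server's thr_lock_type but omits TL_IGNORE and TL_UNLOCK):

```c
#include <fcntl.h>  /* F_RDLCK, F_WRLCK */

/* Hypothetical sketch (not server code) of the BDB-style store_lock
   downgrade and the Step 5 external lock mode selection. */
enum thr_lock_type {
  TL_READ, TL_READ_WITH_SHARED_LOCKS, TL_READ_HIGH_PRIORITY,
  TL_READ_NO_INSERT, TL_WRITE_ALLOW_WRITE, TL_WRITE_ALLOW_READ,
  TL_WRITE_CONCURRENT_INSERT, TL_WRITE_DELAYED, TL_WRITE_LOW_PRIORITY,
  TL_WRITE, TL_WRITE_ONLY
};

/* BDB rule: relax write locks to TL_WRITE_ALLOW_WRITE, except
   TL_WRITE_ALLOW_READ and TL_WRITE_ONLY, and never during LOCK TABLES
   or when tables are reopened after a refresh. */
static enum thr_lock_type
store_lock_convert(enum thr_lock_type lt, int in_lock_tables_or_reopen)
{
  if (!in_lock_tables_or_reopen &&
      lt >= TL_WRITE_ALLOW_WRITE && lt < TL_WRITE_ONLY &&
      lt != TL_WRITE_ALLOW_READ)
    return TL_WRITE_ALLOW_WRITE;
  return lt;
}

/* Step 5 rule: read modes map to F_RDLCK, TL_WRITE_ALLOW_WRITE and
   higher map to F_WRLCK. */
static int external_lock_mode(enum thr_lock_type lt)
{
  return lt >= TL_WRITE_ALLOW_WRITE ? F_WRLCK : F_RDLCK;
}
```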
Description of how various handlers use external_lock
-----------------------------------------------------
Most handlers support neither transactions nor external locking. For those handlers external_lock is a no-op (it simply returns 0). The only handler supporting external locks is MyISAM, so this handler actually does some external lock processing. All the transactional handlers (InnoDB, BDB and NDB) also take advantage of the fact that external_lock is called at the start of every statement (except when tables are already locked by LOCK TABLES or we are in a stored procedure, in which case start_stmt does the same job).

BDB checks whether a transaction object exists, and creates one if not. external_lock is called at the start of processing with either F_RDLCK or F_WRLCK, and it is also called at the end of processing with F_UNLCK.

1) Begin processing: If the transaction has not been started yet (no tables previously externally locked and no previous statement in the transaction), a transaction is started (either a local BDB transaction or a transaction known at a higher level).

2) End processing: If the last table is closed, statement-level transactions are used and there are no updates in the transaction, the transaction is committed.

InnoDB handles things very similarly to BDB, though in a more complicated manner, since InnoDB has many more features, such as consistent reads and more checks for optimised lock processing. NDB also handles things similarly to BDB and InnoDB, except that it additionally does some work to ensure that its local metadata cache is consistent. MyISAM and MyISAM Merge tables lock the database used by the table. The partition handler simply calls external_lock on all underlying table handlers.

Step 6 (in thr_multi_lock): This is where the actual lock processing takes place.
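Before following Step 6 into thr_multi_lock, the begin/end bookkeeping described for BDB above can be sketched as follows (a hypothetical model, not BDB's actual code; the commits counter exists only for illustration):

```c
/* Hypothetical sketch of external_lock bookkeeping: start a
   transaction on the first external lock, commit on the last unlock
   when the statement-level transaction made no updates. */
struct txn_state {
  int lock_count;   /* externally locked tables        */
  int txn_open;     /* transaction started?            */
  int updated;      /* any updates in the transaction? */
  int commits;      /* commits performed (illustration) */
};

static void external_lock_sketch(struct txn_state *s, int f_unlck)
{
  if (!f_unlck)                     /* F_RDLCK / F_WRLCK: begin  */
  {
    if (s->lock_count++ == 0 && !s->txn_open)
      s->txn_open = 1;              /* start local transaction   */
  }
  else                              /* F_UNLCK: end              */
  {
    if (--s->lock_count == 0 && s->txn_open && !s->updated)
    {
      s->txn_open = 0;              /* commit stmt-level txn     */
      s->commits++;
    }
  }
}
```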
A global lock was used around the table cache in open_table, but here is where the actual tables get locked, using the lock objects received from the store_lock calls.

The first step here is important for deadlock-free locking: the locks on the tables are processed in a predefined order. This order is obtained by first sorting on the address of the lock object in memory, and second by placing locks of higher contention levels on the same table earlier in the sequence. The second rule is needed since the same table can appear in the same query with both a read lock and a write lock. If write locking weren't done before read locking, there would be a deadlock possibility: another query could come in and also read-lock the table, whereby the current query could not proceed.

Step 7 (in thr_lock): For each table in the query a lock is acquired. First the mutex on the lock object is acquired, to ensure that only one query at a time attempts to lock the table. This mutex is per table and thus not a global lock, so it should be fine from an SMP point of view.

1) Handling of Read Locks

First we check for write locks.

- Write lock encountered
If our thread is the lock owner, OR
the write lock type is TL_WRITE_ALLOW_WRITE or TL_WRITE_DELAYED, OR
the write lock type is TL_WRITE_CONCURRENT_INSERT or TL_WRITE_ALLOW_READ AND our lock mode is NOT TL_READ_NO_INSERT,
THEN the lock is granted and is put last in the queue of current read requests, and some lock statistics are updated.
If no read lock is granted and the write lock type is TL_WRITE_ONLY, the lock request is aborted and not put into any queue. Thus if the lock is TL_WRITE_LOW_PRIORITY, TL_WRITE or TL_WRITE_ONLY, read locks are only allowed from the same thread. For TL_WRITE_DELAYED and TL_WRITE_ALLOW_WRITE, read locks are allowed for all threads.
- No write lock encountered
If no write locks are waiting, OR
the waiting write locks are not of type TL_WRITE or TL_WRITE_ONLY, OR
our lock type is TL_READ_HIGH_PRIORITY, OR
our thread already owns a read lock on the table,
THEN the same handling applies to put it into the current read queue (the code is duplicated, with only a debug comment differing).
If none of these attempts to schedule the read immediately were successful, we call wait_for_lock with the read wait queue as input.

2) Handling of Delayed Write Locks

If there is already an ongoing TL_WRITE_ONLY, THEN abort the lock request.
If no write locks are currently being processed, OR write locks exist but they are not TL_WRITE_ALLOW_READ, AND
no TL_WRITE_ALLOW_READ requests are in the write wait queue, AND
there is either a read lock or a write lock currently being processed,
THEN insert the request into the write wait queue, using the thread variable suspend as the condition for the pthread_cond_wait call. After this we return without waiting for the lock, since it is a WRITE DELAYED request.

3) Handling of TL_WRITE_CONCURRENT_INSERT

This lock type is used by INSERTs with the special flag SQL_CONCURRENT, allowing inserts to append data at the end of the file even when the table is read locked. It is normally upgraded to TL_WRITE, except:
1) when an option has been used to set it to TL_WRITE_LOW_PRIORITY instead, or
2) when there is a check_status method, in which case the TL_WRITE_CONCURRENT_INSERT flag is retained.
The handler can provide this method to allow or disallow the feature at a specific point in time; if no method is provided, the feature is always prohibited. (Check how this integrates with partitioning.)

4) Finally, for all write locks except the WRITE DELAYED requests that entered the write wait queue above, the following logic is processed, with the lock type as it was or as it has been upgraded to:

If there is a write lock request ongoing, THEN if TL_WRITE_ONLY is ongoing the lock request is aborted.
Otherwise, if our thread is the owner of the write lock, OR
our lock type is TL_WRITE_ALLOW_WRITE AND there is no write wait queue AND the ongoing write lock type is TL_WRITE_ALLOW_WRITE,
THEN we insert ourselves into the write queue, update some statistics, call get_status (if it exists) and return to continue processing.
ELSE we put ourselves into the write lock wait queue by calling wait_for_lock.

If there is no write lock request ongoing:
If no write lock requests are waiting (thus neither a write lock queue nor a write lock wait queue), THEN first, if TL_WRITE_CONCURRENT_INSERT was retained, call check_status to see whether it really is to be retained (this calls the check_status method, which in MyISAM is mi_check_status) or whether it is to be upgraded as explained above.
If there is no read lock queue, OR
our lock type is not TL_WRITE_LOW_PRIORITY, TL_WRITE or TL_WRITE_ONLY AND
(our lock type is not TL_WRITE_CONCURRENT_INSERT or TL_WRITE_ALLOW_WRITE, OR there are no ongoing read requests of type TL_READ_NO_INSERT),
THEN insert the request into the write queue, update some statistics, call get_status and return. (Thus TL_WRITE_ALLOW_READ and TL_WRITE_DELAYED are always allowed together with a read queue, but TL_WRITE_ALLOW_WRITE and TL_WRITE_CONCURRENT_INSERT are not allowed if there is a TL_READ_NO_INSERT currently ongoing.)
ELSE wait in the write lock wait queue by calling wait_for_lock.

7.1) Handling of wait_for_lock calls from above

First the request is inserted into the lock wait queue (read_wait or write_wait). Then the current mutex and condition are set up, using the mutex on the table and the condition in the thread-specific part. The condition is also set on the THR_LOCK_DATA object (which is normally part of the handler object). Finally one enters a loop which checks for abort on a thread-specific variable (declared as volatile). pthread_cond_wait is called in the loop, and there are two ways to leave it: either the cond variable on the handler object was changed, or the abort variable on the thread object was set.
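The wait loop just described might look like this sketch (hypothetical structure and names, not thr_lock.c itself; the real code also handles timeouts and queue removal):

```c
#include <pthread.h>

/* Hypothetical sketch of the wait_for_lock loop: wait on the condition
   in a loop so spurious wake-ups are ignored; leave only when the lock
   was granted (cond zeroed by the waker) or an abort was requested. */
struct lock_wait {
  pthread_mutex_t *mutex;        /* per-table lock mutex           */
  pthread_cond_t *cond;          /* non-NULL while still in queue  */
  pthread_cond_t *wait_cond;     /* thread-specific condition      */
  volatile int abort_requested;  /* thread-specific abort flag     */
};

/* Returns 0 when the lock was granted, -1 on abort.  Assumes the
   table mutex is held on entry and released by the caller. */
static int wait_for_lock_sketch(struct lock_wait *w)
{
  while (w->cond && !w->abort_requested)
    pthread_cond_wait(w->wait_cond, w->mutex);
  /* cond still set: we were aborted while still queued and the
     caller must remove the request from the wait queue. */
  return w->cond ? -1 : 0;
}
```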
One reason for the loop logic is to ensure that spurious wake-ups don't cause any harm, so that only true wake-ups are serviced. If the condition variable is still set, this indicates an abort; the request is still in the queue and should be removed from it. If no condition is set and the lock type is TL_UNLOCK, the request has already been dequeued but the result is still to abort the request. The normal case is to return with the condition variable reference zeroed. Then some statistics are updated, get_status is called and the table mutex is unlocked. Finally the current mutex and current condition are removed by zeroing them (done under the protection of a thread-specific mutex).

Corresponding unlock logic
--------------------------
In the same manner as tables are locked by mysql_lock_tables, there are calls to handle the unlocking of tables; mysql_unlock_tables handles this. It first calls thr_multi_unlock, which releases the table locks held by the query. It then calls unlock_external, which calls external_lock with F_UNLCK on the same handlers that got the first external_lock call. What the handlers do with external_lock and F_UNLCK was described above. Handlers are not involved in thr_multi_unlock, so this is pure MySQL Server logic. thr_multi_unlock calls thr_unlock on all tables where the lock is not TL_UNLOCK.

Handling of thr_unlock
----------------------
Step 1: Acquire the mutex on the table object.
Step 2: Remove the lock from the read or write queue (with some special logic to handle TL_WRITE_DELAYED as well).
Step 3: Call update_status if an update_status method exists and the released lock type is at level TL_WRITE_CONCURRENT_INSERT, TL_WRITE_DELAYED, TL_WRITE_LOW_PRIORITY, TL_WRITE or TL_WRITE_ONLY.
Step 4: Decrement read_no_write_count if the lock type was TL_READ_NO_INSERT.
Step 5: Set the lock type to TL_UNLOCK on the THR_LOCK_DATA object.
Step 6: If there are still active write locks, we don't do anything further.
If there are no read locks left either, THEN:
If waiting write lock requests exist, AND
the waiting write lock is not TL_WRITE_LOW_PRIORITY, OR
the waiting write lock is TL_WRITE_LOW_PRIORITY but no waiting read lock exists, OR
the waiting write lock is TL_WRITE_LOW_PRIORITY and waiting read locks exist but none of them are of type TL_READ_HIGH_PRIORITY or TL_READ_NO_INSERT,
THEN check whether we have had a long run of write locks. A long run means that the waiting read requests have been livelocked, so they are allowed to execute now to stop the livelock from advancing further. The waiting read requests are started by calling free_all_read_locks.

The next step is to start moving lock requests from the write lock wait queue to the write lock queue, one item at a time. The normal upgrade of TL_WRITE_CONCURRENT_INSERT applies here too, except that it is upgraded immediately to TL_WRITE rather than being given a chance to become TL_WRITE_LOW_PRIORITY. Next the wake-up logic is used: the condition is set to zero and the condition object is signalled. The loop is broken when a lock is found whose lock type is not TL_WRITE_ALLOW_WRITE, or when the next lock in the wait queue (if there is one) doesn't have lock type TL_WRITE_ALLOW_WRITE. For this loop not to crash, there must be an object last in the queue, not a real lock object, whose lock type is not TL_WRITE_ALLOW_WRITE.

If the loop was exited with objects remaining in the write wait queue, and the lead object has lock type TL_WRITE_LOW_PRIORITY, TL_WRITE or TL_WRITE_ONLY, then no more requests are started; the mutex is unlocked and we return from the method.
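The transfer loop just described can be sketched as follows (hypothetical names and a simplified two-value lock type; the real code signals each waiter's condition and handles the upgrade as it goes):

```c
/* Hypothetical sketch of the wake-up loop: start waiters from the
   write wait queue one at a time, stopping after a waiter that is not
   TL_WRITE_ALLOW_WRITE, or before one whose successor is not. */
enum wait_lock_type { WR_ALLOW_WRITE, WR_OTHER, WR_END };

/* waiters: array terminated by WR_END, standing in for the "object
   last in the queue which is not a real lock object" mentioned above.
   Returns how many waiters were started. */
static int start_write_waiters(const enum wait_lock_type *waiters)
{
  int started = 0;
  for (const enum wait_lock_type *p = waiters; *p != WR_END; p++)
  {
    /* wake this waiter: zero its condition and signal it (elided) */
    started++;
    if (*p != WR_ALLOW_WRITE || p[1] != WR_ALLOW_WRITE)
      break;  /* non-ALLOW_WRITE found, or next one is not ALLOW_WRITE */
  }
  return started;
}
```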
Now, if there were no write locks and no read locks, and we have started at least one write request, then we release all waiting read locks by calling free_all_read_locks, with a parameter set to true if there are waiting write locks at level TL_WRITE_CONCURRENT_INSERT or TL_WRITE_ALLOW_WRITE. If there are no waiting read locks, we release the mutex and exit.

The next step handles the case where there are no more ongoing write requests but there are ongoing read requests.
If there are waiting write requests, AND
the waiting lock request is of lock type TL_WRITE_ALLOW_WRITE, TL_WRITE_ALLOW_READ, TL_WRITE_CONCURRENT_INSERT or TL_WRITE_DELAYED, AND
there are no ongoing requests with TL_READ_NO_INSERT set OR the waiting lock type is TL_WRITE_ALLOW_READ or TL_WRITE_DELAYED,
THEN check whether the lock is to be upgraded; if upgraded, then start all waiting read requests but no more write requests. If lock not upgraded

8) Handling of lock tables for prelocked queries