Description:
The VIEW_CHANGE_LOG_EVENT, meaning that some member joined
the group, is logged on all members.
The VIEW_CHANGE_LOG_EVENT must be logged on the message stream
order, since it does mark the stream position on which the new
member joined.
On SECONDARY members this is achieved by queueing it in-order on
the applier queue, which already ensures the stream order.
On PRIMARY members, this is achieved by waiting that all precedent
local transactions are already committed, only then the
VIEW_CHANGE_LOG_EVENT is committed. No posterior transaction can
commit before the VIEW_CHANGE_LOG_EVENT since the plugin pipeline
will ensure that.
The wait for the local transactions commit is done on
handlers/certification_handler.cc
```
538 int Certification_handler::wait_for_local_transaction_execution()
539 {
...
565 if ((error= sql_command_interface->
566 wait_for_server_gtid_executed(local_gtid_certified_string,
567 GTID_WAIT_TIMEOUT)))
568 {
569 /* purecov: begin inspected */
570 if (error == 1) //timeout
571 {
572 log_message(MY_ERROR_LEVEL,
573 "Timeout when waiting for the server to execute local "
574 "transactions in order assure the group change proper logging");
575 }
576 else
577 {
578 log_message(MY_ERROR_LEVEL,
579 "Error when waiting for the server to execute local "
580 "transactions in order assure the group change proper logging");
581 }
582 /* purecov: end */
583 }
584 delete sql_command_interface;
585 DBUG_RETURN(error);
586 }
```
The GTID_WAIT_TIMEOUT is 30 seconds, which means, despite unlikely,
that if the already prepared and allowed to commit transactions take
more than 30 seconds to commit, the wait is interrupted and a error
is thrown.
The GTID_WAIT_TIMEOUT is 30 seconds, which means, despite unlikely,
that if the already prepared and allowed to commit transactions take
more than 30 seconds to commit, the wait is interrupted and a error
is thrown.
That error will make the VIEW_CHANGE_LOG_EVENT not to be logged:
```
522 //Assure the last known local transaction was already executed
523 error= wait_for_local_transaction_execution();
524
525 /**
526 Create a transactional block for the View change log event
527 GTID
528 BEGIN
529 VCLE
530 COMMIT
531 */
532 if (!error)
533 error= inject_transactional_events(pevent,cont);
534
535 DBUG_RETURN(error);
536 }
```
If the new member that joined the group recovers from this member it
will not find a VIEW_CHANGE_LOG_EVENT which will make it never
leave the RECOVERY state.
How to repeat:
See the code.
Suggested fix:
Extend the change on http://rb.no.oracle.com/rb/r/20494/diff/22-23#2
to handle the generic case.
Or change the timeout to infinite.