Bug #19076 MaxDB Sync Manager not usable over volatile networks
Submitted: 13 Apr 2006 16:16 Modified: 7 Jan 2008 11:12
Reporter: Christopher Kocmoud
Status: Closed
Category:MaxDB Severity:S1 (Critical)
Version:7.6 OS:Microsoft Windows (Windows XP)
CPU Architecture: Any

[13 Apr 2006 16:16] Christopher Kocmoud
Description:
MaxDB synchronization employs a transport system compliant with the “Java Message Service” (JMS) specification. Through reliable, asynchronous messaging, the synchronization participants exchange their data with the message server, which stores the data until a data consumer connects and requests it. A synchronization service runs on the host of each synchronization participant to monitor changes in the database and communicate them to the message server. On startup, each synchronization service registers itself with the message server. Both the message server and the synchronization service are written in Java, with a design oriented toward cross-platform, distributed, and scalable applications.
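To make the store-and-forward idea concrete, here is a minimal Java sketch. It is not the Synchronization Manager's code and not the JMS API: the class `StoreAndForwardServer` and its methods are hypothetical, and an in-memory map of per-participant queues stands in for the database that backs the real message server.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: an in-memory map of per-participant queues stands in
// for the database that backs the real message server.
class StoreAndForwardServer {
    private final Map<String, Queue<String>> pending = new HashMap<>();

    // Registers a participant; registering again keeps any stored backlog.
    synchronized void register(String participant) {
        pending.putIfAbsent(participant, new ArrayDeque<>());
    }

    // Stores a change for every registered participant except the sender,
    // so it waits on the server until that participant fetches it.
    synchronized void publish(String sender, String change) {
        for (Map.Entry<String, Queue<String>> entry : pending.entrySet()) {
            if (!entry.getKey().equals(sender)) {
                entry.getValue().add(change);
            }
        }
    }

    // Hands over everything stored for a participant and clears its queue.
    synchronized List<String> fetch(String participant) {
        Queue<String> queue = pending.get(participant);
        if (queue == null) {
            throw new IllegalStateException("not registered: " + participant);
        }
        List<String> changes = new ArrayList<>(queue);
        queue.clear();
        return changes;
    }

    public static void main(String[] args) {
        StoreAndForwardServer server = new StoreAndForwardServer();
        server.register("DBMASTER");
        server.register("DBCLIENT");
        // DBCLIENT is offline while DBMASTER publishes two changes.
        server.publish("DBMASTER", "UPDATE 1");
        server.publish("DBMASTER", "UPDATE 2");
        // When DBCLIENT connects later, the stored changes are still waiting.
        System.out.println(server.fetch("DBCLIENT")); // [UPDATE 1, UPDATE 2]
        System.out.println(server.fetch("DBMASTER")); // []
    }
}
```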
 
MaxDB synchronization uses asynchronous communication, so no permanent network connection is required. Since the message server is backed by a database, synchronization data can be recovered even in the case of a server crash. However, user intervention is needed to coordinate restarting the message server and the synchronization services. In a volatile network environment, the synchronization service exits when the underlying TCP socket connection is reset. To recover from such a network disruption, one must wait for the network connection to be restored and then restart the message server and the synchronization services on all participants.

In summary, the MaxDB Synchronization Manager is not designed to run continuously without user attention while the network connection comes and goes, which is exactly what many mobile database applications such as ours need.

How to repeat:
To reproduce the problem, one may follow the example that comes with the MaxDB Synchronization Manager distribution. The major steps are below.
 
1. Install MaxDB including Synchronization Manager onto two networked computers;
2. Create and start two MaxDB instances named DBMASTER and DBCLIENT, one on each computer;
3. Initialize the message server on one computer, which is designated the master computer for convenience;
4. Use the Synchronization Manager GUI on the master computer to define the synchronization schema between the DBMASTER database on the master computer and the DBCLIENT database on the client computer; (Note: remember to set the hostname of the DBCLIENT database to the IP address of the client computer.)
5. Start the synchronization services on both computers; (Note: for the client computer, the Message Server Host is set to the IP address of the master computer where the message server is residing.)
6. Unplug the Ethernet cable from the client computer, or move it out of range if a wireless network connects the two computers, to disrupt the network connection. The synchronization service running on the client computer exits due to socket exceptions caused by the reset TCP sockets. Before the network is restored, the synchronization service cannot be restarted, since no route to the message server host is available. After the network is restored, the synchronization service exits immediately after restarting because it cannot negotiate a registration with the message server. To overcome the problem, one must stop the message server and the synchronization service on the master computer and then restart the message server and the synchronization services on both computers in the proper order.

Suggested fix:
Ideally:

1. Prevent the sync service from exiting due to the socket exception.  Catch the exception and periodically attempt to re-establish the TCP socket connection.

2. Modify the message server so it acknowledges re-registrations.  It currently just ignores registration requests from the same source.
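Suggestion 1 can be sketched in Java as a retry loop with exponential backoff. This is a minimal sketch under assumptions: `Reconnector`, `Reconnector.Connector`, `connectWithRetry`, and `tcp` are hypothetical names, the delays are arbitrary, and the real sync service's connection code is not shown in this report. Only standard `java.net` calls are used.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: catch connection failures and retry with backoff
// instead of letting a socket exception terminate the sync service.
final class Reconnector {
    private Reconnector() {}

    // Opens a connection or throws IOException (e.g. "No route to host").
    @FunctionalInterface
    interface Connector<T> {
        T connect() throws IOException;
    }

    // Retries until the connector succeeds, doubling the wait after each
    // failure up to maxDelayMillis.
    static <T> T connectWithRetry(Connector<T> connector, long initialDelayMillis,
                                  long maxDelayMillis) throws InterruptedException {
        long delay = initialDelayMillis;
        while (true) {
            try {
                return connector.connect();
            } catch (IOException e) {
                System.err.println("connect failed (" + e.getMessage()
                        + "), retrying in " + delay + " ms");
                Thread.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
    }

    // Connector for a plain TCP socket; host and port are placeholders.
    static Connector<Socket> tcp(String host, int port, int timeoutMillis) {
        return () -> {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return socket;
            } catch (IOException e) {
                socket.close(); // do not leak the unconnected socket
                throw e;
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated outage: the first two attempts fail, the third succeeds.
        int[] attempts = {0};
        String session = connectWithRetry(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IOException("simulated network outage");
            }
            return "connected";
        }, 10, 1000);
        System.out.println(session + " after " + attempts[0] + " attempts");
    }
}
```

A real service would also wrap its read/write loop in the same pattern: when an `IOException` escapes that loop, close the socket, call `connectWithRetry` again, and re-register with the message server, which is where suggestion 2 comes in.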
[13 Apr 2006 19:15] Christopher Kocmoud
Severity changed.  Did not read the severity level explanations when bug report was submitted.
[14 Apr 2006 16:38] C.J. Adams-Collier
Hello!

It's good to hear of another Synchronization Manager use case.

I've spoken with the devs recently, and their time is being spent on other projects right now, so this may not be completed in a timely manner.

I'll pass it on to them though.

Cheers,

C.J.
[18 Apr 2006 11:55] Wolfgang Auer
Hello Christopher,

It was intended that all active topic subscribers are set to an inactive state when a socket exception occurs, and that a restart of the message server is not necessary. The sync service should reconnect to the message server without problems.
We tested this by terminating the sync services with [CTRL] [C]. In your case the message service does not detect that the socket is broken. C.J. will open a ticket for us and we will fix the problem in one of the next releases.
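The server-side behaviour described above (subscribers marked inactive on a socket exception, then reactivated when the sync service reconnects), together with suggestion 2 (acknowledging re-registrations rather than ignoring them), might look like the sketch below. `SubscriberRegistry`, its methods, and the acknowledgement strings are hypothetical. Since the broken socket was not detected in this case, the sketch also enables TCP keepalive and a read timeout via the standard `Socket.setKeepAlive` and `Socket.setSoTimeout`; treating a read timeout as a dead peer assumes the protocol sends periodic heartbeats, which is an assumption, not something stated in this report.

```java
import java.net.Socket;
import java.net.SocketException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of topic subscribers on the message server side.
final class SubscriberRegistry {
    enum State { ACTIVE, INACTIVE }

    private final Map<String, State> subscribers = new HashMap<>();

    // Registers or re-registers a subscriber. A repeated request from the
    // same source reactivates it and is acknowledged instead of ignored.
    synchronized String register(String subscriberId) {
        State previous = subscribers.put(subscriberId, State.ACTIVE);
        return previous == null ? "REGISTERED" : "REREGISTERED";
    }

    // Called when I/O on the subscriber's socket fails; the entry is kept,
    // only marked inactive, so no server restart is needed.
    synchronized void markBroken(String subscriberId) {
        subscribers.computeIfPresent(subscriberId, (id, state) -> State.INACTIVE);
    }

    synchronized boolean isActive(String subscriberId) {
        return subscribers.get(subscriberId) == State.ACTIVE;
    }

    // Makes a silently vanished peer eventually surface as an exception.
    static void enableDeadPeerDetection(Socket socket, int readTimeoutMillis)
            throws SocketException {
        socket.setKeepAlive(true);              // OS-level TCP keepalive probes
        socket.setSoTimeout(readTimeoutMillis); // read() then throws SocketTimeoutException
    }

    public static void main(String[] args) {
        SubscriberRegistry registry = new SubscriberRegistry();
        System.out.println(registry.register("DBCLIENT")); // REGISTERED
        registry.markBroken("DBCLIENT");                   // e.g. IOException on write
        System.out.println(registry.isActive("DBCLIENT")); // false
        System.out.println(registry.register("DBCLIENT")); // REREGISTERED
        System.out.println(registry.isActive("DBCLIENT")); // true
    }
}
```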

Your idea of periodically attempting to re-establish the TCP socket connection will be added to our plans for further Synchronization Manager development.

regards
   Wolfgang
[18 Apr 2006 16:00] C.J. Adams-Collier
Hello!

I've spoken again with the syncman dev team.  They tell me that they will put some time into addressing this issue for the next release.  Thank you for the submission of this issue.

C.J.
[18 Apr 2006 18:50] Christopher Kocmoud
Thank you for all your comments. It's reassuring to get such great feedback from support! We were aware of how the Sync Mgr was *intended* to be used. We are evaluating MaxDB/SyncMgr for use with a mobile Toughbook that intermittently leaves the WiFi range of the primary database station for data collection. Once it re-enters the WiFi footprint, we were hoping it would "automatically" re-sync and bidirectionally propagate DB changes. Sounds like this may happen, resources permitting, in the next release. Thanks!
[10 May 2006 16:28] C.J. Adams-Collier
http://www.sapdb.org/webpts?wptsdetail=yes&ErrorType=0&ErrorID=1141392

Feature planned to be added in 7.6.0.30
[10 Sep 2007 8:34] Wolfgang Auer
Solution has been submitted.

http://www.sapdb.org/webpts?wptsdetail=yes&ErrorType=0&ErrorID=1141392

regards
  Wolfgang