Bug #45000 | Make .NET connector faster | ||
---|---|---|---|
Submitted: | 21 May 2009 2:43 | Modified: | 21 Aug 2009 10:52 |
Reporter: | Dennis Haney | Email Updates: | |
Status: | Open | Impact on me: | |
Category: | Connector / NET | Severity: | S4 (Feature request) |
Version: | 6.0.3 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[21 May 2009 2:43]
Dennis Haney
[21 May 2009 2:52]
Dennis Haney
Ok, managed to google my way to figuring out how to use the binary protocol. Who chose to bind this to prepared statements? I shouldn't have to do another server round trip to get the desired speed. But that doesn't really help everyone else, so binary packets should definitely be the default.
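[For reference, forcing the binary protocol through a prepared statement looks roughly like this with Connector/NET. This is a sketch, not code from the report; the connection string, credentials, and table name are placeholders, and it assumes the MySql.Data package is referenced:]

```csharp
using MySql.Data.MySqlClient;

// Hypothetical connection string. IgnorePrepare=false makes Prepare()
// perform a real server-side prepare instead of the default client-side
// "faked" one, so execution and result sets use the binary protocol.
var connStr = "server=localhost;database=test;uid=user;pwd=pass;IgnorePrepare=false";

using (var conn = new MySqlConnection(connStr))
{
    conn.Open();
    using (var cmd = new MySqlCommand("SELECT d FROM big_table", conn))
    {
        // Preparing the statement is what switches the wire format.
        cmd.Prepare();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Values now arrive in binary form (e.g. 8 raw bytes for a
                // DOUBLE), so no text parsing happens on the client.
                double d = reader.GetDouble(0);
            }
        }
    }
}
```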
[19 Aug 2009 11:56]
Tonci Grgin
Hi Dennis and thanks for your report. Unfortunately, you are not telling us about your use case but rather dropping a code fragment and making suggestions. That is nothing I can start working on. So an off-the-cuff answer from me would be "Implementing Double.Parse makes no sense. If you're after max performance please use integers". Of course, you can not make use of this suggestion, right? Your second comment seems to suggest you're using prepared statements and you complain they are "faked" by default, right? Well, they are! Unless IgnorePrepare is set to False (it is True by default). This remained from the times when the server itself had problems with prepared statements. Now you see how much guessing I did. Please describe your case better and attach a test case supporting your claims.
[19 Aug 2009 13:17]
Tonci Grgin
Or is it that you want binary protocol for plain statements? If so, I think it's implemented in 5.4 by Kostja ("execute direct" or something).
[20 Aug 2009 2:15]
Dennis Haney
Hi Tonci Grgin. I thought I was rather specific with my use case: "I am trying to load a huge table into memory (29000 rows and 275 cols), and it takes a couple of seconds."

The problem occurs primarily because data is sent from MySQL to the client using a TEXT protocol, and I gave some hints to speed this up. I did manage to figure out how to force the binary protocol and get around this problem, and it also gives me almost double the speed, making my use case acceptable. However, for everyone else who does not have the time to figure out how to enable the binary protocol, the existing TEXT code could be optimized with a few simple things, as I mentioned.

All of the IMySqlValue implementations follow similar patterns; let's take the double for instance. The time-consuming part of this code is this line:

double.Parse(packet.ReadString(length))

There are three problems here, as I also mentioned above:

1. double.Parse is SLOOOOOW. It can parse a HUGE range of number formats, whereas the format sent from MySQL matches a very specific subset of those. Thus you would gain speed if you replaced double.Parse with a MySqlDoubleParser.Parse specialized for the format that is sent from the server.

2. packet.ReadString is also a generalized function. It will read UTF-8 strings, even in the case of a double or an int. This obviously takes time since it needs to check for multibyte encodings for every char. Specializing the function for plain ASCII reads will also give a speed boost.

3. packet.ReadString depends on BufferedStream, which is a good thing since most reads are done in single bytes. However, it is using the standard 4k buffer, which means that it will rather often have to refill itself. You should check whether changing the buffer size to 64k improves speed or not.
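[Point 3 above amounts to something like the following. This is an illustrative sketch, not the connector's actual code; a MemoryStream stands in for the network stream:]

```csharp
using System.IO;

class BufferSizeSketch
{
    static void Main()
    {
        // Stand-in for the connector's network stream (illustrative only).
        Stream raw = new MemoryStream(new byte[256 * 1024]);

        // The connector wraps the stream in a BufferedStream with the
        // standard 4 KB buffer, which refills often when a large result
        // set is scanned byte by byte. Dennis's suggestion: try 64 KB
        // and measure whether it helps.
        var buffered = new BufferedStream(raw, 64 * 1024);

        // Single-byte reads now hit the 64 KB buffer instead of the
        // underlying stream on every refill boundary.
        int b = buffered.ReadByte();
    }
}
```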
[20 Aug 2009 6:31]
Tonci Grgin
Hi Dennis and thanks for the explanation. You leave me little choice but to set this to "Verified" so we can check on your claims and see if things can get any faster... Reggie?
[20 Aug 2009 15:43]
Reggie Burnett
Dennis, we are always looking to make the connector faster. I've spent some time looking into your points. Here is what I found.

I ran a test where I timed Double.Parse parsing 1 billion double values. It took something like 0.02 seconds to parse 1 billion values. I don't think the parsing is causing you any performance headaches. If you have an implementation of Double.Parse that shows significant perf gains, I'll be happy to include it.

I then ran a test where I created a table with 1000 rows. Each row had 3 double values in it. I created a console app that read this table 1000 times (for a total of 1 million row reads). I timed it with the existing packet.ReadString and then with a version that read the bytes and created a string with ASCII encoding. There was no significant increase in perf. I did notice one small perf gain when I used the tempBuffer in ReadString instead of newing up a new buffer every time.

And, finally, increasing the buffer size in BufferedStream isn't always the right thing to do. The buffered stream will try to fill itself and can cause more delay before getting the initial data. And if the user is paging data or just wants the first few rows, this will cause a degradation for them. I'm not convinced that this is the right play.

We really appreciate your input and always welcome concrete improvements to our product. Keep up the good work and we'll keep profiling and looking for ways to speed things up.
[21 Aug 2009 10:42]
Dennis Haney
I am not quite sure exactly how you tested the speed of double.Parse, but it is way slower than what you write.

Stopwatch sw = new Stopwatch();
sw.Start();
double x = 0;
for (int i = 0; i < 5000000; i++)
    x = double.Parse("123.12", NumberFormatInfo.InvariantInfo);
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalSeconds);

This takes from 2200 to 2400 milliseconds on my machine in release mode. I implemented a quick and dirty double parser:

/// <summary>
/// Parses a double in the format -?(?:[0-9]*(?:.[0-9]*)?)?
/// </summary>
public static double SimpleDoubleParser(string org)
{
    char[] s00 = org.ToCharArray();
    int s00Length = s00.Length;
    int index = 0;
    long intpart = 0, floatpart = 0;
    if (s00Length == 0)
        return 0.0;
    bool sign = false;
    if (s00[0] == '-')
    {
        sign = true;
        index = 1;
    }
    if (index == s00Length)
        return 0.0;
    for (; index < s00Length; index++)
    {
        if (s00[index] < '0' || s00[index] > '9')
            break;
        intpart = intpart * 10 + (s00[index] - '0');
    }
    if (index == s00Length)
        return sign ? -intpart : intpart;
    // TODO: add assert if there is not a '.' at this point
    int decimalpoints = 1;
    for (index++; index < s00Length; index++, decimalpoints *= 10)
    {
        floatpart = floatpart * 10 + (s00[index] - '0');
    }
    double ret = intpart + (double)floatpart / decimalpoints;
    return sign ? -ret : ret;
}

It does the job in 600 milliseconds.
[21 Aug 2009 10:52]
Dennis Haney
And since you in this context are reading the bytes one by one anyway, you can inline that and avoid the ToCharArray. Then it does the job in 300 msec... almost 9 times faster than the original.
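[A byte-level variant of the parser above, reading digits straight from a byte buffer as this last comment suggests, might look like the following. It is a sketch under the same "quick and dirty" assumptions as Dennis's parser; the class and method names are illustrative, not the connector's internals:]

```csharp
using System;
using System.Text;

static class ByteDoubleParser
{
    // Sketch: parse -?[0-9]*(.[0-9]*)? straight from a byte buffer,
    // skipping both ReadString's string allocation and ToCharArray.
    public static double Parse(byte[] buf, int offset, int length)
    {
        int end = offset + length;
        int i = offset;
        long intPart = 0, fracPart = 0;
        bool negative = false;

        if (i < end && buf[i] == (byte)'-') { negative = true; i++; }

        // Integer digits.
        while (i < end && buf[i] >= (byte)'0' && buf[i] <= (byte)'9')
            intPart = intPart * 10 + (buf[i++] - (byte)'0');

        double result = intPart;

        // Optional fractional digits after the decimal point.
        if (i < end && buf[i] == (byte)'.')
        {
            long scale = 1;
            for (i++; i < end && buf[i] >= (byte)'0' && buf[i] <= (byte)'9'; i++)
            {
                fracPart = fracPart * 10 + (buf[i] - (byte)'0');
                scale *= 10;
            }
            result += (double)fracPart / scale;
        }
        return negative ? -result : result;
    }

    static void Main()
    {
        byte[] payload = Encoding.ASCII.GetBytes("-123.12");
        Console.WriteLine(Parse(payload, 0, payload.Length)); // -123.12
    }
}
```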