Bug #77622 Unable to insert rows of >1GiB with Python
Submitted: 5 Jul 2015 21:05 Modified: 21 Jul 2015 6:25
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category: Connector / Python    Severity: S3 (Non-critical)
Version: 2.0.4    OS: Any
Assigned to:    CPU Architecture: Any

[5 Jul 2015 21:05] Daniël van Eeden
Description:
Inserting rows of more than 1GiB does not seem to be easily possible.

With Connector/PHP and Connector/J this should be possible:
http://stackoverflow.com/questions/3754238/is-there-any-way-to-insert-a-large-value-in-a-m...

How to repeat:
Set these server options:
innodb_log_file_size=5G
max_allowed_packet=1073741824

#!/usr/bin/env python3
import time
import mysql.connector

# http://stackoverflow.com/questions/1094841/reusable-library-to-get-human-readable-version-...
from math import log
def pretty_size(n, base=1024, units=('',) + tuple(p + 'i' for p in 'KMGTPEZY')):
    # Pick the largest unit that fits, then show one decimal for KiB and up
    exp = min(int(log(max(n, 1), base)), len(units) - 1)
    decimals = 0 if exp == 0 else 1
    return '%.*f %sB' % (decimals, n / base ** exp, units[exp])

def insert_data(x):
    start = time.time()
    print('Inserting {} chars ({})'.format(x, pretty_size(x)))
    data = 'a' * x
    stmt = 'INSERT INTO supersize(data) VALUES(%s)'
    cur.execute(stmt, (data,))
    end = time.time()
    runtime = end - start
    c.rollback()
    rollbacktime = time.time() - end
    runtime_per_mb = runtime / (x / 1024 / 1024)
    print('Runtime: {} (per MiB: {}), Rollback time: {}'.format(
           round(runtime,2), round(runtime_per_mb,2), round(rollbacktime,2)))

c = mysql.connector.connect(host='::1',port=5625,user='msandbox',
                            password='msandbox',database='test')

cur = c.cursor()

cur.execute('DROP TABLE IF EXISTS supersize')
cur.execute('''CREATE TABLE supersize(
id int unsigned auto_increment primary key,
data longblob) DEFAULT CHARSET latin1''')

# Insert up to 10MiB with steps of 100KiB
for x in range(1, 10*1024*1024, 100*1024):
    insert_data(x)

# Insert up to 10GiB with steps of 100MiB
for x in range(100*1024*1024, 10*1024*1024*1024, 100*1024*1024):
    insert_data(x)

cur.close()
c.close()

Suggested fix:
Make it possible to use the full longblob size from within Connector/Python.
[5 Jul 2015 21:08] Daniël van Eeden
Inserting 104857600 chars (100.0 MiB)
Runtime: 4.1 (per MiB: 0.04), Rollback time: 0.12
Inserting 209715200 chars (200.0 MiB)
Runtime: 14.6 (per MiB: 0.07), Rollback time: 0.76
Inserting 314572800 chars (300.0 MiB)
Runtime: 22.93 (per MiB: 0.08), Rollback time: 0.44
Inserting 419430400 chars (400.0 MiB)
Runtime: 31.78 (per MiB: 0.08), Rollback time: 0.38
Inserting 524288000 chars (500.0 MiB)
Runtime: 17.56 (per MiB: 0.04), Rollback time: 0.45
Inserting 629145600 chars (600.0 MiB)
Runtime: 50.11 (per MiB: 0.08), Rollback time: 0.83
Inserting 734003200 chars (700.0 MiB)
Runtime: 55.82 (per MiB: 0.08), Rollback time: 0.49
Inserting 838860800 chars (800.0 MiB)
Runtime: 66.86 (per MiB: 0.08), Rollback time: 0.48
Inserting 943718400 chars (900.0 MiB)
Runtime: 57.43 (per MiB: 0.06), Rollback time: 0.6
Inserting 1048576000 chars (1000.0 MiB)
Runtime: 67.41 (per MiB: 0.07), Rollback time: 0.28
Inserting 1153433600 chars (1.1 GiB)
Traceback (most recent call last):
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/network.py", line 130, in send_plain
    self.sock.sendall(packet)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./supersize.pl", line 41, in <module>
    insert_data(x)
  File "./supersize.pl", line 16, in insert_data
    cur.execute(stmt, (data,))
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/cursor.py", line 507, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/connection.py", line 722, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/connection.py", line 504, in _send_cmd
    packet_number)
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/network.py", line 133, in send_plain
    errno=2055, values=(self.get_address(), _strioerror(err)))
mysql.connector.errors.OperationalError: 2055: Lost connection to MySQL server at '::1:5625', system error: 32 Broken pipe
[16 Jul 2015 9:00] Andrii Nikitin
1G is the protocol limit, so you should not expect to be able to insert more; the error message should just be ER_NET_PACKET_TOO_LARGE, which I get with this test case:

mysql.connector.errors.OperationalError: 1153 (08S01): Got a packet bigger than 'max_allowed_packet' bytes

What happens if you do the same SQL command through the mysql command line client?

Side note: You can use "ENGINE = MyISAM" to save some disk space in test.
[16 Jul 2015 9:09] Daniël van Eeden
> 1G is the protocol limit, so you should not expect to be able to insert more
That is a per-packet limit, not a per-insert limit.

> the error message should just be ER_NET_PACKET_TOO_LARGE, which I get with this test case:
> mysql.connector.errors.OperationalError: 1153 (08S01): Got a packet bigger than 'max_allowed_packet' bytes
You probably need to set max_allowed_packet to 1073741824

> What happens if you do the same SQL command through the mysql command line client?
Didn't try, I don't know.

> Side note: You can use "ENGINE = MyISAM" to save some disk space in test.
Thanks.

Looks like the C API has this:
https://dev.mysql.com/doc/refman/5.6/en/mysql-stmt-send-long-data.html

Then the data can be sent in multiple chunks.

In the protocol:
https://dev.mysql.com/doc/internals/en/com-stmt-send-long-data.html
So probably Connector/Python doesn't use COM_STMT_SEND_LONG_DATA?
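The linked protocol page describes the COM_STMT_SEND_LONG_DATA payload as a 0x18 command byte, a 4-byte statement id, and a 2-byte parameter index, followed by raw data. A minimal sketch of framing a large value into such payloads (illustrative only: it omits the outer packet header with length and sequence number, and is not Connector/Python's actual code):

```python
import struct

def long_data_packets(statement_id, param_id, data, chunk_size=8192):
    """Split one large parameter into COM_STMT_SEND_LONG_DATA payloads.

    Each payload: command byte 0x18, statement id (4 bytes, little-endian),
    parameter index (2 bytes, little-endian), then a slice of the raw data.
    """
    for offset in range(0, len(data), chunk_size):
        header = struct.pack('<BIH', 0x18, statement_id, param_id)
        yield header + data[offset:offset + chunk_size]

# Example: a 20 KiB value becomes three payloads (8 KiB + 8 KiB + 4 KiB)
packets = list(long_data_packets(1, 0, b'a' * 20 * 1024))
```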
[16 Jul 2015 11:37] Andrii Nikitin
max_allowed_packet applies to most memory allocations inside the MySQL server, including the size of a single SQL command or a single parameter value (minus some overhead).

For COM_STMT_SEND_LONG_DATA to be used, you must use a prepared cursor.

And C/Py really does use a loop with ServerCmd.STMT_SEND_LONG_DATA, sending any parameter in 8K chunks.

In my understanding, to claim that this is a C/Py bug you should demonstrate that you can do the same with some other connector.

And if you just need assistance confirming that any SQL command or any other value is limited to max_allowed_packet size, please ask on the forums or open a Support ticket.

(E.g. if I set max_allowed_packet to 10M I am able to pass only 10M-40 bytes with a regular cursor and 10M-20 bytes with a prepared cursor.)
In other words, COM_STMT_SEND_LONG_DATA will help if you want to insert 10 huge columns in a single statement, but it still cannot exceed the limit set by max_allowed_packet.
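The figures in parentheses can be sanity-checked. With a regular (non-prepared) cursor the value travels inside the SQL text, so the statement around it counts against max_allowed_packet too. A rough sketch using the INSERT from the original test case (illustrative only: the exact overhead depends on the statement text, quoting, and a few bytes of protocol framing, which is presumably where the rest of the observed 40 bytes goes):

```python
max_allowed_packet = 10 * 1024 * 1024  # 10M, as in the example above

# With a regular cursor the value is interpolated into the SQL text, so the
# statement around the value consumes part of the packet as well. Measure
# the statement from the original test case with an empty value:
overhead = len("INSERT INTO supersize(data) VALUES('')")
max_value_regular = max_allowed_packet - overhead
print(overhead, max_value_regular)
```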
[19 Jul 2015 21:01] Daniël van Eeden
Tested with:
https://github.com/dveeden/mysql_supersize/blob/master/Python/supersize.py
https://github.com/dveeden/mysql_supersize/blob/master/C/supersize.c

With ConnPy 2.0.4 and CEXT:
Inserting 1048576000 chars (1000.0 MiB)
Traceback (most recent call last):
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/connection_cext.py", line 366, in cmd_query
    raw_as_string=raw_as_string)
_mysql_connector.MySQLInterfaceError: Lost connection to MySQL server during query

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./supersize.py", line 64, in <module>
    insert_data(x)
  File "./supersize.py", line 24, in insert_data
    cur.execute(stmt, (data,))
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/cursor_cext.py", line 260, in execute
    raw_as_string=self._raw_as_string)
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/connection_cext.py", line 369, in cmd_query
    sqlstate=exc.sqlstate)
mysql.connector.errors.OperationalError: 2013 (HY000): Lost connection to MySQL server during query

With ConnPy 2.0.4 (native):
Inserting 1153433600 chars (1.1 GiB)
Traceback (most recent call last):
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/network.py", line 130, in send_plain
    self.sock.sendall(packet)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./supersize.py", line 64, in <module>
    insert_data(x)
  File "./supersize.py", line 24, in insert_data
    cur.execute(stmt, (data,))
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/cursor.py", line 1095, in execute
    parameters=self._prepared['parameters'])
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/connection.py", line 981, in cmd_stmt_execute
    packet = self._send_cmd(ServerCmd.STMT_EXECUTE, packet=execute_packet)
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/connection.py", line 265, in _send_cmd
    packet_number)
  File "/home/dvaneeden/venv/supersize/lib64/python3.4/site-packages/mysql/connector/network.py", line 133, in send_plain
    errno=2055, values=(self.get_address(), _strioerror(err)))
mysql.connector.errors.OperationalError: 2055: Lost connection to MySQL server at '::1:5610', system error: 32 Broken pipe

With the C example I can upload more than 1GB if max_long_data_size is set large enough (tested with MariaDB 10.1, should work with MySQL 5.5).

Related:
http://bugs.mysql.com/bug.php?id=77781
[19 Jul 2015 21:08] Daniël van Eeden
With general log on:
tail -f data/general.log | cut -c1-80

Insert from C code:
		   18 Long Data	
		   18 Prepare	INSERT INTO supersize(data) VALUES(?)
150719 23:04:39	   18 Execute	INSERT INTO supersize(data) VALUES('aaaaaaaaaaaaaa

Insert from Python:
		   16 Query	INSERT INTO supersize(data) VALUES('aaaaaaaaaaaaaa

This suggests that Connector/Python isn't using mysql_stmt_send_long_data() / COM_STMT_SEND_LONG_DATA.
[20 Jul 2015 11:09] Andrii Nikitin
Thank you for the demonstration using the C connector.
As was mentioned before, you must use a prepared cursor for COM_STMT_SEND_LONG_DATA to be used.

In particular, the following lines should technically be able to insert 1025M into MySQL 5.5.45:

cur = cnx.cursor(prepared=True)
cur.execute("INSERT INTO bugjstest VALUES (%s)", (io.BytesIO(bytes("A"*1025*1024*1024, "UTF-8")), ))

The following issues still exist:
1. Documentation is missing about the requirement to pass long_data parameters into prepared statements via IOBase
2. The connector sends long_data in 8K chunks, which are too small (this severely impacts performance for huge parameters)
3. 5.6 has a 1G limit on the total size of a long_data parameter
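The performance impact in issue 2 is easy to quantify with a back-of-the-envelope check (protocol framing overhead ignored): a parameter near the 1G limit of issue 3, sent 8 KiB at a time, takes over a hundred thousand separate long-data commands.

```python
GIB = 1024 ** 3              # a parameter near the 1G limit (issue 3)
CHUNK = 8 * 1024             # the 8K chunk size from issue 2
commands = -(-GIB // CHUNK)  # ceiling division: long-data commands needed
print(commands)              # 131072 commands for a single parameter
```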

Unless you disagree, I am going to:
+ create 2 new issues for 1. and 2.
+ close this request as a duplicate of documentation issue 1. (to keep the issue clearer)
- issue 3. should be covered by bug #77781
[20 Jul 2015 11:32] Daniël van Eeden
I used a prepared cursor for native, but that didn't work for CEXT:
http://bugs.mysql.com/bug.php?id=77780

> Unless you disagree - I am going to :
> + create 2 new issues for 1.&2.
> + close this request with Duplicate of documentation issue 1. (to have clearer issue)  
> - #3. should be covered by bug #77781
You can set this bug to 'Not a Bug' and add a note of the 2 bugs created for 1. and 2.

Thanks!
[21 Jul 2015 6:25] Andrii Nikitin
Issues were split into:

Bug #77788 Document how Connector uses COM_STMT_SEND_LONG_DATA
Bug #77789 Too small chunks when sending huge parameters with COM_STMT_SEND_LONG_DATA

Closing this report as 'Not a Bug' as was discussed earlier
[5 Feb 2016 10:15] Chiranjeevi Battula
http://bugs.mysql.com/bug.php?id=80238 marked as duplicate of this one.