Bug #91107 BUG: cursor columns representing aggregate funcs values incorrectly converted
Submitted: 1 Jun 2018 21:47 Modified: 25 Jun 2018 22:45
Reporter: Nenad J Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / Python Severity:S1 (Critical)
Version:8.0.11 OS:Windows (10 Home)
Assigned to: CPU Architecture:x86 (64 bit)
Tags: mysql connector python cursor execute pandas byte binary strings nan

[1 Jun 2018 21:47] Nenad J
Description:
This bug is a big problem for everyone that is using 8.0.11 for this is a code breaking bug! The very same python 3.6 code works with both 8.0.5 and 8.0.6.

Details: If sql is executed using mysql connector 8.0.11, some aggregate functions columns (max, round, avg with floats and dates) are returned in a cursor as binary arrays and subsequently fail to convert properly into their correct types during pandas conversion, thus e.g. a float number becomes NaN in pandas. This is not pandas fault for the very same code works with mysql connector 8.0.5 and 8.0.6. In addition, column names for those columns are also sent back as binary strings as opposed to just strings as before, thus referencing e.g. 'avg_price' no longer finds it for the column name is actually b'avg_price'. So there are two problems - wrong column names and columns' values types.

How to repeat:
#%% --- python 3.6 code ---
import mysql.connector
import pandas
db_conn = { 'host':'my-server','user':'my-user','password':'my-pwd','port':3306 }
sql_statement = r"""SELECT
                        dish_id,
                        dish_name,
                        ROUND(AVG(price)) AS avg_price,
                        MAX(order_time) AS latest_order_date
                    FROM all_orders
                    GROUP BY dish_id;"""
conn = mysql.connector.connect(**db_conn)
cur = conn.cursor(dictionary=True)
cur.execute(sql_statement)
rows = cur.fetchall()
col_names = [x[0] for x in cur.description]
cur_df = pd.DataFrame(rows,columns=col_names)
cur_multi.close()
conn.close()
print(cur_df['avg_price']) # <- this fails for column name is binary string b'avg_price'
print(cur_df['avg_price']*100.0) # <- this gives NaN for binary strings did not covert properly into floats

Suggested fix:
This may be related to your "fix" denoted here https://dev.mysql.com/doc/relnotes/connector-python/en/news-8-0-11.html under Bugs Fixed section - paragraphs referencing Bug #27364914 & Bug #24948205 & Bug #83516. That text talks about byte arrays, etc.

In other words, it appears that you may have fixed one thing and broken another.
[5 Jun 2018 10:14] Chiranjeevi Battula
Hello Nenad,

Thank you for the bug report and test case.
Verified this behavior on MySQL Connector/Python 8.0.11.

Thanks,
Chiranjeevi.
[6 Jun 2018 18:40] Nenad J
Chiranjeevi, mentioning this in case it may help: With mysql connector 8.0.5, ROUND(…) and SUM(…) functions return column type MYSQL_TYPE_NEWDECIMAL (type code 246 in cursor description property). Did not check other aggregates e.g. MAX with dates, but this may give you some helpful info while looking into this issue. Also, aside from new decimal, there's MYSQL_TYPE_NEWDATE (type code 14) and those may be affected as well.
Links:
https://dev.mysql.com/doc/connectors/en/connector-python-api-mysqlcursor-description.html 
https://dev.mysql.com/doc/internals/en/myisam-column-attributes.html 
https://dev.mysql.com/doc/connectors/en/connector-python-api-mysqlcursor-description.html
[7 Jun 2018 18:58] Jose Israel Gomez Delgado
Posted by developer:
 
Thanks for your report, that it comes in really good time, we are trying to make the c-extension run seamlessly to the pure python implementation on incoming releases, I mention this because this issue was not introduced while fixing other bug in a previews release, this issue belongs to the c-extension, who has been set to be used by default (where is available). 

Meanwhile there is a workaround for this issue by forcing the use of the pure python implementation with the option "use_pure":True at the connection initialization.
[7 Jun 2018 19:25] Nenad J
Unfortunately, same results with use_pure=True with mysql connector 8.05

I used: conn = mysql.connector.connect(**db_conn,use_pure=True)

This does not seem like a workaround. Are you sure that this parameter is not ignored? Or in case I don't have that c-lib on my machine, then this proves that pure python implementation has issues as well.
[7 Jun 2018 19:36] Nenad J
OK I tested it with 8.0.11 and it worked - no binary column names or values - great.

In my previous post, when I mentioned flag was not working with 8.0.5, I meant that ROUND(…) and SUM(…) functions still return column type MYSQL_TYPE_NEWDECIMAL (type code 246 in cursor description property). Is that by design?

Note: With use_pure=True, both 8.0.5 and 8.0.11 return the same including MYSQL_TYPE_NEWDECIMAL for the same columns.
[25 Jun 2018 22:45] Philip Olson
Posted by developer:
 
The upcoming MySQL Connector/Python 8.0.12 release fixes several result type 
bugs similar to this one, and multiple fixes were combined into a single 
changelog entry. Here's that entry:

The get_row() and get_rows() behavior differed with the C (connections
with CMySQLConnection) and pure Python (connections with MySQLConnection)
implementations of the connector. The resolved differences are:

With the pure Python implementation, all data was returned
as bytearrays; while the C implementation returned all data as Python
types with CMySQLConnection (cext). Both now return Python types.

With the pure Python implementation, they returned a tuple
with (row(s), eof), but with the C Extension they only returned the
row(s). Now both implementations return the tuple form; (row(s), eof).

For queries returning a result, with pure Python the
warning count was part of the returned eof. With the C extension, warning
count was only available using the warning_count property of the
connection object. Related, the warning_count property was not available
in the pure Python implementation. Now, result includes the warning count
for both implementations. 

Fetching rows using pure Python
would automatically set the unread_rows property to False. With the C
extension, explicitly calling the free_result() method of the connection
object was required. Related, free_result() was only available with the C
extension. Now, both implementations set unread_rows to False.

Thank you for the bug report.