Bug #68946 UDF *_init and *_deinit functions called only once for multiple-row select
Submitted: 13 Apr 2013 10:13 Modified: 17 Apr 2013 5:36
Reporter: Sveta Smirnova Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: User-defined functions ( UDF ) Severity:S3 (Non-critical)
Version:5.5.32, 5.6.12, 5.1.70, 5.0.97 OS:Any
Assigned to: CPU Architecture:Any

[13 Apr 2013 10:13] Sveta Smirnova
Description:
There is following information about UDF calling sequences at http://dev.mysql.com/doc/refman/5.6/en/udf-calling.html:

----<q>----
 The initialization and deinitialization functions are declared like this:

my_bool xxx_init(UDF_INIT *initid, UDF_ARGS *args, char *message);

void xxx_deinit(UDF_INIT *initid);

...

 char *ptr

A pointer that the function can use for its own purposes. For example, functions can use initid->ptr to communicate allocated memory among themselves. xxx_init() should allocate the memory and assign it to this pointer:

initid->ptr = allocated_memory;

In xxx() and xxx_deinit(), refer to initid->ptr to use or deallocate the memory. 
----</q>----

Latter at http://dev.mysql.com/doc/refman/5.6/en/udf-return-values.html we see:

----<q>----
If your string function does not use the supplied buffer (for example, if it needs to return a string longer than 255 characters), you must allocate the space for your own buffer with malloc() in your xxx_init() function or your xxx() function and free it in your xxx_deinit() function. You can store the allocated memory in the ptr slot in the UDF_INIT structure for reuse by future xxx() calls. See Section 22.3.2.1, “UDF Calling Sequences for Simple Functions”. 
----</q>----

So reader can think she can use initid->ptr for following purposes:

1. To store result, created from input values. So, she would consider it is safe not to allocate maximum amount of memory to initid->ptr in case if somebody can pass 4G blob value as an argument, but make it depending from actual parameters of the function in each exact case. For example, 256K for 256K string and 1G for 1G string. Such relying can lead to server crashes.

2. To initialize initid->ptr with some value in *_init function, then proceed this value in UDF function. But actually, if value of initid->ptr changed in UDF function, second call of the UDF function will see changed value, not one which was supposed to be set in _init function.

Attached test case demonstrates second issue.

How to repeat:
1. Unzip attached test case, cd to the directory.
2. Run following commands:

cmake . -DMYSQL_DIR=/home/sveta/src/mysql-5.6
make
MYSQL_DIR=/home/sveta/src/mysql-5.6 make -f Makefile.unix test_initid_bug

Point MYSQL_DIR to the MySQL basedir.

3. It will outputs:

--- /home/sveta/src/mysql-5.6/mysql-test/suite/initid_bug_udf/r/initid_bug.result	2013-04-13 13:07:34.716909170 +0300
+++ /home/sveta/src/mysql-5.6/mysql-test/suite/initid_bug_udf/r/initid_bug.reject	2013-04-13 13:07:37.427919998 +0300
@@ -7,10 +7,10 @@
select initid_bug() from t1;
 initid_bug()
 1
-1
-1
-1
-1
-1
-1
-1
+10
+100
+1000
+10000
+100000
+1000000
+10000000

Suggested fix:
Either code should be fixed, so *_init and *_deinit functions are called for each call of UDF functions or user manual must be fixed, so it does not confuse users.
[13 Apr 2013 10:14] Sveta Smirnova
test case

Attachment: initid_bug.zip (application/zip, text), 172 bytes.

[14 Apr 2013 16:47] Daniël van Eeden
The attached zip file seems to be empty.

$ unzip -l initid_bug.zip 
Archive:  initid_bug.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2013-04-13 11:23   initid_bug/
---------                     -------
        0                     1 file
[15 Apr 2013 0:04] Sveta Smirnova
fixed test case

Attachment: initid_bug.zip (application/zip, text), 17.68 KiB.

[15 Apr 2013 0:05] Sveta Smirnova
Ups!

Sorry for that. Fixed archive attached.
[15 Apr 2013 10:56] Roland Bouman
Please, don't change the UDF interface, and keep the calling sequences as is.

In addition, I personally don't think the current documentation is unclear or incomplete in any way. The way I read it, it says that the UDF interface provides a 255byte pre-allocated buffer that we may or may not use, and that it also provides an additional ptr member which can be used to keep track of additional resources. 

The nature of C/C++ is such that one must explicitly allocate and deallocate memory; therefore it stands to reason that if the user finds the pre-allocated buffer does not suit their needs they will need to allocate memory themselves and free it again too. The documentation provides a few helpful hints that allocating may be done in the init and xxx function, and that it *must* be deallocated again no later than the deinit function.
[15 Apr 2013 11:44] Sveta Smirnova
Roland,

I still think it is not clear when exactly *_init and *_deinit functions called.
[17 Apr 2013 5:36] Erlend Dahl
[16 Apr 2013 1:22] Georgi Kodinov

This is expected behavior. According to
http://dev.mysql.com/doc/refman/5.6/en/adding-udf.html :

"When an SQL statement invokes XXX(), MySQL calls the initialization function
xxx_init() to let it perform any required setup, such as argument checking or
memory allocation. If xxx_init() returns an error, MySQL aborts the SQL
statement with an error message and does not call the main or
deinitialization functions. Otherwise, MySQL calls the main function xxx()
once for each row. After all rows have been processed, MySQL calls the
deinitialization function xxx_deinit() so that it can perform any required
cleanup."