Bug #17513 User defined functions API is undedeveloped
Submitted: 17 Feb 2006 9:03 Modified: 15 Apr 2006 2:01
Reporter: Kristian Koehntopp Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: User-defined functions ( UDF ) Severity:S4 (Feature request)
Version:5.0.18 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[17 Feb 2006 9:03] Kristian Koehntopp
Description:
The UDF API does not transfer charset information from the server into the UDF. That makes it very hard to write UDF string functions that deal with collations and charsets properly.

Also, there is no API from MySQL exposed to UDFs, so that UDF authors have to reinvent the wheel dealing with charsets and collations.

How to repeat:
This is the source for an Oracle-Like "translate(s1, s2, s3)" function. It acts like the Unix command tr, and will translate all characters in s2 into their counterparts from s3, transforming s1.

This works with latin1 or any other single byte charset, but try to make it detect utf8 and handle utf8 in an elegant way.

/* Copyright (C) 2002 MySQL AB */
/*
** Syntax for the new commands are:
** create function <function_name> returns {string|real|integer}
**                soname <name_of_shared_library>
** drop function <function_name>
**
**
*/

#ifdef STANDARD
#include <stdio.h>
#include <string.h>
#ifdef __WIN__
typedef unsigned __int64 ulonglong;     /* Microsofts 64 bit types */
typedef __int64 longlong;
#else
typedef unsigned long long ulonglong;
typedef long long longlong;
#endif /*__WIN__*/
#else
#include <my_global.h>
#include <my_sys.h>
#endif
#include <mysql.h>
#include <m_ctype.h>
#include <m_string.h>           // To get strmov()

static pthread_mutex_t LOCK_hostname;

#ifdef HAVE_DLOPEN

/* These must be right or mysqld will not find the symbol! */

extern "C" {
my_bool translate_init(UDF_INIT *initid, UDF_ARGS *args, char *message);
void metaphon_deinit(UDF_INIT *initid);
char *translate(UDF_INIT *initid, UDF_ARGS *args, char *result,
               unsigned long *length, char *is_null, char *error);
}

/*************************************************************************
** Example of init function
** Arguments:
** initid       Points to a structure that the init function should fill.
**              This argument is given to all other functions.
**      my_bool maybe_null      1 if function can return NULL
**                              Default value is 1 if any of the arguments
**                              is declared maybe_null.
**      unsigned int decimals   Number of decimals.
**                              Default value is max decimals in any of the
**                              arguments.
**      unsigned int max_length  Length of string result.
**                              The default value for integer functions is 21
**                              The default value for real functions is 13+
**                              default number of decimals.
**                              The default value for string functions is
**                              the longest string argument.
**      char *ptr;              A pointer that the function can use.
**
** args         Points to a structure which contains:
**      unsigned int arg_count          Number of arguments
**      enum Item_result *arg_type      Types for each argument.
**                                      Types are STRING_RESULT, REAL_RESULT
**                                      and INT_RESULT.
**      char **args                     Pointer to constant arguments.
**                                      Contains 0 for not constant argument.
**      unsigned long *lengths;         max string length for each argument
**      char *maybe_null                Information of which arguments
**                                      may be NULL
**
** message      Error message that should be passed to the user on fail.
**              The message buffer is MYSQL_ERRMSG_SIZE big, but one should
**              try to keep the error message less than 80 bytes long!
**
** This function should return 1 if something goes wrong. In this case
** message should contain something usefull!
**************************************************************************/

my_bool translate_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
  if (args->arg_count != 3) {
   strcpy(message, "translate requires 3 arguments");
   return 1;
  }

  if (args->arg_type[0] != STRING_RESULT
   || args->arg_type[1] != STRING_RESULT
   || args->arg_type[2] != STRING_RESULT) {
    strcpy(message,"translate requires all arguments to be strings");
    return 1;
  }

  if (args->lengths[1] != args->lengths[2]) {
    strcpy(message, "translate requires that arg1 and arg2 have the same length");
    return 1;
  }
  // result will have same length as arg[0]
  initid->max_length=args->lengths[0];

  return 0;
}

/****************************************************************************
** Deinit function. This should free all resources allocated by
** this function.
** Arguments:
** initid       Return value from xxxx_init
****************************************************************************/

void translate_deinit(UDF_INIT *initid)
{
}

/***************************************************************************
** UDF string function.
** Arguments:
** initid       Structure filled by xxx_init
** args         The same structure as to xxx_init. This structure
**              contains values for all parameters.
**              Note that the functions MUST check and convert all
**              to the type it wants!  Null values are represented by
**              a NULL pointer
** result       Possible buffer to save result. At least 255 byte long.
** length       Pointer to length of the above buffer.  In this the function
**              should save the result length
** is_null      If the result is null, one should store 1 here.
** error        If something goes fatally wrong one should store 1 here.
**
** This function should return a pointer to the result string.
** Normally this is 'result' but may also be an alloced string.
***************************************************************************/

char *translate(UDF_INIT *initid, UDF_ARGS *args, char *result,
               unsigned long *length, char *is_null, char *error)
{
  if (!args->args[0] || !args->args[1] || !args->args[2]) {
    *is_null=1;
    return 0;
  }

  char *org_result=result;

  for (int i=0; i<args->lengths[0]; i++) {
    result[i] = args->args[0][i];
    for (int j=0; j<args->lengths[1]; j++) {
      if (args->args[0][i] == args->args[1][j]) {
        result[i] = args->args[2][j];
        break;
      }
    }
  }

  *length = args->lengths[0];
  return result;
}

#endif /* HAVE_DLOPEN */

Suggested fix:
Expose an API for UDFs to use to the UDF.

Transport charset and collation information into the UDF.