Implementing C/C++ Dynamic Shared Libraries with Teradata UDFs
--
Having worked with Teradata User Installed Files (UIFs) and Java User Defined Functions (UDFs) in the past, I found the courage to finally attempt working with C/C++ UDFs. To make it an interesting learning experience, I set out to achieve the following:
Build the Boost C++ Regex source into a Dynamic Shared C++ Library (.so) and implement a Scalar UDF that utilises this library to parse URLs.
Teradata REGEXP functions are already available within Teradata so this article is by no means attempting to illustrate a “sledge hammer” approach to a simple problem. The focus is on the specific detail involved in implementing C/C++ libraries using Teradata C/C++ UDFs on a single node (SMP) system.
Setting up C/C++ Programming Environment
All activities discussed will be performed on the Teradata VMWare Express Edition 16.20, which is the latest release currently available for download.
Create New Linux User
➤ Logon to the VM using the default user and password of “root”.
➤ Create the following user (which will be used for all activities).
➤ Append the following entry to the new user’s bashrc profile to allow for a default umask 022 (equates to new files being created with permissions of 755).
/home/tdat/.bashrcumask 022
Download and Configure Eclipse IDE for C/C++
The below version of the Eclipse IDE will be used. Version:
Neon.3 Release (4.6.3)
Build id: 20170314–1500
This version is referred to in the documentation for the “Teradata Eclipse Plugin”. It ran fine on SLES 11 OS, which is the OS hosted on the VM.
The tar.gz can be downloaded from here.
➤ Logon to the VM with the new user created above (tdat
)
➤ Create a directory as shown below and save the download
(eclipse-cpp-neon-3-linux-gtk-x86_64.tar.gz
) to this location
tdat@TDExpress1620_Sles11:~> mkdir -p /home/tdat/apps
➤ Once download completes, extract the contents of the tar.gz
tdat@TDExpress1620_Sles11:~/apps> tar -xvf eclipse-cpp-neon-3-linux-gtk-x86_64.tar.gz
➤ Append jdk path location to PATH via .bashrc
/home/tdat/.bashrcPATH=/opt/teradata/jvm64/jdk8/bin:${PATH}
export PATH
➤ I needed to update the -Xmx parameter in Eclipse’ eclipse.ini
file to avoid memory issues/messages
/home/tdat/apps/eclipse/eclipse.ini-Xmx512m
➤ Eclipse may be launched when required using
/home/tdat/apps/eclipse/eclipse
Download Boost C++ Libraries for Linux/Unix
➤ Download the latest version of the Boost dev source (latest as of now is 1.72.0) and extract to /home/tdat/apps/
tdat@TDExpress1620_Sles11:~/apps> tar -xvf boost_1_72_0.tar.gz
Final parent directory containing the Boost C++ dev hierarchy should be at:
/home/tdat/apps/boost_1_72_0
Compile Boost Regex Source into Dynamic Shared Library (.so)
To compile the regex components into a shared library, proceed as follows.
➤ Change directory to where source is located
cd /home/tdat/apps/boost_1_72_0/libs/regex/src
➤ Compile using g++
g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c c_regex_traits.cppg++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c cpp_regex_traits.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c cregex.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c fileiter.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c icu.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c instances.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c internals.hpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c posix_api.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex_debug.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex_raw_buffer.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex_traits_defaults.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c static_mutex.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c usinstances.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c w32_regex_traits.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c wc_regex_traits.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c wide_posix_api.cpp g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c winstances.cpp
➤ Create dynamic shared library, naming the output .so as libboost.so
g++ -shared -fPIC -o libboost.so \
c_regex_traits.o \
cpp_regex_traits.o \
cregex.o \
fileiter.o \
icu.o \
instances.o \
internals.hpp \
posix_api.o \
regex.o \
regex_debug.o \
regex_raw_buffer.o \
regex_traits_defaults.o \
static_mutex.o \
usinstances.o \
w32_regex_traits.o \
wc_regex_traits.o \
wide_posix_api.o \
winstances.o
➤ This outputs the library into current folder
/home/tdat/apps/boost_1_72_0/libs/regex/src/libboost.so
Create UDF Test code using Shared Library
➤ Open eclipse using the following command.
/home/tdat/apps/eclipse/eclipse
➤ You will be asked to select location of your Workspace, where all project artifacts will be stored (/home/tdat/workspace
)
➤ Create new project via “File -> C/C++ Project”
➤ After clicking “Next”, you will have the option to edit “advanced settings” for the project.
➤ Add the following directory to the “Include paths”
➤ For the compiler’s “Miscellaneous” settings, tick “Verbose” and “Position Independent Code”
➤ For the “GCC C++ Linker”
- for “Libraries”, enter the library name “
boost
” - add the directory
/home/tdat/apps/boost_1_72_0/libs/regex/src
as show below to the “Library search path”
➤ Create a new source folder
➤ Create a new source code file named parse_url.cpp
in the new src
folder.
➤ Edit parse_url.cpp
and add the following code (which was based off http://www.zedwood.com/article/cpp-boost-url-regex):
#include <string>
#include <iostream>
#include <boost/regex.hpp>
#include "cstring"using std::string;
using std::cout;
using std::endl;
using std::stringstream;void parse_url(const string& url_string) //with boost
{
boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
boost::cmatch what;
if(regex_match(url_string.c_str(), what, ex))
{
string scheme = string(what[1].first, what[1].second);
string host = string(what[2].first, what[2].second);
string port = string(what[3].first, what[3].second);
string path = string(what[4].first, what[4].second);
string query = string(what[5].first, what[5].second);string jsonify = "{\"scheme\": \"" + scheme + "\","
" \"host\": \"" + host + "\","
" \"port\": \"" + port + "\","
" \"path\": \"" + path + "\","
" \"query\": \"" + query + "\"}";
cout << jsonify << endl;
}
}int main(int argc, char* argv[])
{
parse_url("https://www.samp.com:8080/samp/catg/path?parm1=sampparm");return 0;}
➤ Build the Project in Eclipse using “Project -> Build”
➤ If there are no errors, execute using “Run → Run” and you should see the sample url parsed and “jsonified” in the Eclipse Console
When formatted neatly, the Console output appears as follows:
{
"scheme": "https",
"host": "www.samp.com",
"port": "8080",
"path": "/samp/catg/path",
"query": "parm1=sampparm"
}
Preparing C++ Code for UDF Format
Comprehensive documentation describing aspects of the development can be found in the Teradata documentation. Relevant excerpts have been included in the Appendix of this article (taken verbatim from the Teradata Documentation).
➤ Create a new file named
/home/tdat/apps/boost_1_72_0/libs/regex/src/parse_url.cpp
and save the UDF-ready code as shown below.
#define SQL_TEXT Latin_Text
#include "sqltypes_td.h"#include <string>
#include <iostream>
#include <boost/regex.hpp>
#include "cstring"using std::string;
using std::cout;
using std::endl;
using std::stringstream;extern "C"
void parse_url(const char *url_string, char *result, char sqlstate[6])
{
boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
boost::cmatch what;
if(regex_match(url_string, what, ex))
{
string scheme = string(what[1].first, what[1].second);
string host = string(what[2].first, what[2].second);
string port = string(what[3].first, what[3].second);
string path = string(what[4].first, what[4].second);
string query = string(what[5].first, what[5].second);string jsonify = "{\"scheme\": \"" + scheme + "\","
" \"host\": \"" + host + "\","
" \"port\": \"" + port + "\","
" \"path\": \"" + path + "\","
" \"query\": \"" + query + "\"}";strcpy(result, jsonify.c_str());
}
return;
}
Install parse_url UDF
UDF Object File for UDF
➤ Generate an output object file parse_url.o
by running the following
cd /home/tdat/apps/boost_1_72_0/libs/regex/srcg++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -I /usr/tdbms/etc -fPIC -c parse_url.cpp
Prepare and Execute CREATE/REPLACE FUNCTION for UDF
➤ The following code will be used to install the UDF to database MYPROJ
DATABASE MYPROJ;DROP FUNCTION parse_url;REPLACE FUNCTION parse_url (url_string VARCHAR(2500) )
RETURNS VARCHAR(2500)
LANGUAGE CPP
NO SQL
EXTERNAL NAME 'SL!boost!CO!parse_url!/home/tdat/apps/boost_1_72_0/libs/regex/src/parse_url.o'
PARAMETER STYLE TD_GENERAL;
➤ Launch a BTEQ session and run the REPLACE/CREATE as above
tdat@TDExpress1620_Sles11:/> cd /home/tdat/apps/boost_1_72_0/libs/regex/src
tdat@TDExpress1620_Sles11:~/apps/boost_1_72_0/libs/regex/src> bteqTeradata BTEQ 16.20.00.04 for LINUX. PID: 11459
Copyright 1984-2018, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:.logon dbc;.logon dbc
Password:*** Logon successfully completed.
*** Teradata Database Release is 16.20.32.01
*** Teradata Database Version is 16.20.32.01
*** Transaction Semantics are BTET.
*** Session Character Set Name is 'ASCII
Teradata BTEQ 16.20.00.04 for LINUX
Copyright 1984-2018, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:
.logon dbc;
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command:
DATABASE MYPROJ;
*** New default database accepted.
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command:
DROP FUNCTION parse_url;
*** Failure 5589 Function 'parse_url' does not exist.
Statement# 1, Info =0
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command:
REPLACE FUNCTION parse_url (url_string VARCHAR(2500) )
RETURNS VARCHAR(2500)
LANGUAGE CPP
NO SQL
EXTERNAL NAME 'SL!boost!CO!parse_url!/home/tdat/apps/boost_1_72_0/libs/reg
ex/src/parse_url.o!F!parse_url'
PARAMETER STYLE TD_GENERAL;
*** Create function completed with error(s).
*** Warning: 5603 Errors encountered in compiling UDF/XSP/UDM/UDT/JAR.
*** Total elapsed time was 3 seconds.
Errors/Warnings reported during compilation
---------------------------------------------------------------------------
/usr/bin/gcc -D_REENTRANT -D_LIBC_REENTRANT -I/usr/tdbms/etc -L/usr/td
bms/lib -L/usr/tdbms/lib -fpic -c Teradata_new_delete.cpp
/usr/bin/gcc -D_REENTRANT -D_LIBC_REENTRANT -I/usr/tdbms/etc -L/usr/td
bms/lib -L/usr/tdbms/lib -fpic -c pre_parse_url.c
/usr/bin/gcc -shared -fpic -Xlinker -rpath -Xlinker /usr/tdbms/lib -Xlinke
r -rpath -Xlinker /usr/tdbms/lib -Wl,--version-script=/var/opt/teradata/tdt
emp/UDFTemp/0408.30719.25720/UserUdf_versions.scr -D_REENTRANT -D_LIBC_REEN
TRANT -I/usr/tdbms/etc -L/usr/tdbms/lib -L/usr/tdbms/lib -o @FileList
-lboost -ludf -lm -ljil -lstdc++
/usr/lib64/gcc/x86_64-suse-linux/4.3/../../../../x86_64-suse-linux/bin/ld:
cannot find -lboost
collect2: ld returned 1 exit status
make: *** [libudf_0408_43.so] Error 1
➤ We get an error:
cannot find -lboost
collect2: ld returned 1 exit status
which tells us that the library boost
(note the naming convention, -lboost refers to libboost.so) could not be found.
➤ Looking more closely at the error log, we can see that the only directory being searched for libraries (.so) is : /usr/tdbms/lib
It’s becoming obvious that if we copy
/home/tdat/apps/boost_1_72_0/libs/regex/src/libboost.so
over to
/usr/tdbms/lib
and apply the correct permissions, ownership and group settings, then we should get it to work.
➤ Perform the following as “root”
cd /home/tdat/apps/boost_1_72_0/libs/regex/src
cp libboost.so /usr/tdbms/lib
cd /usr/tdbms/lib
chmod 555 libboost.so
chown teradata libboost.so
chgrp tdtrusted libboost.so
➤ Running the CREATE/REPLACE function via BTEQ once again works without error and creates the function successfully
UDF Preliminary Testing
➤ A quick test of the UDF shows the following output, which looks correct
BTEQ -- Enter your SQL request or BTEQ command:
SELECT MYPROJ.parse_url(
'https://www.samp.com:8080/samp/catg/path?parm1=sampparm'
);*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.parse_url('https:www.samp.com:8080/samp/catg/path?parm1=sampparm')--------------------------------------------------------------------
{"scheme": "https", "host": "www.samp.com", "port": "8080", "path": "/samp/catg/path", "query": "parm1=sampparm"}
Custom Shared Library Search Path/Directory
We’ve seen from the above that when executing the UDF CREATE/REPLACE FUNCTION, the only path searched for libraries is /usr/tdbms/lib
.
Let’s say that we needed to include directory /usr/local/lib
added to the search path, and we’ve moved libboost.so
to this location.
In order to configure inclusion of this new search path, we need to edit the Teradata configuration settings that are specific to UDFs. This is done via utility cufconfig
.
To view the existing config settings, we can run (as root):
TDExpress1620_Sles11:~ # cufconfig -o
According to the documentation, the field that allows us to add library search paths is UDFLibPath
UDFLibPath
Purpose
Specifies a library path in addition to the standard library path for use by UDFs, external stored procedures, or UDMs that specify a data access clause of NO SQL in the CREATE PROCEDURE or REPLACE PROCEDURE statement.
Default
This field has no default value.
We need this value to be set to /usr/local/lib
, i.e
UDFLibPath: /usr/local/lib
➤ Create a new text file (/root/cufedit.txt
)
➤ Add the following to the new file and save
UDFLibPath: /usr/local/lib
➤ Run the following command as root
cufconfig -f /root/cufedit.txt
➤ Check to ensure the UDFLibPath
has been updated
cufconfig -o | grep UDFLibPathOutput:UDFLibPath: /usr/local/lib/
➤ Set the correct permissions for the shared library
cd /usr/local/lib
chmod 555 libboost.so
chown teradata libboost.so
chgrp tdtrusted libboost.so
➤ Run the CREATE/REPLACE Function via BTEQ once again we can now see that the additional library search path (/usr/local/lib
) appears in the output, with the function being successfully created
/usr/bin/gcc -D_REENTRANT -D_LIBC_REENTRANT -I/usr/tdbms/etc -L /usr/local/lib -L/usr/tdbms/lib......
➤ Quick test to validate shows the following output
BTEQ — Enter your SQL request or BTEQ command:
SELECT
MYPROJ.parse_url
(
'https://www.samp.com:8080/samp/catg/path?parm1=sampparm'
);*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.{"scheme": "https", "host": "www.samp.com", "port": "8080", "path": "/samp/catg/path", "query": "parm1=sampparm"}
Closing Remarks
With the UDF now in place. More testing can be performed to ensure all works as expected.
Some points to note:
- The UDF was created with a PARAMETER STYLE of TD_GENERAL. This parameter style does not account for handling of NULLS
- The process was demonstrated on a Single Node system. For an MPP (multi-node) system, all nodes in the configuration need to have the respective settings applied
- Once the UDF is successfully tested, it can be run in Non-Protected mode to allow for enhanced performance and reduction in system resource usage:
ALTER FUNCTION MYPROJ.parse_url EXECUTE NOT PROTECTED;
Appendix/References
UDFs that Do Not Perform I/O and Do Not Implement UDT Functionality¹
Header Files²
Teradata provides the sqltypes_td.h
header file that you include in your source code file. The header file defines the equivalent C data types for all database data types that you can use for the input arguments and result of your UDFs. Every SQL data type corresponds to a C data type in sqltypes_td.h.
The header file is in the etc directory of the Teradata software distribution:
/usr/tdbms/etc
SQL_TEXT
is used for specific UDF arguments when the UDF uses parameter style SQL. For details on parameter styles, see Scalar Function Parameter List, Aggregate Function Parameter List, and Table Function Parameter List.
Teradata Database remembers the character set the UDF was created under so that the UDF can translate SQL_TEXT input arguments to the expected text and translate text to SQL_TEXT output arguments, no matter who calls the UDF and what the current server character set is for the session.
In accordance with Teradata internationalization plans, KANJI1 support is deprecated and is to be discontinued in the near future. KANJI1 is not allowed as a default character set; the system changes the KANJI1 default character set to the UNICODE character set. Creation of new KANJI1 objects is highly restricted. Although many KANJI1 queries and applications may continue to operate, sites using KANJI1 should convert to another character set as soon as possible
SQL_TEXT Definition³
Before you include the sqltypes_td.h
header file, you must define the SQL_TEXT
constant. The value that you use must match the current server character set of the session in which you use the CREATE FUNCTION statement to create the UDF.
The following example shows how to define SQL_TEXT
and include the sqltypes_td.h
header file in the file that defines a UDF:
#define SQL_TEXT Latin_Text
#include "sqltypes_td.h"
C/C++ Function Name⁴
The name of the function in the C or C++ source code follows the C function naming conventions, with the additional restriction that the name cannot be longer than 30 characters.
In C++, the function definition must be preceded by extern “C”
so that the function name is not converted to a C++ overloaded name. For example:
extern "C"
void f1( char *a, char *result, char sqlstate[6])
{
...
}
When you use the CREATE FUNCTION or REPLACE FUNCTION statement to install the UDF, you specify the name of the C or C++ function. For more information, see Specifying the C/C++ Function Name.
The list of parameters for a UDF is very specific. It includes the input parameters that Teradata Database passes when the UDF appears in a DML statement. It also includes output parameters that return the result of the UDF and the SQLSTATE result code.
A UDF can have 0 to 128 input parameters. The types that you use as input parameters and return type of a UDF correspond to SQL data types in the function call. For details, see SQL Data Type Mapping.
Additional parameters might be required, depending on the parameter passing convention you choose.
Parameters are almost always specified as pointers to the data.
Function Parameters⁵
Parameter Passing Convention:
Scalar Function Parameter List⁶
The parameter list must correspond to the parameter style in the CREATE FUNCTION statement for the UDF.
Use parameter style SQL to allow a UDF to pass nulls for the result or for input arguments; otherwise, use parameter style TD_GENERAL.
When you use parameter style SQL, you must define indicator parameters for the result and for each parameter.
Syntax for Scalar Function Parameter Style TD_GENERAL:
void function_name ( type_1 *input_parameter_1,
...,
type_n *input_parameter_n,
result_type *result,
char sqlstate[6] )
{
...
}
Syntax for Scalar Function Parameter Style SQL:
void function_name ( type_1 *input_parameter_1,
...,
type_n *input_parameter_n,
result_type *result,
int *indicator_parameter_1,
...,
int *indicator_parameter_n,
int *indicator_result,
char sqlstate[6],
SQL_TEXT function_name[m],
SQL_TEXT specific_function_name[l],
SQL_TEXT error_message[p] )
{
...
}
Create Function external_string_literal⁷
A string that specifies the location of source and object components needed to build the external function.
For more information, see Teradata Vantage™ SQL Data Definition Language Detailed Topics , B035–1184 .
Initial Code
Single character code. Depending on the initial code in the sequence, the string specifies either the external function object name for the UDF or an encoded name or path for the components needed to create the external function.
D Enables symbolic debugging for the UDF, which shows source code and displays variables by name. Without this option, UDFs can only be debugged at the machine instruction level. You should always specify this option for debugging purposes when UDFs are being tested. This option adds -g to the C compiler command line. See SET SESSION DEBUG FUNCTION and “C/C++ Command-line Debugging for UDFs” in Teradata Vantage™ SQL External Routine Programming , B035–1147 . The D option applies only to C and C++ code, not to Java UDFs.
You should not use this option when installing debugged UDFs on production system because this option increases the size of the UDF library.
F Function object. The string that follows is the entry point name of the C or C++ external function object.
F!function_entry_point_name
C Client. The source or object code for the external function is stored on the client.
S Server. The source or object code for the external function is stored on the server.
Path Specification
Following is a list of the path specifications for the external function. You can repeat options as necessary with the exception of the package option. You can specify the following file types as external string literals.
I Include file (.h).
I!name_on_server!include_name
L Library name for a nonstandard library files needed by the UDF.
L!library_name
O Object file.
O!name_on_server!object_name
P Package name. You cannot use the package option with any other options except F, the C function name option.
P!package_name
S Source file.
S!name_on_server!source_name
NS No source file. Source files and include files are not stored in the function table. This option only affects how source code is processed in the creation of a new function and applies to all source code specified in the external string literal.
NS!source_file!include_file
delimiter
Specify a delimiter character, such as !
You must use the same delimiter throughout the string specification. name_on_server Name the file on the server. Include files must have the same name specified in the include statement in the C source, without the extension. file_pathname Location (path) and name of the source, include file, object, or library. Because packages and libraries must be preinstalled, you must use the server option (S). Path specifications can use forward slashes (/) or backward slashes (\) regardless of whether the function is being created on a Unix or Windows platform.
References
[1–6]: Teradata Vantage™ 16.20 SQL External Routine Programming. ( March 2019). C/C++ User-Defined Functions
[7]: Teradata Vantage™ SQL Data Definition Language Syntax and Examples. (March 2019). EXTERNAL NAME external_string_literal