Implementing C/C++ Dynamic Shared Libraries with Teradata UDFs

Having worked with Teradata User Installed Files (UIFs) and Java User Defined Functions (UDFs) in the past, I found the courage to finally attempt working with C/C++ UDFs. To make it an interesting learning experience, I set out to achieve the following:

Build the Boost C++ Regex source into a Dynamic Shared C++ Library (.so) and implement a Scalar UDF that utilises this library to parse URLs.

Teradata REGEXP functions are already available within Teradata so this article is by no means attempting to illustrate a “sledge hammer” approach to a simple problem. The focus is on the specific detail involved in implementing C/C++ libraries using Teradata C/C++ UDFs on a single node (SMP) system.

etting up C/C++ Programming Environment

All activities discussed will be performed on the Teradata VMWare Express Edition 16.20, which is the latest release currently available for download.

Create New Linux User

➤ Logon to the VM using the default user and password of “root”.
➤ Create the following user (which will be used for all activities).

Image for post
Image for post

➤ Append the following entry to the new user’s bashrc profile to allow for a default umask 022 (equates to new files being created with permissions of 755).

/home/tdat/.bashrcumask 022

Download and Configure Eclipse IDE for C/C++

The below version of the Eclipse IDE will be used. Version:

Neon.3 Release (4.6.3)
Build id: 20170314–1500

This version is referred to in the documentation for the “Teradata Eclipse Plugin”. It ran fine on SLES 11 OS, which is the OS hosted on the VM.

The tar.gz can be downloaded from here.

➤ Logon to the VM with the new user created above (tdat)
➤ Create a directory as shown below and save the download
(eclipse-cpp-neon-3-linux-gtk-x86_64.tar.gz) to this location

tdat@TDExpress1620_Sles11:~> mkdir -p /home/tdat/apps

➤ Once download completes, extract the contents of the tar.gz

tdat@TDExpress1620_Sles11:~/apps> tar -xvf eclipse-cpp-neon-3-linux-gtk-x86_64.tar.gz

➤ Append jdk path location to PATH via .bashrc

/home/tdat/.bashrcPATH=/opt/teradata/jvm64/jdk8/bin:${PATH}
export PATH

➤ I needed to update the -Xmx parameter in Eclipse’ eclipse.inifile to avoid memory issues/messages

/home/tdat/apps/eclipse/eclipse.ini-Xmx512m

➤ Eclipse may be launched when required using

/home/tdat/apps/eclipse/eclipse

Download Boost C++ Libraries for Linux/Unix

Download the latest version of the Boost dev source (latest as of now is 1.72.0) and extract to /home/tdat/apps/

tdat@TDExpress1620_Sles11:~/apps> tar -xvf boost_1_72_0.tar.gz

Final parent directory containing the Boost C++ dev hierarchy should be at:

/home/tdat/apps/boost_1_72_0

Compile Boost Regex Source into Dynamic Shared Library (.so)

To compile the regex components into a shared library, proceed as follows.

➤ Change directory to where source is located

cd /home/tdat/apps/boost_1_72_0/libs/regex/src

➤ Compile using g++

g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c c_regex_traits.cppg++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c cpp_regex_traits.cpp            g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c cregex.cpp                      g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c fileiter.cpp                    g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c icu.cpp                         g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c instances.cpp                   g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c internals.hpp                   g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c posix_api.cpp                   g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex.cpp                       g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex_debug.cpp                 g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex_raw_buffer.cpp            g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c regex_traits_defaults.cpp       g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c static_mutex.cpp                g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c usinstances.cpp                 g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c w32_regex_traits.cpp            g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c wc_regex_traits.cpp             g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c wide_posix_api.cpp              g++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -fPIC -c winstances.cpp

➤ Create dynamic shared library, naming the output .so as libboost.so

g++ -shared -fPIC -o libboost.so \
c_regex_traits.o \
cpp_regex_traits.o \
cregex.o \
fileiter.o \
icu.o \
instances.o \
internals.hpp \
posix_api.o \
regex.o \
regex_debug.o \
regex_raw_buffer.o \
regex_traits_defaults.o \
static_mutex.o \
usinstances.o \
w32_regex_traits.o \
wc_regex_traits.o \
wide_posix_api.o \
winstances.o

➤ This outputs the library into current folder

/home/tdat/apps/boost_1_72_0/libs/regex/src/libboost.so

Create UDF Test code using Shared Library

➤ Open eclipse using the following command.

/home/tdat/apps/eclipse/eclipse

➤ You will be asked to select location of your Workspace, where all project artifacts will be stored (/home/tdat/workspace)

➤ Create new project via “File -> C/C++ Project”

Image for post
Image for post

➤ After clicking “Next”, you will have the option to edit “advanced settings” for the project.

Image for post
Image for post

➤ Add the following directory to the “Include paths”

Image for post
Image for post

➤ For the compiler’s “Miscellaneous” settings, tick “Verbose” and “Position Independent Code”

Image for post
Image for post

➤ For the “GCC C++ Linker”

  • for “Libraries”, enter the library name “boost
  • add the directory /home/tdat/apps/boost_1_72_0/libs/regex/src as show below to the “Library search path”
Image for post
Image for post

➤ Create a new source folder

Image for post
Image for post

➤ Create a new source code file named parse_url.cpp in the new src folder.

Image for post
Image for post

➤ Edit parse_url.cpp and add the following code (which was based off http://www.zedwood.com/article/cpp-boost-url-regex):

➤ Build the Project in Eclipse using “Project -> Build”
➤ If there are no errors, execute using “Run → Run” and you should see the sample url parsed and “jsonified” in the Eclipse Console

Image for post
Image for post

When formatted neatly, the Console output appears as follows:

Preparing C++ Code for UDF Format

Comprehensive documentation describing aspects of the development can be found in the Teradata documentation. Relevant excerpts have been included in the Appendix of this article (taken verbatim from the Teradata Documentation).

➤ Create a new file named

/home/tdat/apps/boost_1_72_0/libs/regex/src/parse_url.cpp

and save the UDF-ready code as shown below.

Install parse_url UDF

UDF Object File for UDF

➤ Generate an output object file parse_url.o by running the following

cd /home/tdat/apps/boost_1_72_0/libs/regex/srcg++ -I /home/tdat/apps/boost_1_72_0 -I /home/tdat/apps/boost_1_72_0/libs/regex/src -I /usr/tdbms/etc -fPIC -c parse_url.cpp

Prepare and Execute CREATE/REPLACE FUNCTION for UDF

➤ The following code will be used to install the UDF to database MYPROJ

➤ Launch a BTEQ session and run the REPLACE/CREATE as above

➤ We get an error:

cannot find -lboost
collect2: ld returned 1 exit status

which tells us that the library boost (note the naming convention, -lboost refers to libboost.so) could not be found.

➤ Looking more closely at the error log, we can see that the only directory being searched for libraries (.so) is : /usr/tdbms/lib

It’s becoming obvious that if we copy

/home/tdat/apps/boost_1_72_0/libs/regex/src/libboost.so 

over to

/usr/tdbms/lib 

and apply the correct permissions, ownership and group settings, then we should get it to work.

➤ Perform the following as “root

cd /home/tdat/apps/boost_1_72_0/libs/regex/src
cp libboost.so /usr/tdbms/lib
cd /usr/tdbms/lib
chmod 555 libboost.so
chown teradata libboost.so
chgrp tdtrusted libboost.so

➤ Running the CREATE/REPLACE function via BTEQ once again works without error and creates the function successfully

UDF Preliminary Testing

➤ A quick test of the UDF shows the following output, which looks correct

Custom Shared Library Search Path/Directory

We’ve seen from the above that when executing the UDF CREATE/REPLACE FUNCTION, the only path searched for libraries is /usr/tdbms/lib.

Let’s say that we needed to include directory /usr/local/lib added to the search path, and we’ve moved libboost.so to this location.

In order to configure inclusion of this new search path, we need to edit the Teradata configuration settings that are specific to UDFs. This is done via utility cufconfig.

To view the existing config settings, we can run (as root):

TDExpress1620_Sles11:~ # cufconfig -o

According to the documentation, the field that allows us to add library search paths is UDFLibPath

UDFLibPath

Purpose

Specifies a library path in addition to the standard library path for use by UDFs, external stored procedures, or UDMs that specify a data access clause of NO SQL in the CREATE PROCEDURE or REPLACE PROCEDURE statement.

Default

This field has no default value.

We need this value to be set to /usr/local/lib, i.e

UDFLibPath: /usr/local/lib

➤ Create a new text file (/root/cufedit.txt)
➤ Add the following to the new file and save

UDFLibPath: /usr/local/lib

➤ Run the following command as root

cufconfig -f /root/cufedit.txt

➤ Check to ensure the UDFLibPath has been updated

cufconfig -o | grep UDFLibPathOutput:UDFLibPath: /usr/local/lib/

➤ Set the correct permissions for the shared library

cd /usr/local/lib
chmod 555 libboost.so
chown teradata libboost.so
chgrp tdtrusted libboost.so

➤ Run the CREATE/REPLACE Function via BTEQ once again we can now see that the additional library search path (/usr/local/lib) appears in the output, with the function being successfully created

/usr/bin/gcc -D_REENTRANT -D_LIBC_REENTRANT -I/usr/tdbms/etc -L /usr/local/lib -L/usr/tdbms/lib......

➤ Quick test to validate shows the following output

BTEQ — Enter your SQL request or BTEQ command: 
SELECT
MYPROJ.parse_url
(
'https://www.samp.com:8080/samp/catg/path?parm1=sampparm'
);
*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.
{"scheme": "https", "host": "www.samp.com", "port": "8080", "path": "/samp/catg/path", "query": "parm1=sampparm"}

Closing Remarks

With the UDF now in place. More testing can be performed to ensure all works as expected.

Some points to note:

  • The UDF was created with a PARAMETER STYLE of TD_GENERAL. This parameter style does not account for handling of NULLS
  • The process was demonstrated on a Single Node system. For an MPP (multi-node) system, all nodes in the configuration need to have the respective settings applied
  • Once the UDF is successfully tested, it can be run in Non-Protected mode to allow for enhanced performance and reduction in system resource usage:
    ALTER FUNCTION MYPROJ.parse_url EXECUTE NOT PROTECTED;

Appendix/References

UDFs that Do Not Perform I/O and Do Not Implement UDT Functionality¹

Image for post
Image for post

Header Files²

Teradata provides the sqltypes_td.h header file that you include in your source code file. The header file defines the equivalent C data types for all database data types that you can use for the input arguments and result of your UDFs. Every SQL data type corresponds to a C data type in sqltypes_td.h.

The header file is in the etc directory of the Teradata software distribution:

/usr/tdbms/etc

SQL_TEXT is used for specific UDF arguments when the UDF uses parameter style SQL. For details on parameter styles, see Scalar Function Parameter List, Aggregate Function Parameter List, and Table Function Parameter List.

Teradata Database remembers the character set the UDF was created under so that the UDF can translate SQL_TEXT input arguments to the expected text and translate text to SQL_TEXT output arguments, no matter who calls the UDF and what the current server character set is for the session.

In accordance with Teradata internationalization plans, KANJI1 support is deprecated and is to be discontinued in the near future. KANJI1 is not allowed as a default character set; the system changes the KANJI1 default character set to the UNICODE character set. Creation of new KANJI1 objects is highly restricted. Although many KANJI1 queries and applications may continue to operate, sites using KANJI1 should convert to another character set as soon as possible

SQL_TEXT Definition³

Before you include the sqltypes_td.h header file, you must define the SQL_TEXT constant. The value that you use must match the current server character set of the session in which you use the CREATE FUNCTION statement to create the UDF.

Image for post
Image for post

The following example shows how to define SQL_TEXT and include the sqltypes_td.h header file in the file that defines a UDF:

#define  SQL_TEXT Latin_Text
#include "sqltypes_td.h"

C/C++ Function Name⁴

The name of the function in the C or C++ source code follows the C function naming conventions, with the additional restriction that the name cannot be longer than 30 characters.

In C++, the function definition must be preceded by extern “C” so that the function name is not converted to a C++ overloaded name. For example:

extern "C" 
void f1( char *a, char *result, char sqlstate[6])
{
...
}

When you use the CREATE FUNCTION or REPLACE FUNCTION statement to install the UDF, you specify the name of the C or C++ function. For more information, see Specifying the C/C++ Function Name.

The list of parameters for a UDF is very specific. It includes the input parameters that Teradata Database passes when the UDF appears in a DML statement. It also includes output parameters that return the result of the UDF and the SQLSTATE result code.

A UDF can have 0 to 128 input parameters. The types that you use as input parameters and return type of a UDF correspond to SQL data types in the function call. For details, see SQL Data Type Mapping.

Additional parameters might be required, depending on the parameter passing convention you choose.

Parameters are almost always specified as pointers to the data.

Function Parameters⁵

Parameter Passing Convention:

Image for post
Image for post

Scalar Function Parameter List⁶

The parameter list must correspond to the parameter style in the CREATE FUNCTION statement for the UDF.

Use parameter style SQL to allow a UDF to pass nulls for the result or for input arguments; otherwise, use parameter style TD_GENERAL.

When you use parameter style SQL, you must define indicator parameters for the result and for each parameter.

Syntax for Scalar Function Parameter Style TD_GENERAL:

void function_name ( type_1      *input_parameter_1,
...,
type_n *input_parameter_n,
result_type *result,
char sqlstate[6] )
{
...
}

Syntax for Scalar Function Parameter Style SQL:

void function_name ( type_1      *input_parameter_1,
...,
type_n *input_parameter_n,
result_type *result,
int *indicator_parameter_1,
...,
int *indicator_parameter_n,
int *indicator_result,
char sqlstate[6],
SQL_TEXT function_name[m],
SQL_TEXT specific_function_name[l],
SQL_TEXT error_message[p] )
{
...
}
Image for post
Image for post

Create Function external_string_literal⁷

A string that specifies the location of source and object components needed to build the external function.

For more information, see Teradata Vantage™ SQL Data Definition Language Detailed Topics , B035–1184 .

Initial Code

Single character code. Depending on the initial code in the sequence, the string specifies either the external function object name for the UDF or an encoded name or path for the components needed to create the external function.

Enables symbolic debugging for the UDF, which shows source code and displays variables by name. Without this option, UDFs can only be debugged at the machine instruction level. You should always specify this option for debugging purposes when UDFs are being tested. This option adds -g to the C compiler command line. See SET SESSION DEBUG FUNCTION and “C/C++ Command-line Debugging for UDFs” in Teradata Vantage™ SQL External Routine Programming , B035–1147 . The D option applies only to C and C++ code, not to Java UDFs.

You should not use this option when installing debugged UDFs on production system because this option increases the size of the UDF library.

Function object. The string that follows is the entry point name of the C or C++ external function object.

F!function_entry_point_name

Client. The source or object code for the external function is stored on the client.

Server. The source or object code for the external function is stored on the server.

Path Specification

Following is a list of the path specifications for the external function. You can repeat options as necessary with the exception of the package option. You can specify the following file types as external string literals.

Include file (.h).

I!name_on_server!include_name

Library name for a nonstandard library files needed by the UDF.

L!library_name

Object file.

O!name_on_server!object_name

Package name. You cannot use the package option with any other options except F, the C function name option.

P!package_name

Source file.

S!name_on_server!source_name

S No source file. Source files and include files are not stored in the function table. This option only affects how source code is processed in the creation of a new function and applies to all source code specified in the external string literal.

NS!source_file!include_file

delimiter

Specify a delimiter character, such as !

You must use the same delimiter throughout the string specification. name_on_server Name the file on the server. Include files must have the same name specified in the include statement in the C source, without the extension. file_pathname Location (path) and name of the source, include file, object, or library. Because packages and libraries must be preinstalled, you must use the server option (S). Path specifications can use forward slashes (/) or backward slashes (\) regardless of whether the function is being created on a Unix or Windows platform.

References

[1–6]: Teradata Vantage™ 16.20 SQL External Routine Programming. ( March 2019). C/C++ User-Defined Functions

[7]: Teradata Vantage™ SQL Data Definition Language Syntax and Examples. (March 2019). EXTERNAL NAME external_string_literal

Written by

Primarily a Learner/Coder with interests in Python, Cloud Technologies, Security and Automation. Pandas munching on Bamboo sticks give me the “Giggles” :))

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store