Teradata User Installed File (UIF)

Teradata’s SCRIPT Table Operator allows us to execute a custom user script/command, which can be installed onto the database as a User-Installed-File (UIF).

At a high level, the Operator works by sending data from sql “select” statements to the script for processing. The script then reads and processes data from stdin and returns its output to stdout (answerset).

In order to demonstrate how it all works, I’ll go through the process of downloading a demo Teradata Virtual Machine instance which, will be used to do the discovery and run some sample code.

VMPlayer/VMWare Workstation

Teradata Express for VMWare Player

As of now, the latest version is 16.20. You’ll need to signup/register before being able to access the link. Registration is free.

Download the .7z archive, which contains the VMWare image of the SUSE Linux OS with Teradata pre-installed.

Save the file to a local folder and use 7zip to extract the contents. You should have the following files in the extracted folder:

PS S:\Virtual_Machines\Teradata\TDExpress16.20_Sles11>Name
--------------------------------------------------
TDExpress16.20_Sles11-disk1.vmdk
TDExpress16.20_Sles11-disk2.vmdk
TDExpress16.20_Sles11-disk3.vmdk
TDExpress16.20_Sles11.vmx

Double click the file ending .vmx file , which contains the config for the Teradata VM.

Secure Network Access to the Teradata VM

This can be accomplished by selecting the VM from the list:

and then:

  • Player -> Manage -> Virtual Machine Settings
  • “Hardware”
  • “Network Adaptor”
  • Host-only: A private network shared with the host
  • Click on OK to save changes

Start Teradata VM and Test Connectivity

Once the logon dialog appears, login with username: root and password:root

After successfully logging in, you should be at the VM’s workspace screen.

Some basic checks before proceeding. We’ll check if we can “ping” the guest VM from the host.

Open up a “Terminal” within the guest, and check the IP address by issuing an ifconfig.

The above shows that the VM has been assigned an IP address of 192.168.96.129 (your IP may be different).

Ping the the above IP address from the host. I’m running a Windows 10 OS, so from Powershell, a ping returns the following:

This test indicates that the guest/host network active. For the examples that follow, you can either choose to connect to the guest from your host using the IP address identified by the ifconfig above, or via commands issued directly from within the VM.

To check if Teradata has launched at startup, from a Terminal on the VM, enter the command pdestate:

Teradata should be up and running, it can take a few minutes for Teradata PDE/RDBMS to launch at startup.

With Teradata up and running, let’s run a test query via bteq to check connectivity to the RDMS. A simple “select date” query will do.

TDExpress1620_Sles11:~ # bteq
Teradata BTEQ 16.20.00.04 for LINUX. PID: 25476
Copyright 1984–2018, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:
Password:dbc*** Logon successfully completed.
*** Teradata Database Release is 16.20.32.01
*** Teradata Database Version is 16.20.32.01
*** Transaction Semantics are BTET.
*** Session Character Set Name is ‘ASCII’.
*** Total elapsed time was 1 second.
BTEQ — Enter your SQL request or BTEQ command:select date*** Query completed. One row found. One column returned.** Total elapsed time was 1 second.

TERADATA SCRIPT Table Operator

Below are some basic examples of running some basic commands.

Example 1: Determine Linux User Executing Script

SELECT DISTINCT *SCRIPT_COMMAND('whoami')RETURNS ('stdout VARCHAR(50)');*** Query completed. One row found. One column returned.
*** Total elapsed time was 3 seconds.
stdout
--------------------------------------------------
tdatuser

The result shows that the Linux user is tdatuser. This is consistent with the documentation:

"The text included in the SCRIPT_COMMAND clause runs in its own shell inside a temporary working directory.........installed script files run in protected mode (outside of the Teradata Database). The scripts are run as Tdatuser."

Example 2: List Python Version Running SMP/TPA Node

SELECT DISTINCT *FROM SCRIPT (SCRIPT_COMMAND('echo "import sys
print(sys.version)" | /usr/bin/python'
RETURNS (‘stdout VARCHAR(100)’);*** Query completed. One row found. One column returned.
*** Total elapsed time was 4 seconds.
stdout
--------------------------------------------------
(2, 6, 9, 'final', 0)

The version of python currently installed is, 2.6.9, which is fairly outdated.

TERADATA SCRIPT Operator and Python

The best way of demonstrating how it all works is best illustrated with a worked solution, which is described next.

Creating a Teradata SHA256 Python UIF

Create Teradata Database Sandpit

CREATE DATABASE MYPROJ FROM DBC AS PERM=2E9;

The access rights required are as follows:

--This grants all rights on the sandpit.  In a Prod environment, --grants would not be this openGRANT ALL ON MYPROJ TO DBC;--Required for UIF Maintenance/InstallationGRANT CREATE EXTERNAL PROCEDURE ON MYPROJ TO DBC;
GRANT EXECUTE FUNCTION on TD_SYSFNLIB.SCRIPT TO DBC;

Create Sample Data

CREATE SET TABLE MYPROJ.EMAILS,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
EMAIL_ADDR VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFICPRIMARY INDEX ( EMAIL_ADDR );

Populate the table with some fake emails.

INSERT INTO MYPROJ.EMAILS VALUES ('myemail@test.com') ;
INSERT INTO MYPROJ.EMAILS VALUES ('anotheremail@test.com') ;

Write and Test Python Script

import hashlibdef sha256_hash(raw_data):
data = []
for rec in raw_data:
rec = rec.strip()
if rec:
data.append("\t".join([rec,hashlib.sha256(rec).hexdigest()]))
return data
if __name__ == "__main__":
emails = ["myemail@test.com", "anotheremail@test.com"]
print("\n".join(sha256_hash(emails)))

The code generates the following hashes:

myemail@test.com f1614ba6bcc183e5836483c0c48d1bf932ab1fe338d0ef097d18555a3a2614bdanotheremail@test.com   c23cd2934c7912140abf66f3f2d6e0f7cb7c17de14393a0daa5d9fe618a1714d

To ensure the hashes are correct, choose one email (myemail@test.com) and test its sha256 hash at https://asecuritysite.com/encryption/md5 :

The sha256 hash matches the output of the Python script, which indicates the script works as expected.

Prepare Python Script for Installation as Teradata UIF

#!/usr/bin/python
import sys
import hashlib
for rec in sys.stdin:
rec = rec.strip().split("\t")
if rec:
print("\t".join([rec[0],hashlib.sha256(rec[0]).hexdigest()]))

The script sets the environment to “/usr/bin/python”, reads rows (with tab delimited fields) from sysin, and then returns the output to stdout.

We can now start working on getting the script installed onto the database

Save the script locally. I’ve saved to directory/root/Desktop/uif_scripts/sha256.py within the guest.

Installing the Python Script as a UIF

.logon dbc;
<enter password>
DATABASE MYPROJ;
SET SESSION SEARCHUIFDBPATH = MYPROJ;
CALL SYSUIF.INSTALL_FILE('sha256','sha256.py','cz!sha256.py');
HELP DATABASE MYPROJ;

The output of the run should look similar to the following:

TDExpress1620_Sles11:/ # cd /root/Desktop/uif_scriptsTDExpress1620_Sles11:~/Desktop/uif_scripts # bteqTeradata BTEQ 16.20.00.04 for LINUX. PID: 12730Copyright 1984-2018, Teradata Corporation. ALL RIGHTS RESERVED.
Enter your logon or BTEQ command:
.logon dbc
Password:
*** Logon successfully completed.
*** Teradata Database Release is 16.20.32.01
*** Teradata Database Version is 16.20.32.01
*** Transaction Semantics are BTET.
*** Session Character Set Name is 'ASCII'.
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command: DATABASE MYPROJ;*** New default database accepted.
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command: SET SESSION SEARCHUIFDBPATH = MYPROJ;*** Session UIF Search Database Path has been set.
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command: CALL SYSUIF.INSTALL_FILE('sha256','sha256.py','cz!sha256.py');*** Procedure has been executed.
*** Total elapsed time was 1 second.
BTEQ -- Enter your SQL request or BTEQ command: HELP DATABASE MYPROJ;*** Help information returned. 3 rows.
*** Total elapsed time was 1 second.
Table/View/Macro name Kind Comment
------------------------------ ---- -------
EMAILS T ?
sha256 Z ?

Running the UIF: MYPROJ.sha256

SET SESSION SEARCHUIFDBPATH = MYPROJ;

The UIF can now be run against our test table (myproj.emails) using the following code.

SEL 
SCR_OPER.RAW_EMAIL
,SCR_OPER.HASHED_EMAIL
FROM SCRIPT (ON (sel email_addr FROM myproj.emails)
SCRIPT_COMMAND('./MYPROJ/sha256.py')
RETURNS ('RAW_EMAIL VARCHAR(55)' ,' HASHED_EMAIL VARCHAR(100)')
) AS SCR_OPER

The following output is returned:

RAW_EMAIL               HASHED_EMAIL
anotheremail@test.com c23cd2934c7912140abf66f3f2d6e0f7cb7c17de14393a0daa5d9fe618a1714d
myemail@test.com f1614ba6bcc183e5836483c0c48d1bf932ab1fe338d0ef097d18555a3a2614bd

Conclusion and Extended Topics

There are a some other topics that will be covered as an extension to this article. They were not included in this article as they deserve a dedicated discussion.

These topics include:

  • Extending the capabilities of UIFs to harness advanced features of other python packages not included in the default version of python
  • Options and workarounds for having to explicitly set UIF database search path (SEARCHUIFDBPATH)

Learner. Interests include Cloud and Devops technologies.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store