Parsing HTTP User-Agents with Teradata

Parsing User-Agents with Teradata User-Installed-Files (UIF) and Python has been discussed and demonstrated here.

A slight setback with the UIF approach was that we needed to install an additional Python package (ua-parser) to support the parsing, i.e., the default Python installation on the Teradata Node did not include capabilities for parsing User-Agents. Getting additional Python packages installed onto Teradata Node/s in a “real” Production environment is not something that can be rapidly accomplished. This is due to the fact that DBAs, Teradata Engineers and other implementation procedures would need to be involved, which might make the process fairly lengthy.

This article describes the process for using Teradata Java-User-Defined-Functions (JUDF) for implementing User-Agent parsing capabilities.

Pre-requisites

If you’re familiar with Java Development using Maven, then you can go ahead and set yourself up with the dependencies. The pom.xml can be found at https://github.com/ua-parser/uap-java.

For this article, a Java IDE will not be used.

Create a folder to store the UDF components. I’m using C:\Apps\Judf

I’m running JDK version 1.8.0_231 on Windows 10 x64.

JDK for my setup is installed into target directory:

C:\Program Files\Java\jdk1.8.0_231

After the installation is complete, add the following to your System Environment Variables:

Image for post
Image for post

Include the following to your System Path Environment variable:

%JAVA_HOME%\bin

Download the following jars and save them to your project folder.

uap-java-1.4.0.jar:
https://repo1.maven.org/maven2/com/github/ua-parser/uap-java/1.4.0/uap-java-1.4.0.jar

gson-2.8.6.jar:
https://repo1.maven.org/maven2/com/google/code/gson/gson/2.8.6/gson-2.8.6.jar

snakeyaml-1.20.jar
https://repo1.maven.org/maven2/org/yaml/snakeyaml/1.20/snakeyaml-1.20.jar

Create the following database to hold our UDF jars/function:

CREATE DATABASE MYPROJ FROM DBC AS PERM=200E6;

Grant the following access to the user that will be installing the UDF:

GRANT EXECUTE FUNCTION ON MYPROJ TO <username>;
GRANT CREATE FUNCTION ON MYPROJ TO <username>;
GRANT EXECUTE PROCEDURE ON SQLJ TO <username>;
GRANT CREATE EXTERNAL PROCEDURE ON MYPROJ TO <username>;

JAVA UDF Source and Packaging

Create a file in your project folder named parseUA.java and paste in the following code:

import ua_parser.Parser;
import ua_parser.Client;
import java.io.IOException;import com.google.gson.Gson;public class parseUA {public static String uaJson (String raw_ua) throws IOException {

Parser newParser = new Parser();
Client agent = newParser.parse(raw_ua.trim());
Gson gson = new Gson();
String returnValue = gson.toJson(agent);
if (raw_ua.trim().isEmpty()) {
returnValue = "";
}

return returnValue;
}

}

The parseUA.java can be compiled to generate an output .class file using the following:

PS C:\Apps\Judf> javac -classpath `
";.\gson-2.8.6.jar;.\snakeyaml-1.20.jar;.\uap-java-1.4.0.jar" `
parseUA.java

Generate a jar containing the compiled Java Class (parseUA.class) by running the following:

PS C:\Apps\Judf> jar.exe -cf ua-udf.jar parseUA.class

Deploy JAR Files to Teradata

To deploy the JAR files into our target Teradata database (MYPROJ), we will need to use procedure SQLJ.INSTALL_JAR.

The following will install the JARs required for our UDF into database “MYPROJ”:

DATABASE MYPROJ;
call sqlj.install_jar('cj!gson-2.8.6.jar','gson-2.8.6',0);
call sqlj.install_jar('cj!snakeyaml-1.20.jar','snakeyaml-1.20',0);
call sqlj.install_jar('cj!uap-java-1.4.0.jar','uap-java-1.4.0',0);
call sqlj.install_jar('cj!ua-udf.jar','ua-udf',0);

Use BTEQ to run the above, and ensure that BTEQ is invoked from the path where your project resides.

Output of a sample run looks like this:

C:\Apps\Judf>bteqEnter your logon or BTEQ command:
.logon localtd/<your_username>;
Password:<your_password>
DATABASE MYPROJ;*** New default database accepted.
*** Total elapsed time was 1 second.

call sqlj.install_jar('cj!gson-2.8.6.jar','gson-2.8.6',0);
*** Procedure has been executed.

call sqlj.install_jar('cj!snakeyaml-1.20.jar','snakeyaml-1.20',0);
*** Procedure has been executed.call sqlj.install_jar('cj!uap-java-1.4.0.jar','uap-java-1.4.0',0);*** Procedure has been executed.

call sqlj.install_jar('cj!ua-udf.jar','ua-udf',0);
*** Procedure has been executed.

Since the JARs we’ve just deployed are all associated (i.e. our JAVA source code uses classes from all of the installed JARs), we need to ensure that the required classes can be loaded when the UDF is invoked. This is accomplished by executing procedure SQLJ.ALTER_JAVA_PATH.

Execute this procedure as follows (via BTEQ):

DATABASE MYPROJ;CALL SQLJ.ALTER_JAVA_PATH('ua-udf'
,'(*,SNAKEYAML-1.20)(*,GSON-2.8.6)(*,UAP-JAVA-1.4.0)');

The parameters for the second statement above can be explained as follows:

'ua-udf'
This is the ID of the jar file we created (ua-udf.jar) that contains the compiled parseUA.class

'(*,SNAKEYAML-1.20)(*,GSON-2.8.6)(*,UAP-JAVA-1.4.0)')
This line lists the JARs and classes that will be searched when the UDF is invoked. For example, the value of (*,SNAKEYAML-1.20) interprets to, search all classes (*) in JAR file snakeyaml-1.20.jar.

After executing the statements, you can check if the associations have been made using the following SQL:

select distinct b.databasename,
a.JarName,
a.PathJarName
from DBC.JAR_JAR_USAGE a , DBC.DBASE b ,DBC.TVM c
where a.DatabaseId =b.DatabaseId
and b.DatabaseId=c.DatabaseId
and c.TableKind='D'
and b.DatabaseNameI='MYPROJ'
;

If you’re at a site where access to dbc dictionary tables is restricted, then you can use the JAR-specific metadata objects in SQLJ :

DATABASE MYPROJ;sel distinct database,a.jarname,b.pathjarname
from SQLJ.JARS a
join SQLJ.JAR_JAR_USAGE b
on a.databaseid=b.databaseid
and a.jarname=b.jarname
;
Image for post
Image for post

Install Java UDF to Target Database

Our function can be installed via BTEQ using the following definition:

DATABASE MYPROJ;REPLACE FUNCTION MYPROJ.UAJSON 
(raw_ua VARCHAR(1024) CHARACTER SET UNICODE)
RETURNS VARCHAR(2000) CHARACTER SET UNICODE
SPECIFIC uajson
LANGUAGE JAVA
NO SQL
NO EXTERNAL DATA
PARAMETER STYLE JAVA
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'ua-udf:parseUA.uaJson'
;

Test UDF

The UDF takes an input parameter, User-Agent as a string, and outputs a parse-able JSON version.

To test, the following User-Agent can be used:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

Run the UDF as follows using this Agent string:

SEL 
MYPROJ.UAJSON('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36');

The output is a single JSON field, which looks like the following after neatly formatting using a JSON formatter:

{
"userAgent": {
"family": "Chrome",
"major": "48",
"minor": "0",
"patch": "2564"
},
"os": {
"family": "Mac OS X",
"major": "10",
"minor": "10",
"patch": "5"
},
"device": {
"family": "Other"
}
}

To extract components/fields from the new JSON version of the User-Agent, Teradata JSON functions can be used, for example, to extract the User-Agent “family”, you could run the following:

SEL 
cast(
MYPROJ.UAJSON(
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
)
AS JSON
).JsonExtractValue('$.userAgent.family')(VARCHAR(50))
AS User_Agent_Family
Image for post
Image for post

Uninstall/Remove Jars & UDF

To remove all objects associated with the installation, run the following:

DATABASE MYPROJ;DROP FUNCTION MYPROJ.UAJSON;call sqlj.alter_java_path('ua-udf','')call sqlj.remove_jar('ua-udf',0)
call sqlj.remove_jar('gson-2.8.6',0);
call sqlj.remove_jar('snakeyaml-1.20',0);
call sqlj.remove_jar('uap-java-1.4.0',0);

Closing Points

Hopefully this article has helped in demonstrating the flexibility that Java UDFs offer.

One important point to note is, unlike UDFs written in C, UDFs written in Java can only run in protected mode. If you plan on implementing UDFs using JAVA, performance metrics should be taken into consideration and assessed during testing.

If you are interested in developing Java UDFs using Eclipse, then you may want to check out the Teradata Plugin for Eclipse.

Written by

Primarily a Learner/Coder with interests in Python, Cloud Technologies, Security and Automation. Pandas munching on Bamboo sticks give me the “Giggles” :))

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store