Parsing HTTP User-Agents with Teradata

Parsing User-Agents with Teradata User-Installed-Files (UIF) and Python has been discussed and demonstrated here.

A slight setback with the UIF approach was that we needed to install an additional Python package (ua-parser) to support the parsing, i.e., the default Python installation on the Teradata Node did not include capabilities for parsing User-Agents. Getting additional Python packages installed onto Teradata Node/s in a “real” Production environment is not something that can be rapidly accomplished. This is due to the fact that DBAs, Teradata Engineers and other implementation procedures would need to be involved, which might make the process fairly lengthy.

This article describes the process for using Teradata Java-User-Defined-Functions (JUDF) for implementing User-Agent parsing capabilities.

Pre-requisites

For this article, a Java IDE will not be used.

Project Folder

Install and Configure Java Development Kit (JDK)

JDK for my setup is installed into target directory:

C:\Program Files\Java\jdk1.8.0_231

After the installation is complete, add the following to your System Environment Variables:

Include the following to your System Path Environment variable:

%JAVA_HOME%\bin

Download ua-parser Jar and Dependencies

uap-java-1.4.0.jar:
https://repo1.maven.org/maven2/com/github/ua-parser/uap-java/1.4.0/uap-java-1.4.0.jar

gson-2.8.6.jar:
https://repo1.maven.org/maven2/com/google/code/gson/gson/2.8.6/gson-2.8.6.jar

snakeyaml-1.20.jar
https://repo1.maven.org/maven2/org/yaml/snakeyaml/1.20/snakeyaml-1.20.jar

Create Teradata Database for the JUDF components

CREATE DATABASE MYPROJ FROM DBC AS PERM=200E6;

Grant the following access to the user that will be installing the UDF:

GRANT EXECUTE FUNCTION ON MYPROJ TO <username>;
GRANT CREATE FUNCTION ON MYPROJ TO <username>;
GRANT EXECUTE PROCEDURE ON SQLJ TO <username>;
GRANT CREATE EXTERNAL PROCEDURE ON MYPROJ TO <username>;

JAVA UDF Source and Packaging

Code the UDF Class and Method

import ua_parser.Parser;
import ua_parser.Client;
import java.io.IOException;import com.google.gson.Gson;public class parseUA {public static String uaJson (String raw_ua) throws IOException {

Parser newParser = new Parser();
Client agent = newParser.parse(raw_ua.trim());
Gson gson = new Gson();
String returnValue = gson.toJson(agent);
if (raw_ua.trim().isEmpty()) {
returnValue = "";
}

return returnValue;
}

}

Compile Java Source Code

PS C:\Apps\Judf> javac -classpath `
";.\gson-2.8.6.jar;.\snakeyaml-1.20.jar;.\uap-java-1.4.0.jar" `
parseUA.java

Package Java UDF Class into .JAR

PS C:\Apps\Judf> jar.exe -cf ua-udf.jar parseUA.class

Deploy JAR Files to Teradata

Install JARs

DATABASE MYPROJ;
call sqlj.install_jar('cj!gson-2.8.6.jar','gson-2.8.6',0);
call sqlj.install_jar('cj!snakeyaml-1.20.jar','snakeyaml-1.20',0);
call sqlj.install_jar('cj!uap-java-1.4.0.jar','uap-java-1.4.0',0);
call sqlj.install_jar('cj!ua-udf.jar','ua-udf',0);

Use BTEQ to run the above, and ensure that BTEQ is invoked from the path where your project resides.

Output of a sample run looks like this:

C:\Apps\Judf>bteqEnter your logon or BTEQ command:
.logon localtd/<your_username>;
Password:<your_password>
DATABASE MYPROJ;*** New default database accepted.
*** Total elapsed time was 1 second.

call sqlj.install_jar('cj!gson-2.8.6.jar','gson-2.8.6',0);
*** Procedure has been executed.

call sqlj.install_jar('cj!snakeyaml-1.20.jar','snakeyaml-1.20',0);
*** Procedure has been executed.call sqlj.install_jar('cj!uap-java-1.4.0.jar','uap-java-1.4.0',0);*** Procedure has been executed.

call sqlj.install_jar('cj!ua-udf.jar','ua-udf',0);
*** Procedure has been executed.

Set JAR Associations

Execute this procedure as follows (via BTEQ):

DATABASE MYPROJ;CALL SQLJ.ALTER_JAVA_PATH('ua-udf'
,'(*,SNAKEYAML-1.20)(*,GSON-2.8.6)(*,UAP-JAVA-1.4.0)');

The parameters for the second statement above can be explained as follows:

'ua-udf'
This is the ID of the jar file we created (ua-udf.jar) that contains the compiled parseUA.class

'(*,SNAKEYAML-1.20)(*,GSON-2.8.6)(*,UAP-JAVA-1.4.0)')
This line lists the JARs and classes that will be searched when the UDF is invoked. For example, the value of (*,SNAKEYAML-1.20) interprets to, search all classes (*) in JAR file snakeyaml-1.20.jar.

After executing the statements, you can check if the associations have been made using the following SQL:

select distinct b.databasename,
a.JarName,
a.PathJarName
from DBC.JAR_JAR_USAGE a , DBC.DBASE b ,DBC.TVM c
where a.DatabaseId =b.DatabaseId
and b.DatabaseId=c.DatabaseId
and c.TableKind='D'
and b.DatabaseNameI='MYPROJ'
;

If you’re at a site where access to dbc dictionary tables is restricted, then you can use the JAR-specific metadata objects in SQLJ :

DATABASE MYPROJ;sel distinct database,a.jarname,b.pathjarname
from SQLJ.JARS a
join SQLJ.JAR_JAR_USAGE b
on a.databaseid=b.databaseid
and a.jarname=b.jarname
;

Install Java UDF to Target Database

DATABASE MYPROJ;REPLACE FUNCTION MYPROJ.UAJSON 
(raw_ua VARCHAR(1024) CHARACTER SET UNICODE)
RETURNS VARCHAR(2000) CHARACTER SET UNICODE
SPECIFIC uajson
LANGUAGE JAVA
NO SQL
NO EXTERNAL DATA
PARAMETER STYLE JAVA
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'ua-udf:parseUA.uaJson'
;

Test UDF

To test, the following User-Agent can be used:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

Run the UDF as follows using this Agent string:

SEL 
MYPROJ.UAJSON('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36');

The output is a single JSON field, which looks like the following after neatly formatting using a JSON formatter:

{
"userAgent": {
"family": "Chrome",
"major": "48",
"minor": "0",
"patch": "2564"
},
"os": {
"family": "Mac OS X",
"major": "10",
"minor": "10",
"patch": "5"
},
"device": {
"family": "Other"
}
}

To extract components/fields from the new JSON version of the User-Agent, Teradata JSON functions can be used, for example, to extract the User-Agent “family”, you could run the following:

SEL 
cast(
MYPROJ.UAJSON(
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
)
AS JSON
).JsonExtractValue('$.userAgent.family')(VARCHAR(50))
AS User_Agent_Family

Uninstall/Remove Jars & UDF

DATABASE MYPROJ;DROP FUNCTION MYPROJ.UAJSON;call sqlj.alter_java_path('ua-udf','')call sqlj.remove_jar('ua-udf',0)
call sqlj.remove_jar('gson-2.8.6',0);
call sqlj.remove_jar('snakeyaml-1.20',0);
call sqlj.remove_jar('uap-java-1.4.0',0);

Closing Points

One important point to note is, unlike UDFs written in C, UDFs written in Java can only run in protected mode. If you plan on implementing UDFs using JAVA, performance metrics should be taken into consideration and assessed during testing.

If you are interested in developing Java UDFs using Eclipse, then you may want to check out the Teradata Plugin for Eclipse.

Learner. Interests include Cloud and Devops technologies.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store