Airflow: Connect to Teradata using JDBC

Getting these 3 muppets to communicate was a bit of challenge. It’s been well over a year since I did a proof-of-concept for getting airflow up and running in docker. I knew nothing back then about docker and airflow but eventually it all ran well. At the time, I leveraged off an existing airflow Dockerfile, which I came across in github.

I didn’t get a chance at the time to get the environment working with the Teradata JDBC driver but some things are better late than never. This post describes my recent experiment to get it (mostly) working.

Back to the Famous Airflow Git Repo

To get a base airflow docker running, I went back to the puckel/docker-airflow repo and sure enough, it’s still there.

Airflow can be run in a couple of different modes with some allowing for architecture/system scaling. My focus for now is to get up and running so an airflow instance using LocalExecutor mode will suffice.

First step is to clone the git repo (latest version as of now is: https://github.com/puckel/docker-airflow/releases/tag/1.10.6-1) using:

Before we can start the build of the docker image with support for Teradata/JBDC connectivity, we’ll need to make some changes to the existingDockerfile and download the Teradata JDBC driver.

Download Teradata JDBC Driver

The latest driver at this point in time is 16.20.00.13 and can be downloaded from here.

Download the zip file and extract it to the root location of your cloned repo. In my case, that location is C:\docker-airflow.

Updating the Dockerfile

The existing Dockerfile already includes a step for jdbc airflow hooks, so you’ll see something similar to the below in the file.

To include the OpenJDK & Teradata driver into the final docker image, we need to add some steps into the Dockerfile.

Open the Dockerfile and include the following just before the line:
COPY script/entrypoint.sh /entrypoint.sh

C:\docker-airflow\Dockerfile

#Install JAVA & Teradata JDBC DriverRUN mkdir -p /usr/share/man/man1
RUN mkdir -p /etc/ssl/certs
RUN mkdir -p /usr/local/airflow/jars
COPY terajdbc4.jar /usr/local/airflow/jars/terajdbc4.jar
RUN apt-get update -y && \
apt-get install -y openjdk-8-jdk && \
apt-get autoclean && \
apt autoremove
RUN apt-get update -y && \
apt-get install -y ca-certificates-java && \
update-ca-certificates -f && \
apt-get -y autoclean && \
apt autoremove
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
ENV CLASSPATH /usr/local/airflow/jars/terajdbc4.jar

Save changes once done.

Building the Airflow Docker Image

Build the image using the following command:

Once the build complete, you should see the following message:

Successfully tagged puckel/docker-airflow:latest

Edit Docker-Compose .yml

The docker compose file for LocalExecutor mode can be found at the root of the cloned repo.

Edit the file by replacing the following :

image: puckel/docker-airflow:1.10.6

with:

image: puckel/docker-airflow:latest

Run Docker-Compose YML

Now it’s time to bring up the image, with the other associated services, and check if it all works.

Run the following to bring up containers:

The docker containers should start and you should get output similar to the following:

To verify that the containers are up and running, logon to the airflow GUI at

http://localhost:8080

Test Teradata JDBC Connectivity

A quick way of verifying the connectivity to Teradata via JDBC is to use an interactive shell into the airflow container and run some simple code.

Start by finding out the container id that’s hosting airflow by running:

Now logon to the container with an interactive bash shell and run the code as shown. Be sure to replace the respective values <teradata_server>, <user>, <pwd> to reflect your environment:

If we run the above code, we get an error trace that includes the following output:

Let’s check which versions of the python packages (jaydebeapi & JPype) have been installed in the docker image. JPype1 is the package that’s throwing the error.

So, we have:
JPype1 version 0.7.0
JayDeBeApiversion 1.1.1

Let’s take a quick look at the official package details for JPype1 at the python package index:

The “Release History” shows that the latest version (0.7.0) was released on June 25th 2019, which is the same we have installed in our docker image (0.7.0). The version released prior to this was (0.6.3) back in Apr 2018, i.e. over 1 year prior to the latest update we’re running:

If we do the same investigation for jaydebeapi, we see that there hasn’t been any recent updates (nothing since Mar 22, 2017) and that the most current version remains at 1.1.1.

So, I’m going to take a wild turn for this experiment and downgrade JPype1 from 0.7.0 to 0.6.3 . I won’t mess with rebuilding the docker image until the downgrade proves to resolve the issue.

To perform the downgrade, I’ll use an interactive shell, but will logon to the container as root this time.

Downgrade by logging on as root and using pip from within the container:

Now let’s execute the same python code that failed earlier on.

This time we get back a response without the error trace, which is a good sign that it all works.

Teradata JDBC Connection via Airflow Web GUI

Back to the airflow Web Gui to do some further testing.

Substitute the above to match the environment your working with.

Now select “Data Profiling -> Ad Hoc Query

Select the connection that we just created.

Enter a simple query as shown below: select * from dbc.dbcinfo and hit Run

We get an error an error returned:

Execution failed on sql ‘ SELECT * FROM ( select from dbc.dbcinfo ) qry LIMIT 100000 ‘: java.sql.SQLException: [Teradata Database] [TeraJDBC 16.20.00.13] [Error 3706] [SQLState 42000] Syntax error: expected something between the ‘select’ keyword and the ‘from’ keyword.

This error seems to be caused by either a config issue or a bug, but after tweaking the sql, the workaround to get it working was to enclose the query in brackets, () as follows:

(select * from dbc.dbcinfo)

Let’s try again:

This time, we’re successful.

What’s Next?

You might be interested in referencing the Teradata connection that was created using the Web Gui via the use of environment variables. A good description of how to go about this can be found in the doco.

Another good test to carry out would be to create a DAG that makes use of the JDBC connection to ensure it all works fine.

Learner. Interests include Cloud and Devops technologies.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store