Airflow: Connect to Teradata using JDBC
Getting these 3 muppets to communicate was a bit of a challenge. It’s been well over a year since I did a proof-of-concept to get Airflow up and running in Docker. I knew nothing back then about Docker or Airflow, but eventually it all ran well. At the time, I leveraged an existing Airflow Dockerfile that I came across on GitHub.
I didn’t get a chance at the time to get the environment working with the Teradata JDBC driver, but better late than never. This post describes my recent experiment to get it (mostly) working.
Back to the Famous Airflow Git Repo
To get a base Airflow Docker image running, I went back to the puckel/docker-airflow repo and sure enough, it’s still there.
Airflow can be run in a few different modes, some of which allow for architecture/system scaling. My focus for now is simply to get up and running, so an Airflow instance using the LocalExecutor will suffice.
The first step is to clone the git repo (latest release as of now: https://github.com/puckel/docker-airflow/releases/tag/1.10.6-1):
PS C:\> git clone https://github.com/puckel/docker-airflow.git
Cloning into 'docker-airflow'...
remote: Enumerating objects: 785, done.
Receiving objects: 100% (785/785), 209.77 KiB | 355.00 KiB/s, done.
Resolving deltas: 100% (420/420), done.
Before we can start the build of the Docker image with support for Teradata/JDBC connectivity, we’ll need to make some changes to the existing Dockerfile and download the Teradata JDBC driver.
Download Teradata JDBC Driver
The latest driver at this point in time is 16.20.00.13 and can be downloaded from the Teradata downloads site.
Download the zip file and extract it to the root location of your cloned repo. In my case, that location is C:\docker-airflow.

Updating the Dockerfile
The existing Dockerfile already installs Airflow with the jdbc extra, so you’ll see something similar to the below in the file.
airflow[crypto,celery,postgres,hive,jdbc,.....
To include OpenJDK and the Teradata driver in the final Docker image, we need to add some steps to the Dockerfile.
Open the Dockerfile and include the following just before this line:
COPY script/entrypoint.sh /entrypoint.sh
C:\docker-airflow\Dockerfile
# Install Java & the Teradata JDBC driver
# (slim Debian images lack this dir, which the OpenJDK package expects)
RUN mkdir -p /usr/share/man/man1
RUN mkdir -p /etc/ssl/certs
RUN mkdir -p /usr/local/airflow/jars
COPY terajdbc4.jar /usr/local/airflow/jars/terajdbc4.jar
RUN apt-get update -y && \
    apt-get install -y openjdk-8-jdk && \
    apt-get autoclean && \
    apt-get autoremove -y
RUN apt-get update -y && \
    apt-get install -y ca-certificates-java && \
    update-ca-certificates -f && \
    apt-get -y autoclean && \
    apt-get autoremove -y
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
ENV CLASSPATH /usr/local/airflow/jars/terajdbc4.jar
Save changes once done.
Building the Airflow Docker Image
Build the image using the following command:
PS C:\docker-airflow> docker build --rm -t puckel/docker-airflow:latest .
Once the build completes, you should see the following message:
Successfully tagged puckel/docker-airflow:latest
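Once a container based on this image is up (we’ll start one shortly), a quick way to confirm that the Java pieces landed where expected is a couple of lines in a Python shell inside it. This is a minimal sanity check; both paths come straight from the Dockerfile edits above:
import os

# Values set by the Dockerfile edits above; adjust if you changed them.
print(os.environ.get('JAVA_HOME'))   # expect: /usr/lib/jvm/java-8-openjdk-amd64/
print(os.environ.get('CLASSPATH'))   # expect: /usr/local/airflow/jars/terajdbc4.jar
print(os.path.exists('/usr/local/airflow/jars/terajdbc4.jar'))  # expect: True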
Edit Docker-Compose YML
The docker compose file for LocalExecutor mode can be found at the root of the cloned repo.
PS C:\docker-airflow> dir docker-compose-LocalExecutor.yml

    Directory: C:\docker-airflow

-a----       28/11/2019  3:53 PM            839 docker-compose-LocalExecutor.yml
Edit the file by replacing the following:
image: puckel/docker-airflow:1.10.6
with:
image: puckel/docker-airflow:latest
Run Docker-Compose YML
Now it’s time to bring up the containers, along with the other associated services, and check if it all works.
Run the following to bring up containers:
PS C:\docker-airflow> docker-compose -f docker-compose-LocalExecutor.yml up -d
The docker containers should start and you should get output similar to the following:
Creating network "docker-airflow_default" with the default driver
Creating docker-airflow_postgres_1 ... done
Creating docker-airflow_webserver_1 ... done
To verify that the containers are up and running, log on to the Airflow web GUI at http://localhost:8080.
Test Teradata JDBC Connectivity
A quick way of verifying connectivity to Teradata via JDBC is to open an interactive shell in the Airflow container and run some simple code.
Start by finding out the container id that’s hosting airflow by running:
PS C:\docker-airflow> docker ps
CONTAINER ID   IMAGE
b556797630c5   puckel/docker-airflow:latest
Now log on to the container with an interactive bash shell and run the code as shown. Be sure to replace the values <teradata_server>, <user> and <pwd> to reflect your environment:
PS C:\docker-airflow> docker exec -ti b556797630c5 bash
airflow@b556797630c5:~$ python
Python 3.7.5 (default, Nov 23 2019, 06:31:35)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
import jaydebeapi

# Teradata server
var_tdpid = '<teradata_server>'
# Teradata credentials
var_userid = '<user>'
var_password = '<pwd>'
# Class and driver location
td_jars = ['/usr/local/airflow/jars/terajdbc4.jar']
tdat_class = 'com.teradata.jdbc.TeraDriver'
# JDBC connection string
url = 'jdbc:teradata://' + var_tdpid + '/'
driver_args = [var_userid, var_password]

con = jaydebeapi.connect(jclassname=tdat_class,
                         url=url, driver_args=driver_args,
                         jars=td_jars)
cur = con.cursor()
cur.execute("SELECT date")
rows = cur.fetchall()
print(rows)
cur.close()
con.close()
If we run the above code, we get an error trace that includes the following output:
/usr/local/lib/python3.7/site-packages/jpype/_core.py:210: UserWarning:
Deprecated: convertStrings was not specified when starting the JVM. The default
behavior in JPype will be False starting in JPype 0.8. The recommended setting
for new code is convertStrings=False. The legacy value of True was assumed for
this session. If you are a user of an application that reported this warning,
please file a ticket with the developer.
------------------------------------------------------------------------""")
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 381, in connect
    jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
  File "/usr/local/lib/python3.7/site-packages/jaydebeapi/__init__.py", line 183, in _jdbc_connect_jpype
    types_map[i.getName()] = i.getStaticAttribute()
AttributeError: '_jpype.PyJPField' object has no attribute 'getStaticAttribute'
Let’s check which versions of the Python packages (JayDeBeApi & JPype1) are installed in the Docker image. JPype1 is the package that’s throwing the error: the traceback shows JayDeBeApi calling getStaticAttribute(), an internal JPype API that evidently no longer exists.
airflow@b556797630c5:~$ pip show JPype1
Name: JPype1
Version: 0.7.0
Summary: A Python to Java bridge.
Home-page: https://github.com/jpype-project/jpype
Author: Steve Menard
Author-email: devilwolf@users.sourceforge.net
License: License :: OSI Approved :: Apache Software License
Location: /usr/local/lib/python3.7/site-packages
Requires:
Required-by: JayDeBeApi
airflow@b556797630c5:~$ pip show jaydebeapi
Name: JayDeBeApi
Version: 1.1.1
Summary: Use JDBC database drivers from Python 2/3 or Jython with a DB-API.
Home-page: https://github.com/baztian/jaydebeapi
Author: Bastian Bowe
Author-email: bastian.dev@gmail.com
License: GNU LGPL
Location: /usr/local/lib/python3.7/site-packages
Requires: JPype1
Required-by:
So, we have:
JPype1 version 0.7.0
JayDeBeApi version 1.1.1
Let’s take a quick look at the official package details for JPype1 at the Python Package Index. The “Release History” shows that the latest version (0.7.0) was released on June 25th, 2019, which matches the version installed in our Docker image. The version released prior to that (0.6.3) dates back to April 2018, i.e. over a year before the release we’re running:

If we do the same investigation for JayDeBeApi, we see that there haven’t been any recent updates (nothing since March 22, 2017) and that the most current version remains 1.1.1.

So, I’m going to take a wild turn for this experiment and downgrade JPype1 from 0.7.0 to 0.6.3. I won’t mess with rebuilding the Docker image until the downgrade proves to resolve the issue.
To perform the downgrade, I’ll again use an interactive shell, but this time I’ll log on to the container as root and use pip from within it:
PS C:\docker-airflow> docker exec -ti --user root b556797630c5 bash
root@b556797630c5:/usr/local/airflow# pip install --upgrade JPype1==0.6.3
..............
Installing collected packages: JPype1
  Found existing installation: JPype1 0.7.0
    Uninstalling JPype1-0.7.0:
      Successfully uninstalled JPype1-0.7.0
Successfully installed JPype1-0.6.3
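Before rerunning the failed code, it’s worth double-checking which version Python now sees. A quick check from the same shell (pkg_resources ships with setuptools, which is already in the image):
import pkg_resources

# Confirm the downgrade took effect; expect '0.6.3' after the pip install above.
print(pkg_resources.get_distribution('JPype1').version)
If the downgrade does resolve the issue, it’s also worth baking the pin into the Dockerfile (e.g. a RUN pip install 'JPype1==0.6.3' step) so a future rebuild doesn’t silently undo it.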
Now let’s execute the same python code that failed earlier on.
root@b556797630c5:/usr/local/airflow# python
Python 3.7.5 (default, Nov 23 2019, 06:31:35)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jaydebeapi
>>>
>>> #teradata server,
... var_tdpid = '<teradata_server>'
>>>
>>> #Teradata Credentials
... var_userid = '<user>'
>>> var_password = '<pwd>'
>>>
>>> #Class and driver location
... td_jars = ['/usr/local/airflow/jars/terajdbc4.jar']
>>> tdat_class='com.teradata.jdbc.TeraDriver'
>>>
>>> #Jdbc Connection string
... url='jdbc:teradata://'+var_tdpid+'/'
>>>
>>> driver_args=[var_userid,var_password]
>>>
>>>
>>> con = jaydebeapi.connect(jclassname=tdat_class \
... ,url=url, driver_args=driver_args, \
... jars=td_jars)
>>> cur = con.cursor ()
>>> cur.execute ("SELECT date")
>>> rows = cur.fetchall()
>>> print(rows)
[('2019-11-28',)]
>>> cur.close()
>>> con.close()
This time we get back a response without the error trace, which is a good sign that it all works.
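For anything beyond a one-off test, the same logic is easier to reuse wrapped in a small function. Here’s a minimal sketch under the same assumptions as above (jar path and driver class from our Dockerfile edits); the helper name is mine, not part of any library:
import jaydebeapi

def teradata_query(server, user, password, sql,
                   jar='/usr/local/airflow/jars/terajdbc4.jar'):
    """Run a single query over the Teradata JDBC driver and return all rows."""
    con = jaydebeapi.connect(jclassname='com.teradata.jdbc.TeraDriver',
                             url='jdbc:teradata://' + server + '/',
                             driver_args=[user, password],
                             jars=[jar])
    try:
        cur = con.cursor()
        try:
            cur.execute(sql)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        con.close()

# Usage, with the same placeholders as before:
# print(teradata_query('<teradata_server>', '<user>', '<pwd>', 'SELECT date'))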
Teradata JDBC Connection via Airflow Web GUI
Back to the Airflow web GUI for some further testing. Start by creating a new connection for Teradata (under Admin -> Connections), supplying the JDBC driver class, the driver jar path and your credentials. Substitute the values to match the environment you’re working with.
Now select "Data Profiling -> Ad Hoc Query".

Select the connection that we just created.

Enter a simple query as shown below and hit Run:
select * from dbc.dbcinfo

We get an error returned:
Execution failed on sql ' SELECT * FROM ( select from dbc.dbcinfo ) qry LIMIT 100000 ': java.sql.SQLException: [Teradata Database] [TeraJDBC 16.20.00.13] [Error 3706] [SQLState 42000] Syntax error: expected something between the 'select' keyword and the 'from' keyword.
Looking closely at the error text, the ad-hoc query UI has rewritten the statement as SELECT * FROM ( ... ) qry LIMIT 100000, and in doing so the inner select has lost its *. Whether that’s a config issue or a bug, after tweaking the sql the workaround to get it working was to enclose the query in parentheses, as follows:
(select * from dbc.dbcinfo)
Let’s try again:

This time, we’re successful.
What’s Next?
You might be interested in referencing the Teradata connection that was created in the web GUI via environment variables. A good description of how to go about this can be found in the doco.
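For illustration, Airflow resolves environment variables named AIRFLOW_CONN_<CONN_ID> into connections at runtime. The sketch below uses a hypothetical connection id of teradata_jdbc and a deliberately simplified URI; a real JDBC connection also carries the driver jar path and class name in its extras, so treat this as the shape of the mechanism rather than a copy-paste recipe:
import os

# Hypothetical: Airflow maps AIRFLOW_CONN_TERADATA_JDBC to the connection id
# 'teradata_jdbc'. Illustrative URI only; the driver jar path and class name
# (normally stored in the connection's extras) are not captured here.
os.environ['AIRFLOW_CONN_TERADATA_JDBC'] = 'jdbc://<user>:<pwd>@<teradata_server>'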
Another good test to carry out would be to create a DAG that makes use of the JDBC connection to ensure it all works fine.
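As a starting point, here’s a minimal DAG sketch using the stock JdbcOperator that ships with Airflow 1.10. It assumes a connection id of teradata_jdbc (created via the GUI or an environment variable as above); the dag and task names are mine:
from datetime import datetime

from airflow import DAG
from airflow.operators.jdbc_operator import JdbcOperator

# Minimal sketch: a single task that runs the same query we tested in the GUI.
dag = DAG(
    dag_id='teradata_jdbc_test',
    start_date=datetime(2019, 11, 28),
    schedule_interval=None,  # trigger manually while testing
)

check_dbcinfo = JdbcOperator(
    task_id='check_dbcinfo',
    jdbc_conn_id='teradata_jdbc',  # assumed connection id
    sql='select * from dbc.dbcinfo',
    dag=dag,
)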