Airflow: Connect to Teradata using JDBC

Getting these 3 muppets to communicate was a bit of challenge. It’s been well over a year since I did a proof-of-concept for getting airflow up and running in docker. I knew nothing back then about docker and airflow but eventually it all ran well. At the time, I leveraged off an existing airflow Dockerfile, which I came across in github.

I didn’t get a chance at the time to get the environment working with the Teradata JDBC driver but some things are better late than never. This post describes my recent experiment to get it (mostly) working.

Back to the Famous Airflow Git Repo

Airflow can be run in a couple of different modes with some allowing for architecture/system scaling. My focus for now is to get up and running so an airflow instance using LocalExecutor mode will suffice.

First step is to clone the git repo (latest version as of now is: https://github.com/puckel/docker-airflow/releases/tag/1.10.6-1) using:

PS C:\> git clone https://github.com/puckel/docker-airflow.gitCloning into 'docker-airflow'...
remote: Enumerating objects: 785, done.
Receiving objects: 92% (723/785), 10 (delta 0), pack-reused 785 Receiving objects: 91% (715/785), 196.01 KiB | 330.00 KiB/s
Receiving objects: 100% (785/785), 209.77 KiB | 355.00 KiB/s, done.
Resolving deltas: 100% (420/420), done.

Before we can start the build of the docker image with support for Teradata/JBDC connectivity, we’ll need to make some changes to the existingDockerfile and download the Teradata JDBC driver.

Download Teradata JDBC Driver

Download the zip file and extract it to the root location of your cloned repo. In my case, that location is C:\docker-airflow.

Updating the Dockerfile

airflow[crypto,celery,postgres,hive,jdbc,.....

To include the OpenJDK & Teradata driver into the final docker image, we need to add some steps into the Dockerfile.

Open the Dockerfile and include the following just before the line:
COPY script/entrypoint.sh /entrypoint.sh

C:\docker-airflow\Dockerfile

#Install JAVA & Teradata JDBC DriverRUN mkdir -p /usr/share/man/man1
RUN mkdir -p /etc/ssl/certs
RUN mkdir -p /usr/local/airflow/jars
COPY terajdbc4.jar /usr/local/airflow/jars/terajdbc4.jar
RUN apt-get update -y && \
apt-get install -y openjdk-8-jdk && \
apt-get autoclean && \
apt autoremove
RUN apt-get update -y && \
apt-get install -y ca-certificates-java && \
update-ca-certificates -f && \
apt-get -y autoclean && \
apt autoremove
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
ENV CLASSPATH /usr/local/airflow/jars/terajdbc4.jar

Save changes once done.

Building the Airflow Docker Image

PS C:\docker-airflow> docker build --rm -t puckel/docker-airflow:latest .

Once the build complete, you should see the following message:

Successfully tagged puckel/docker-airflow:latest

Edit Docker-Compose .yml

PS C:\docker-airflow> dir docker-compose-LocalExecutor.ymlDirectory: C:\docker-airflow-a — — 28/11/2019 3:53 PM 839 docker-compose-LocalExecutor.yml

Edit the file by replacing the following :

image: puckel/docker-airflow:1.10.6

with:

image: puckel/docker-airflow:latest

Run Docker-Compose YML

Run the following to bring up containers:

PS C:\docker-airflow> docker-compose -f docker-compose-LocalExecutor.yml up -d

The docker containers should start and you should get output similar to the following:

Creating network “docker-airflow_default” with the default driver
Creating docker-airflow_postgres_1 … done
Creating docker-airflow_webserver_1 … done

To verify that the containers are up and running, logon to the airflow GUI at

http://localhost:8080

Test Teradata JDBC Connectivity

Start by finding out the container id that’s hosting airflow by running:

PS C:\docker-airflow> docker psCONTAINER ID IMAGE 
b556797630c5 puckel/docker-airflow:latest

Now logon to the container with an interactive bash shell and run the code as shown. Be sure to replace the respective values <teradata_server>, <user>, <pwd> to reflect your environment:

PS C:\docker-airflow> docker exec -ti 0b385c800504 bash airflow@b556797630c5:~$ python
Python 3.7.5 (default, Nov 23 2019, 06:31:35)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
import jaydebeapi#teradata server,
var_tdpid = '<teradata_server>'
#Teradata Credentials
var_userid = '<user>'
var_password = '<pwd>'
#Class and driver location
td_jars = ['/usr/local/airflow/jars/terajdbc4.jar']
tdat_class='com.teradata.jdbc.TeraDriver'
#Jdbc Connection string
url='jdbc:teradata://'+var_tdpid+'/'
driver_args=[var_userid,var_password]con = jaydebeapi.connect(jclassname=tdat_class \
,url=url, driver_args=driver_args, \
jars=td_jars)
cur = con.cursor ()
cur.execute ("SELECT date")
rows = cur.fetchall()
print(rows)
cur.close()
con.close()

If we run the above code, we get an error trace that includes the following output:

/usr/local/lib/python3.7/site-packages/jpype/_core.py:210: UserWarning: 

Deprecated: convertStrings was not specified when starting the JVM. The default
behavior in JPype will be False starting in JPype 0.8. The recommended setting
for new code is convertStrings=False. The legacy value of True was assumed for
this session. If you are a user of an application that reported this warning,
please file a ticket with the developer.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — “””)
Traceback (most recent call last):
File “<stdin>”, line 3, in <module>
File “/usr/local/lib/python3.7/site-packages/jaydebeapi/__init__.py”, line 381, in connect
jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
File “/usr/local/lib/python3.7/site-packages/jaydebeapi/__init__.py”, line 183, in _jdbc_connect_jpype
types_map[i.getName()] = i.getStaticAttribute()
AttributeError: ‘_jpype.PyJPField’ object has no attribute ‘getStaticAttribute’
>>> cur = con.cursor ()

Let’s check which versions of the python packages (jaydebeapi & JPype) have been installed in the docker image. JPype1 is the package that’s throwing the error.

airflow@b556797630c5:~$ pip show JPype1
Name: JPype1
Version: 0.7.0
Summary: A Python to Java bridge.
Home-page: https://github.com/jpype-project/jpype
Author: Steve Menard
Author-email: devilwolf@users.sourceforge.net
License: License :: OSI Approved :: Apache Software License
Location: /usr/local/lib/python3.7/site-packages
Requires:
Required-by: JayDeBeApi

airflow@b556797630c5:~$ pip show jaydebeapi
Name: JayDeBeApi
Version: 1.1.1
Summary: Use JDBC database drivers from Python 2/3 or Jython with a DB-API.
Home-page: https://github.com/baztian/jaydebeapi
Author: Bastian Bowe
Author-email: bastian.dev@gmail.com
License: GNU LGPL
Location: /usr/local/lib/python3.7/site-packages
Requires: JPype1
Required-by:

So, we have:
JPype1 version 0.7.0
JayDeBeApiversion 1.1.1

Let’s take a quick look at the official package details for JPype1 at the python package index:

The “Release History” shows that the latest version (0.7.0) was released on June 25th 2019, which is the same we have installed in our docker image (0.7.0). The version released prior to this was (0.6.3) back in Apr 2018, i.e. over 1 year prior to the latest update we’re running:

If we do the same investigation for jaydebeapi, we see that there hasn’t been any recent updates (nothing since Mar 22, 2017) and that the most current version remains at 1.1.1.

So, I’m going to take a wild turn for this experiment and downgrade JPype1 from 0.7.0 to 0.6.3 . I won’t mess with rebuilding the docker image until the downgrade proves to resolve the issue.

To perform the downgrade, I’ll use an interactive shell, but will logon to the container as root this time.

Downgrade by logging on as root and using pip from within the container:

S C:\docker-airflow> docker exec -ti --user root b556797630c5 bashroot@b556797630c5:/usr/local/airflow#
pip install --upgrade JPype1==0.6.3
..............
Installing collected packages: JPype1
Found existing installation: JPype1 0.7.0
Uninstalling JPype1-0.7.0:
Successfully uninstalled JPype1-0.7.0
Successfully installed JPype1-0.6.3

Now let’s execute the same python code that failed earlier on.

root@b556797630c5:/usr/local/airflow# python
Python 3.7.5 (default, Nov 23 2019, 06:31:35)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jaydebeapi
>>>
>>> #teradata server,
... var_tdpid = '<teradata_server>'
>>>
>>> #Teradata Credentials
... var_userid = '<user>'
>>> var_password = '<pwd>'
>>>
>>> #Class and driver location
... td_jars = ['/usr/local/airflow/jars/terajdbc4.jar']
>>> tdat_class='com.teradata.jdbc.TeraDriver'
>>>
>>> #Jdbc Connection string
... url='jdbc:teradata://'+var_tdpid+'/'
>>>
>>> driver_args=[var_userid,var_password]
>>>
>>>
>>> con = jaydebeapi.connect(jclassname=tdat_class \
... ,url=url, driver_args=driver_args, \
... jars=td_jars)
>>> cur = con.cursor ()
>>> cur.execute ("SELECT date")
>>> rows = cur.fetchall()
>>> print(rows)
[('2019-11-28',)]
>>> cur.close()
>>> con.close()

This time we get back a response without the error trace, which is a good sign that it all works.

Teradata JDBC Connection via Airflow Web GUI

Substitute the above to match the environment your working with.

Now select “Data Profiling -> Ad Hoc Query

Select the connection that we just created.

Enter a simple query as shown below: select * from dbc.dbcinfo and hit Run

We get an error an error returned:

Execution failed on sql ‘ SELECT * FROM ( select from dbc.dbcinfo ) qry LIMIT 100000 ‘: java.sql.SQLException: [Teradata Database] [TeraJDBC 16.20.00.13] [Error 3706] [SQLState 42000] Syntax error: expected something between the ‘select’ keyword and the ‘from’ keyword.

This error seems to be caused by either a config issue or a bug, but after tweaking the sql, the workaround to get it working was to enclose the query in brackets, () as follows:

(select * from dbc.dbcinfo)

Let’s try again:

This time, we’re successful.

What’s Next?

Another good test to carry out would be to create a DAG that makes use of the JDBC connection to ensure it all works fine.

Learner. Interests include Cloud and Devops technologies.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store