Teradata RDBMS: Generating HMAC SHA256, MD5 and SHA1 Hashes

I’ve come across Teradata User-Defined-Functions (UDF) for generating non key-based hashes (such as MD5 & SHA256) but have yet to come across Teradata HMAC equivalents of these (yet).

That was a good enough reason to work on developing a C/C++ UDF to cover HMAC MD5/SHA256/SHA1 hashing.

Why the Need for Hashing?

Data Science/Analytics on Personally Identifiable Information (PII)

➤ Presenting data in its hashed format via Views within the RDBMS allows sensitive attributes to be exposed in their obfuscated/masked format

➤ Exposing hashed data to a wider audience for analytical purposes mitigates the risks of potential PII “data leaks”. Hashing may not be an option for all situations, but should be the first option considered before being ruled out

Of the three algorithms covered in this article, HMAC-SHA256 provides a cryptographically stronger hash, which is less susceptible to hash collisions and brute force attacks

Transmission of Data

➤ Hashing can also add an extra layer of security in cases where transmitting data extracts outside an organization to external partners is required

➤ Hashed data can be exported directly from Teradata into file extracts, which are encrypted (e.g. using PGP) before transmission

➤ Breaking the “outer layer” encryption (e.g the PGP’d file), ensures the “inner layer” (i.e hashed content) remains secure

Guidelines for HMAC

RFC2104 Spec

The RFC2104¹ spec for HMAC states the following:

2. Definition of HMAC

The definition of HMAC requires a cryptographic hash function, which
we denote by H, and a secret key K. We assume H to be a cryptographic
hash function where data is hashed by iterating a basic compression
function on blocks of data. We denote by B the byte-length of such
blocks…and by L the byte-length of hash outputs…
The authentication key K can be of any length up to B, the
block length of the hash function. Applications that use keys longer
than B bytes will first hash the key using H and then use the
resultant L byte string as the actual key to HMAC. In any case the
minimal recommended length for K is L bytes (as the hash output
length).

3. Keys

The key for HMAC can be of any length (keys longer than B bytes are
first hashed using H). However, less than L bytes is strongly
discouraged as it would decrease the security strength of the
function. Keys longer than L bytes are acceptable but the extra
length would not significantly increase the function strength. (A
longer key may be advisable if the randomness of the key is
considered weak.)

Keys need to be chosen at random (or using a cryptographically strong
pseudo-random generator seeded with a random seed), and periodically
refreshed. (Current attacks do not indicate a specific recommended
frequency for key changes as these attacks are practically
infeasible. However, periodic key refreshment is a fundamental
security practice that helps against potential weaknesses of the
function and keys, and limits the damage of an exposed key.)

Let’s break down the excerpts above and put them into context for this article.

Key Length

H The “H” refers to the Hash algorithms, so for our case that equates to MD5, SHA256 & SHA1

B The “B” refers to the respective algorithm’s Blocksize

L The “L” refers to the Length of the hash output, in bytes

K The “K” refers to the secret Key

By looking at the spec² details for each algorithm, we get the following

Specifications for Algorithm Key/Blocksizes

➤ What length should we choose for our secret Key (K)?

“The key for HMAC can be of any length (keys longer than B bytes are
first hashed using H). However, less than L bytes is strongly
discouraged
as it would decrease the security strength of the
function. Keys longer than L bytes are acceptable but the extra
length would not significantly increase the function strength”

➤ Reading between the lines, the Key size (in bytes), should be equal to L (the respective algorithm’s Output Size).

➤ On that basis, these are the recommended key lengths for each algorithm we’re covering:

MD5    : K = 128/16 (bits/bytes)
SHA1 : K = 160/20 (bits/bytes)
SHA256 : K = 256/32 (bits/bytes)

Generating a Secret Key (K)

The RFC2104 excerpts from above, also tell us that secret keys should be generated using a “Cryptographically Strong Pseudo-Random (Number) Generator” (CSPRNG).

The literature on generating secure random numbers is vast and deserves a separate topic for discussion. It’s also a topic where I’m still working my way through. A good starting point into gaining an understanding can be found at the following links.

For now, we’ll assume that we’re using a Linux OS that generates random, secure bytes at /dev/urandom, which we’ll use to source our secret key/s.

Using hexdump, we can source the number of bytes we need for each algorithm as follows (bearing in mind what has already been discussed regarding the recommended key length).

➤ For MD5:

echo `hexdump -n 16 /dev/urandom -v -e '"" 1/1 "%02X" ""'`ca5af32a784e22de51df623bf13f51b1

➤ For SHA1:

echo `hexdump -n 20/dev/urandom -v -e '"" 1/1 "%02X" ""'`0f04ae0aa54db11c0c53dc36fbefbac27e099dbf

➤ For SHA256:

echo `hexdump -n 32/dev/urandom -v -e '"" 1/1 "%02X" ""'`6e670ed3e0ebf1daf7570f5f83b45ed48d6df3848c09f38c319c8df3fa5dfc41

➤ It’s imperative that thought is given into the security measures for preventing exposure of secret key/s outside of a controlled environment.

How/Where Should the Keys be Stored?

Again, this also warrants a separate topic for discussion. The UDF accepts the key as a parameter.

So, how do we securely pass the key value to the UDF?

➤ Storing the key in a Teradata database table, in its raw format is an option that would likely cause security experts to frown

➤ Using this approach, however, can serve as an interim solution for purposes of testing scenarios, or proof-of-concept where the key/s are “throw-away” keys used solely for hashing fake/test data

If you decide that the risk of storing your key in a table is acceptable in a Production environment, then a possible approach would be:

➤ Use a correlated sub-query to retrieve the key value as input to the UDF

➤ The correlated sub-query should return a single row/column which contains the key value

➤ This ensures the key remains protected in a secure RDBMS table, governed by appropriate security/access control measures

Here are some points to note if you do decide to go down this path:

  • The key/s should not be loaded using Client utilities such as SQL Assistant, BTEQ. For example, the following statement:
INSERT INTO myproj.k_store (KEY_NAME, KEY_VALUE) VALUES (
‘hmac_sha256_key’
,‘XXXXXXXXXXXXXXXX
)

would expose the key value (XXXXXXXXXXXXXXXX) in Teradata’s Query (dbc.dbql*) and Access log (dbc.accesslog) tables/views

  • Instead, look into loading the key/s from a file using Teradata TPT Operators, for which parameter values are not resolved in the Query and Access log tables/views
  • Once the key has been loaded, its value should never be hard coded into SQL queries/qualifiers (unless it’s a throw away key used solely for “sandpit” testing). As an example, the following query would expose the key’s value in the RDBMS log tables:
SELECT * FROM myproj.k_store WHERE KEY_VALUE = ‘XXXXXXXXXXXXXXXX
  • As mentioned earlier, a correlated sub-query can be used to retrieve the key’s value as input to the UDF. Examples of this will be demonstrated later in this article
  • Ensure the key is generated on a machine that conforms to security standards (i.e, encrypted storage, attached to secure network, etc.)

UDF Dependencies

Below are the pre-requisite libraries/headers required for compiling the UDF.

Hash Libraries

The UDF utilises headers and functions from this library³. The relevant dependencies will be included “as-is” in the git repo for the UDF code.

Header File for Teradata Equivalent C Data Types

The sqltypes_td.h header file contains Teradata data types and their equivalent “C” types. It is also included in the UDF repo. It was downloaded “as-is” from a 16.20 Teradata Express VM (directory /usr/tdbms/etc).

UDF Installation

➤ Clone the UDF git repo

git clone https://github.com/tonys-code-base/teradata-hmac-hash-udf.git

or download as a.zipand extract locally

https://github.com/tonys-code-base/teradata-hmac-hash-udf/archive/master.zip

➤ Ensure you have sufficient privileges to install the UDF on the target Teradata database

➤ Modify the UDF CREATE FUNCTION code (create_hmac_hash_function.sql) so that it reflects the correct target database for your installation

➤ Open a shell/command prompt and change directory into the location where the repo was cloned/unzipped to

➤ Run BTEQ, and execute the modified version of create_hmac_hash_function.sql to install the UDF

➤ Once complete, the UDF appears as:

<database_name>.hmac_hash

Loading Keys into a Teradata Table

⚠️ Please read details under sub-heading “How/Where Should the Keys be Stored?” before continuing

Table Format

Keys will be loaded into a secure Teradata table with the following structure

CREATE TABLE myproj.k_store 
(
key_name VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
key_value VARCHAR(512) CHARACTER SET LATIN CASESPECIFIC
)
PRIMARY INDEX ( key_name )
;

where:

key_name: represents the hmac key identifier
key_value: contains the respective secret key value

Save Keys to File on Encrypted Drive

Each of the keys that were generated previously under sub-heading “Generating a Secret Key (K)” can now be saved into a pipe (|) delimited text file ready for loading into the target table.

➤ Create the file as shown below, saving the keys, as previously generated

hmac_sample_keys.txt

Load Keys using Teradata Parallel Transporter (TPT)

The following TPT script (available in the repo) can be used to load the hmac_sample_keys.txt into the target table.

➤ Modify the above script and replace the following to reflect your setup:

<Teradata_Host/IP>
<Teradata_Username>
<Teradata_Password>

➤ You will also need to change the database name from myprojto reflect the location of table k_store for your setup.

➤ Execute the script using:

tbuild -f tpt_sample_load_k_store.script -j tpt_sample_load_k_store -s 1

UDF Parameters

The UDF takes in 3 parameters:

hmac_hash(msg, secret_key, hmac_hash_type)

msg

The “raw” text to be hashed
Data Type: VARCHAR(2500) CHARACTER SET LATIN

secret_key

The “secret_key” must be the character representation of valid HEX bytes, used as input into the hmac hash algorithm
Data Type: VARCHAR(512) CHARACTER SET LATIN

Examples of acceptable format values are as shown below. Note, these are hard-coded here only to illustrate format, please refer to sub-heading “How/Where Should the Keys be Stored?” :

hmac_hash(msg, '6173646667686a6b6c666531', hmac_hash_type)
hmac_hash(msg, '6f61696679726568666b6a32', hmac_hash_type)
hmac_hash(msg, '696f6a666566656665667733', hmac_hash_type)

hmac_hash_type

This parameter defines the output hmac hash type that is required.
Data Type: VARCHAR(30)

If this parameter value is:

hmac_sha256: Then a HMAC SHA256 output hash is generated
hmac_md5: Then a HMAC MD5 output hash is generated
hmac_sha1: Then a HMAC SHA1 output hash is generated

“Fetching” the SECRET_KEY as Input to UDF

Using a correlated sub-query, we can obtain the key without exposing it in Teradata’s log tables by substituting the following query as input to the SECRET_KEY UDF Parameter:

SELECT key_value FROM myproj.k_store WHERE key_name = '<key_name>';

UDF Usage Examples

The examples of UDF invocations that follow assume the following:

➤ The test Secret Keys used are sourced from table myproj.k_store

➤ The UDF has been installed to target database MYPROJ

➤ The message (MSG) being hashed is
The quick brown fox jumps over the lazy dog.

HMAC SHA256 hash Example

SELECT MYPROJ.hmac_hash(
'The quick brown fox jumps over the lazy dog.'
,(SELECT key_value
FROM myproj.k_store WHERE key_name= 'h_sha256_key')
,'HMAC_SHA256'
)

Output:

bb60d08740527e04c95f8d7c615a9e5c425951192913976c04c3f3419fa61004

HMAC MD5 hash Example

SELECT MYPROJ.hmac_hash(
'The quick brown fox jumps over the lazy dog.'
,(SELECT key_value
FROM myproj.k_store WHERE key_name = 'h_md5_key')
,'HMAC_MD5'
)

Output:

d74a6e93a5bc07422e3e18db7de13bc4

HMAC SHA1 hash Example

SELECT MYPROJ.hmac_hash(
'The quick brown fox jumps over the lazy dog.'
,(SELECT key_value
FROM myproj.k_store WHERE key_name = 'h_sha1_key')
,'HMAC_SHA1'
)

Output:

9aa16612fc822e77a149896701f65c4aa64d7614

Closing Comments

➤ Once the UDF has been thoroughly tested, its execution mode can be changed to NON PROTECTED

➤ The secret_key bytes, as read from /dev/urandom are treated as single 8 bit bytes (base16) loaded into a Teradata column defined as LATIN (server character set). They will not all decode to printable characters

➤ If you’re curious to see the decoded hex key, you can look at using the udf_LatinHex2Char UDF which comes as part of the Teradata Unicode ToolKit , or try running

select ‘XXXXXXXXXXXX’xc 

where XXXXXXXXXXXX is your secret key in base16 hex. The xc tells Teradata that input is a string of character bytes, represented in hex format.

➤ You may wish to add a SALT to the input string/MSG for additional security. This should also ideally be generated in the same way as the secret_key

References

[1]: RFC 2104 — HMAC: Keyed-Hashing for Message Authentication.
(Feb 1997). Definition of HMAC. Keys.

[2]: en.wikipedia.org.
(Updated Jan 2020). Secure Hash Algorithms

[3]: Portable C++ Hashing Library.
(Jun 2014) create.stephan-brumme.com

Learner. Interests include Cloud and Devops technologies.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store