Introduction to Querying S3 Data with Teradata READ_NOS

Teradata Setup Activities

All code samples presented throughout the article have been tested using a local instance of Teradata Vantage Express 17.10 for Virtual Box, which can be downloaded after signing up for a free account at https://downloads.teradata.com. Due to variations in operator syntax between Teradata versions, be sure to check the official documentation if running a version other than 17.10.

NOS User

Create a Teradata username/password NOS_DEMO_USER/NOS_DEMO_USER, to be used when running code samples.

CREATE USER NOS_DEMO_USER FROM DBC AS PASSWORD=NOS_DEMO_USER, PERM=50e8;

Grant Access

Grant the new user permissions to execute READ_NOS operator.

GRANT EXECUTE FUNCTION on TD_SYSFNLIB.READ_NOS to NOS_DEMO_USER;

Teradata-supplied Sample Store

External stores can be classified as either public or private. Authentication is not required to access a public store, however, private stores are secured.

Store Contents

Below is a visual representation of the store folder hierarchy and file naming convention:

----------------------------------------------
* Teradata public store
* Contains river flow data sourced
from USGS Surface-Water Data Sets
(https://waterdata.usgs.gov/nwis/sw).
----------------------------------------------

s3://td-usgs-public
└───
├── /<data-file-format>
│ ├── /<river-site-id>
│ │ └── /<yyyy>
│ │ ├── /<mm>
│ │ │ ├── /<dd>.<ext>
....
....

Authentication

Public stores can be accessed anonymously. From a Teradata to S3 authentication context, this translates to supplying empty values for AccessKeyId and SecretAccessKey:

AccessKeyId: ''
SecretAccessKey: ''

Inputs to Teradata READ_NOS when Working with Teradata Store

Summing up, the following store details will be required as input to READ_NOS.

Bucketname: td-usgs-public
S3 Endpoint: s3.amazonaws.com
AccessKeyId: ''
SecretAccessKey: ''

Creating an S3 Store

The steps below outline the general procedure for setting up an S3 private store.

Step 1 — Create Store

Below is an example for creating a store named s3://my-private-surface-water-data-store in AWS region us-west-1 using the AWS CLI:

$ aws \
--profile aws_acct_admin \
--region us-west-1 \
s3 mb s3://my-private-surface-water-data-store

Step 2 — Create IAM User

In order to create credentials (IAM Access key) for accessing the store, an IAM user will be required. Below is an example for creating user with friendly name my-private-surface-water-data-user

$ aws \
--profile aws_acct_admin \
iam create-user --user-name my-private-surface-water-data-user
{
"User": {
"Path": "/",
"UserName": "my-private-surface-water-data-user",
"UserId": "uuuuuuuuuuuuuuu",
"Arn": "arn:aws:iam::xxxxxxxxxxxx:user/my-private-surface-water-data-user",
....
}
}

Step 3 — Credentials for IAM User

Generate an IAM Access key (AccessKeyId/ SecretAccessKey) for the new user :

$ aws \
--profile aws_acct_admin \
iam create-access-key --user-name my-private-surface-water-data-user
{
"AccessKey": {
...
"AccessKeyId": "AKIATZSC8AVA4PNC6VLC",
...
"SecretAccessKey": "qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw",

...
}
}

Step 4 — Grant IAM User Access to S3 Store

To execute read operations using Teradata operator READ_NOS, the new user would require the following access to the store:

  • S3:GetObject
  • S3:ListBucket
  • S3:GetBucketLocation
$ aws \
--profile aws_acct_admin \
iam put-user-policy \
--user-name my-private-surface-water-data-user \
--policy-name my-private-surface-water-data-policy \
--policy-document \
'{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::my-private-surface-water-data-store"
},
{
"Sid": "Stmt2",
"Action": [
"s3:GetObject"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::my-private-surface-water-data-store/*"
}
]
}'

Step5 — Upload Sample CSV Data

Sample data from the Teradata public store is low volume (~12MB) and serves as a good starting point for NOS discovery.

$ aws \
--profile aws_acct_admin \
s3 sync s3://td-usgs-public \
s3://my-private-surface-water-data-store

Step 6 — Note Down Private Store Details

The final details for the new store are summarised below. These will required as inputs to Teradata READ_NOS operator.

Bucketname: my-private-surface-water-data-store
S3 Endpoint: s3.us-west-1.amazonaws.com
AccessKeyId: AKIATZSC8AVA4PNC6VLC
SecretAccessKey: qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw

Teradata READ_NOS Operator

A brief introduction to READ_NOS basic syntax elements is discussed next before moving onto practical usage examples.

READ_NOS Syntax

Basic syntax elements followed by descriptions are shown below. These should serve as a good starting point for exploring an S3 store.

SELECT *
FROM READ_NOS (
USING
LOCATION ('/connector/bucket.endpoint/[key_prefix]')

AUTHORIZATION (
'{
"Access_ID":"<ACCESS-KEY>"
,"Access_Key":"<SECRET-ACCESS-KEY>"
}'
)

RETURNTYPE ( { 'NOSREAD_RECORD' | 'NOSREAD_KEYS' } )
) AS D
;
  • IAM key details required to access the S3 bucket
  • NOTE: for PUBLIC buckets <ACCESS-KEY> & <SECRET-ACCESS-KEY>, would be passed as empty strings

READ_NOS Examples

A few examples of querying the private store created in section "Creating an S3 Store" will be presented in this section.

bucketname: my-private-surface-water-data-store
S3 service Endpoint: s3.us-west-1.amazonaws.com
AccessKeyId: AKIATZSC8AVA4PNC6VLC
SecretAccessKey: qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw

Example 1: Listing Store Objects

Below shows RETURNTYPE ('NOSREAD_KEYS') being used to return the TOP 5 objects/keys.

SELECT TOP 5 location
FROM
READ_NOS (
USING
LOCATION ('/s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/')
AUTHORIZATION ('{"Access_ID":"AKIATZSC8AVA4PNC6VLC" ,"Access_Key":"qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw"}')
RETURNTYPE ('NOSREAD_KEYS')
) AS D
ORDER BY 1;
Example 1, query output
s3://my-private-surface-water-data-store
└───
├── /CSVDATA
│ ├── /09380000
│ │ └── /2018
│ │ ├── /06
│ │ │ ├── /27.csv
│ │ │ ├── /28.csv
│ │ │ ├── /29.csv
│ │ │ ├── /30.csv
│ │ ├── /07
│ │ │ ├── /01.csv
...
...
Example 1, breakdown of Location

Example 2: Get the Total Number of Store Objects

Count of objects/files can be retrieved using:

SELECT 
COUNT(*) AS FILE_COUNT
FROM
(
SELECT *
FROM READ_NOS (
USING
LOCATION ( '/s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/' )
AUTHORIZATION ('{"Access_ID":"AKIATZSC8AVA4PNC6VLC"
,"Access_Key":"qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw"}')
RETURNTYPE ('NOSREAD_KEYS') ) AS D
) T
Example 2, Query Output

Example 3: View Sample Records in a File

Recall the first Location observed in the result set from Example 1 which represents a file/object:

First row returned in resultset from Example 1
SELECT TOP 5 *
FROM
READ_NOS (
USING
LOCATION ('/s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/CSVDATA/09380000/2018/06/27.csv')
AUTHORIZATION ('{"Access_ID":"AKIATZSC8AVA4PNC6VLC"
,"Access_Key":"qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw"}')
RETURNTYPE ('NOSREAD_RECORD')
) AS D
Example 3 output
  • READ_NOS returned the result set including column names
  • this is a direct result of the implicit value for READ_NOS parameter HEADER being set to TRUE and presence of a header record within the file

Verifying READ_NOS results

Use the AWS CLI, output from Example 3 can be verified by piping first 5 records of the file tostdout.

$  aws s3 \
--profile aws_acct_admin \
cp s3://my-private-surface-water-data-store/CSVDATA/09380000/2018/06/27.csv - \
| head -n5
Verify Example 3 output using AWS CLI

Conclusion

In summary:

  • NOS is a Teradata capability which allows end users to interact with an external data store, such as an S3 bucket
  • stores can be setup as publicly accessible or secured
  • READ_NOS is a Teradata operator which allows a user to run select statements against store contents
  • when authenticating to an S3 public using this operator, credentials are supplied as empty strings, i.e.
Access_ID = ''
Access_Key = ''
  • an example of a public S3 store supplied by Teradata can be found at s3://td-usgs-public
  • if attempting to execute READ_NOS examples in this article on a database version other than 17.10, be sure to check version specific documentation

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Tony Tannous

Tony Tannous

62 Followers

Learner. Interests include Cloud and Devops technologies.