Introduction to Querying S3 Data with Teradata READ_NOS

Native Object Store (NOS) is a Teradata feature that enables data-in-place operations against a variety of external store types, including S3. Using built-in SQL functions and operators, an end user can perform read/write operations. For example, the table operator READ_NOS can be used to read or search the contents of a store.

This article aims to:

• cover the Teradata activities required for accessing READ_NOS table operator
• provide an overview of a publicly accessible Teradata-supplied S3 store, which could be used to run code samples (when access to create an S3 store is not possible)
• describe the procedure for creating an AWS S3 data store
• present some examples of querying the store using READ_NOS

Teradata Setup Activities

All code samples presented throughout the article have been tested using a local instance of Teradata Vantage Express 17.10 for Virtual Box, which can be downloaded after signing up for a free account at https://downloads.teradata.com. Due to variations in operator syntax between Teradata versions, be sure to check the official documentation if running a version other than 17.10.

Once the local instance is running, an NOS user with appropriate access rights should be created by logging onto the RDBMS with username/password of dbc/dbc and executing the statements outlined below.

NOS User

Create a Teradata username/password NOS_DEMO_USER/NOS_DEMO_USER, to be used when running code samples.

CREATE USER NOS_DEMO_USER FROM DBC AS PASSWORD=NOS_DEMO_USER, PERM=50e8;

Grant Access

Grant the new user permissions to execute READ_NOS operator.

GRANT EXECUTE FUNCTION on TD_SYSFNLIB.READ_NOS to NOS_DEMO_USER;

Teradata-supplied Sample Store

External stores can be classified as either public or private. Authentication is not required to access a public store; private stores, however, require credentials.

The S3 store supplied by Teradata is a public store containing sample river flow data sourced from USGS Surface-Water Data Sets.

Official documentation references the store as
td-usgs-public.s3.amazonaws.com. This S3 virtual-hosted-style address takes the form: bucket-name.s3.Region.amazonaws.com/key-name.

The Region is omitted from the store name because, when Region has a value of us-east-1, the S3 service endpoint is simply s3.amazonaws.com.
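This endpoint rule can be expressed as a small helper (an illustrative Python sketch; the function name is hypothetical and not part of any AWS or Teradata API):

```python
def s3_endpoint(region: str) -> str:
    """Return the S3 service endpoint for a bucket's region.

    Buckets in us-east-1 use the legacy global endpoint with no
    region component; all other regions embed the region name.
    """
    if region == "us-east-1":
        return "s3.amazonaws.com"
    return f"s3.{region}.amazonaws.com"

print(s3_endpoint("us-east-1"))  # s3.amazonaws.com
print(s3_endpoint("us-west-1"))  # s3.us-west-1.amazonaws.com
```

The same rule explains the endpoint values used for both stores later in this article.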

Understanding the store addressing style is important when working with Teradata NOS operators. This will become apparent when working through the examples.

Store Contents

Below is a visual representation of the store folder hierarchy and file naming convention:

----------------------------------------------
* Teradata public store
* Contains river flow data sourced
from USGS Surface-Water Data Sets
(https://waterdata.usgs.gov/nwis/sw).
----------------------------------------------

s3://td-usgs-public
└───
├── /<data-file-format>
│ ├── /<river-site-id>
│ │ └── /<yyyy>
│ │ ├── /<mm>
│ │ │ ├── /<dd>.<ext>
....
....

Files/folders listed above are named according to the descriptions below.

Top-level folder: <data-file-format>
Common names are:
• CSVDATA: CSV files
• PARQUETDATA: Parquet files
• JSON: JSON files

Second-level folder: <river-site-id>
• folder name is the 8-digit site identifier where river flow data was collected
• for example, the value 09394500 represents LITTLE COLORADO RIVER AT WOODRUFF, AZ
• the USGS website provides an interactive map with river site details

Third-level folder: <yyyy>
• folder name is the 4-digit data collection year

Fourth-level folder: <mm>
• folder name is the 2-digit month of year when data was collected

Filename: <dd>.<ext>
• <dd> is the 2-digit day of month when data was collected
• <ext> is the file extension, which describes the content format; possible values are: csv, parquet, json
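Following this naming convention, an object key for a given site and collection date can be assembled as below (an illustrative Python sketch; the helper is hypothetical and not part of the store or any API):

```python
from datetime import date

def object_key(fmt_folder: str, site_id: str, day: date, ext: str) -> str:
    """Assemble an object key following the store's
    <data-file-format>/<river-site-id>/<yyyy>/<mm>/<dd>.<ext> layout."""
    return f"{fmt_folder}/{site_id}/{day:%Y}/{day:%m}/{day:%d}.{ext}"

# Key for CSV data from site 09394500 collected on 27 Jun 2018:
print(object_key("CSVDATA", "09394500", date(2018, 6, 27), "csv"))
# CSVDATA/09394500/2018/06/27.csv
```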

Authentication

Public stores can be accessed anonymously. In a Teradata-to-S3 authentication context, this translates to supplying empty values for AccessKeyId and SecretAccessKey:

AccessKeyId: ''
SecretAccessKey: ''

Inputs to Teradata READ_NOS when Working with Teradata Store

Summing up, the following store details will be required as input to READ_NOS.

Bucketname: td-usgs-public
S3 Endpoint: s3.amazonaws.com
AccessKeyId: ''
SecretAccessKey: ''

Creating an S3 Store

The steps below outline the general procedure for setting up an S3 private store.

Step 1 — Create Store

Below is an example for creating a store named s3://my-private-surface-water-data-store in AWS region us-west-1 using the AWS CLI:

$ aws \
--profile aws_acct_admin \
--region us-west-1 \
s3 mb s3://my-private-surface-water-data-store

The region specified is us-west-1, which has a corresponding S3 service endpoint of s3.us-west-1.amazonaws.com. The significance of the S3 service endpoint will become apparent when running the NOS examples presented later on.

Step 2 — Create IAM User

In order to create credentials (an IAM access key) for accessing the store, an IAM user is required. Below is an example of creating a user with the friendly name my-private-surface-water-data-user:

$ aws \
--profile aws_acct_admin \
iam create-user --user-name my-private-surface-water-data-user

Output:

{
"User": {
"Path": "/",
"UserName": "my-private-surface-water-data-user",
"UserId": "uuuuuuuuuuuuuuu",
"Arn": "arn:aws:iam::xxxxxxxxxxxx:user/my-private-surface-water-data-user",
....
}
}

Step 3 — Credentials for IAM User

Generate an IAM access key (AccessKeyId/SecretAccessKey) for the new user:

$ aws \
--profile aws_acct_admin \
iam create-access-key --user-name my-private-surface-water-data-user

Output should contain the AccessKey details:

{
"AccessKey": {
...
"AccessKeyId": "AKIATZSC8AVA4PNC6VLC",
...
"SecretAccessKey": "qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw",

...
}
}

Step 4 — Grant IAM User Access to S3 Store

To execute read operations using the Teradata operator READ_NOS, the new user requires the following permissions on the store:

  • s3:GetObject
  • s3:ListBucket
  • s3:GetBucketLocation

Attach an inline policy named my-private-surface-water-data-policy to the user. The policy grants the above permissions:

$ aws \
--profile aws_acct_admin \
iam put-user-policy \
--user-name my-private-surface-water-data-user \
--policy-name my-private-surface-water-data-policy \
--policy-document \
'{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::my-private-surface-water-data-store"
},
{
"Sid": "Stmt2",
"Action": [
"s3:GetObject"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::my-private-surface-water-data-store/*"
}
]
}'
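As a local sanity check (pure Python, no AWS calls), the policy document can be parsed to confirm that the bucket-level actions (ListBucket, GetBucketLocation) target the bucket ARN, while the object-level action (GetObject) targets the objects beneath it:

```python
import json

policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "Stmt1",
     "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
     "Effect": "Allow",
     "Resource": "arn:aws:s3:::my-private-surface-water-data-store"},
    {"Sid": "Stmt2",
     "Action": ["s3:GetObject"],
     "Effect": "Allow",
     "Resource": "arn:aws:s3:::my-private-surface-water-data-store/*"}
  ]
}
""")

for stmt in policy["Statement"]:
    # Object-level resources end in /*; bucket-level resources do not.
    object_level = stmt["Resource"].endswith("/*")
    if "s3:GetObject" in stmt["Action"]:
        assert object_level, "s3:GetObject needs an object-level resource"
    else:
        assert not object_level, "bucket actions need the bucket ARN"

print("policy resources look consistent")
```

Getting this split wrong (e.g. granting s3:ListBucket on the /* resource) is a common cause of access-denied errors when READ_NOS later lists the bucket.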

Step 5 — Upload Sample CSV Data

Sample data from the Teradata public store is low volume (~12MB) and serves as a good starting point for NOS discovery.

Copy the contents over to the new bucket using the AWS CLI.

$ aws \
--profile aws_acct_admin \
s3 sync s3://td-usgs-public \
s3://my-private-surface-water-data-store

Step 6 — Note Down Private Store Details

The final details for the new store are summarised below. These will be required as inputs to the Teradata READ_NOS operator.

Bucketname: my-private-surface-water-data-store
S3 Endpoint: s3.us-west-1.amazonaws.com
AccessKeyId: AKIATZSC8AVA4PNC6VLC
SecretAccessKey: qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw

Teradata READ_NOS Operator

A brief introduction to the basic READ_NOS syntax elements follows, before moving on to practical usage examples.

READ_NOS Syntax

Basic syntax elements followed by descriptions are shown below. These should serve as a good starting point for exploring an S3 store.

A complete list of elements is included in the official SQL Operators and User-Defined Functions — READ_NOS user guide.

SELECT *
FROM READ_NOS (
USING
LOCATION ('/connector/bucket.endpoint/[key_prefix]')

AUTHORIZATION (
'{
"Access_ID":"<ACCESS-KEY>"
,"Access_Key":"<SECRET-ACCESS-KEY>"
}'
)

RETURNTYPE ( { 'NOSREAD_RECORD' | 'NOSREAD_KEYS' } )
) AS D
;

LOCATION ('/connector/bucket.endpoint/[key_prefix]')

connector
external storage system type identifier (for S3 data stores, the value is s3)

bucket
S3 bucket name

endpoint
S3 service endpoint based on bucket location constraint (Region) as listed at S3 Service Endpoints. For example, buckets created in region
us-east-1 will have an endpoint of s3.amazonaws.com

key_prefix
key/object qualifier
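Putting the LOCATION elements together, the strings for the two stores used in this article can be composed as follows (an illustrative Python helper, not a Teradata API):

```python
def nos_location(bucket: str, endpoint: str, key_prefix: str = "") -> str:
    """Compose a READ_NOS LOCATION string of the form
    /connector/bucket.endpoint/[key_prefix], with connector fixed to s3."""
    return f"/s3/{bucket}.{endpoint}/{key_prefix}"

# Teradata public store (us-east-1, so no region in the endpoint):
print(nos_location("td-usgs-public", "s3.amazonaws.com"))
# /s3/td-usgs-public.s3.amazonaws.com/

# Private store created earlier in us-west-1, limited to the CSVDATA prefix:
print(nos_location("my-private-surface-water-data-store",
                   "s3.us-west-1.amazonaws.com", "CSVDATA/"))
# /s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/CSVDATA/
```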

AUTHORIZATION (
'{
"Access_ID":"<ACCESS-KEY>"
,"Access_Key":"<SECRET-ACCESS-KEY>"
}'
)

  • IAM key details required to access the S3 bucket
  • NOTE: for PUBLIC buckets, <ACCESS-KEY> and <SECRET-ACCESS-KEY> are passed as empty strings

RETURNTYPE ( { 'NOSREAD_RECORD' | 'NOSREAD_KEYS' } )

NOSREAD_KEYS
indicates that only records detailing object paths/keys are to be returned in the result

NOSREAD_RECORD
indicates that records are to be returned from the contents of the file(s)

READ_NOS Examples

A few examples of querying the private store created in section "Creating an S3 Store" will be presented in this section.

Inputs to READ_NOS examples are as noted during sample private store creation. The values, once again, are listed for convenience.

bucketname: my-private-surface-water-data-store
S3 service Endpoint: s3.us-west-1.amazonaws.com
AccessKeyId: AKIATZSC8AVA4PNC6VLC
SecretAccessKey: qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw

These will need to be substituted into the READ_NOS syntax expression described in the previous section:

connector: s3
bucket : my-private-surface-water-data-store
endpoint : s3.us-west-1.amazonaws.com
<ACCESS-KEY> : AKIATZSC8AVA4PNC6VLC
<SECRET-ACCESS-KEY> : qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw

Values have been substituted in examples that follow.

Alternatively, if using the Teradata-supplied public store, the values to be substituted are:

connector : s3
bucket : td-usgs-public
endpoint : s3.amazonaws.com
<ACCESS-KEY> : ''
<SECRET-ACCESS-KEY> : ''

Example 1: Listing Store Objects

The query below uses RETURNTYPE ('NOSREAD_KEYS') to return the TOP 5 objects/keys.

SELECT TOP 5 location
FROM
READ_NOS (
USING
LOCATION ('/s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/')
AUTHORIZATION ('{"Access_ID":"AKIATZSC8AVA4PNC6VLC" ,"Access_Key":"qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw"}')
RETURNTYPE ('NOSREAD_KEYS')
) AS D
ORDER BY 1;

Output:

Example 1, query output

Visual representation of the above would show the following folder structure:

s3://my-private-surface-water-data-store
└───
├── /CSVDATA
│ ├── /09380000
│ │ └── /2018
│ │ ├── /06
│ │ │ ├── /27.csv
│ │ │ ├── /28.csv
│ │ │ ├── /29.csv
│ │ │ ├── /30.csv
│ │ ├── /07
│ │ │ ├── /01.csv
...
...

The structure is derived from the Location values returned in the previous query.

Example 1, breakdown of Location
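That breakdown can also be sketched programmatically (an illustrative Python parser, assuming the returned Location values follow the same /connector/bucket.endpoint/key shape as the LOCATION clause):

```python
def parse_location(location: str) -> dict:
    """Split a READ_NOS Location value of the form
    /connector/bucket.endpoint/key into its components."""
    _, connector, host, key = location.split("/", 3)
    bucket, endpoint = host.split(".", 1)  # first label is the bucket name
    return {"connector": connector, "bucket": bucket,
            "endpoint": endpoint, "key": key}

loc = ("/s3/my-private-surface-water-data-store"
       ".s3.us-west-1.amazonaws.com/CSVDATA/09380000/2018/06/27.csv")
for part, value in parse_location(loc).items():
    print(f"{part:>10}: {value}")
```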

Example 2: Get the Total Number of Store Objects

The count of objects/files can be retrieved using:

SELECT 
COUNT(*) AS FILE_COUNT
FROM
(
SELECT *
FROM READ_NOS (
USING
LOCATION ( '/s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/' )
AUTHORIZATION ('{"Access_ID":"AKIATZSC8AVA4PNC6VLC"
,"Access_Key":"qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw"}')
RETURNTYPE ('NOSREAD_KEYS') ) AS D
) T;

Output:

Example 2, Query Output

Example 3: View Sample Records in a File

Recall the first Location observed in the result set from Example 1, which represents a file/object:

First row returned in resultset from Example 1

To retrieve the top 5 records from the file, use RETURNTYPE ('NOSREAD_RECORD') as follows:

SELECT TOP 5 *
FROM
READ_NOS (
USING
LOCATION ('/s3/my-private-surface-water-data-store.s3.us-west-1.amazonaws.com/CSVDATA/09380000/2018/06/27.csv')
AUTHORIZATION ('{"Access_ID":"AKIATZSC8AVA4PNC6VLC"
,"Access_Key":"qpGYpDMFiKgPczxEJgKgxFO9cUhiiuqJFESl61Gw"}')
RETURNTYPE ('NOSREAD_RECORD')
) AS D;

Output:

Example 3 output

Note the following from the results:

  • READ_NOS returned the result set including column names
  • this is a direct result of the implicit value of the READ_NOS parameter HEADER being TRUE, combined with the presence of a header record within the file

Verifying READ_NOS results

Using the AWS CLI, the output from Example 3 can be verified by piping the first 5 records of the file to stdout.

$  aws s3 \
--profile aws_acct_admin \
cp s3://my-private-surface-water-data-store/CSVDATA/09380000/2018/06/27.csv - \
| head -n5

Output:

Verify Example 3 output using AWS CLI

Conclusion

In summary:

  • NOS is a Teradata capability which allows end users to interact with an external data store, such as an S3 bucket
  • stores can be set up as publicly accessible or secured
  • READ_NOS is a Teradata operator which allows a user to run SELECT statements against store contents
  • when authenticating to a public S3 store using this operator, credentials are supplied as empty strings, i.e.
Access_ID = ''
Access_Key = ''
  • an example of a public S3 store supplied by Teradata can be found at s3://td-usgs-public
  • if attempting to execute the READ_NOS examples in this article on a database version other than 17.10, be sure to check version-specific documentation
