AWS S3: Simple Data Mining Techniques

Methods Covered

Prerequisites

pip3 install awscli
first_name last_name email gender ip_address
aws s3 cp "MOCK_DATA.txt.gz" s3://test-bucket/

Input File Field Delimiters

Read Delimiter from a File into a Variable

c:\>set /P fielddelim=<fielddelim.txt

Disable cmd.exe File and Directory Name Completion

Continuation of Long Commands

AWS CLI: select-object-content

c:\>set /P fielddelim=<fielddelim.txt 
c:\>aws s3api select-object-content ^ --bucket test-bucket ^
--key MOCK_DATA.txt.gz ^ --expression-type SQL ^
--expression "select count(*) from s3object" ^
--input-serialization CSV={"FieldDelimiter"='%fielddelim%'},"CompressionType"="GZIP" ^
--output-serialization CSV={"FieldDelimiter"='%fielddelim%'} ^
"results.csv" && type "results.csv" && del "results.csv"
PS C:\>aws s3api select-object-content `
--bucket test-bucket ` --key MOCK_DATA.txt.gz `
--expression-type SQL ` --expression "select count(*)from s3object" ` --input-serialization `
"CSV={FieldDelimiter=`'`t`'},CompressionType='GZIP'" `
--output-serialization "CSV={FieldDelimiter=`'`t`'}" results.csv `
; if($?) {cat "results.csv"} `
; if($?) {del "results.csv"}
ubuntu:~$ aws s3api select-object-content \
--bucket test-bucket \
--key MOCK_DATA.txt.gz \
--expression-type SQL \
--expression "select count(*)from s3object" \
--input-serialization \
"CSV={FieldDelimiter=$(echo -e "'\t'")},CompressionType='GZIP'" \
--output-serialization "CSV={FieldDelimiter=$(echo -e "'\t'")}" \
/dev/stdout 2>&1
c:\>set /P fielddelim=<fielddelim.txt 
c:\>aws s3api select-object-content ^
--bucket test-bucket ^
--key MOCK_DATA.txt.gz ^
--expression-type SQL ^
--expression "select * from s3object limit 10" ^
--input-serialization CSV={"FieldDelimiter"='%fielddelim%'},"CompressionType"="GZIP" ^
--output-serialization CSV={"FieldDelimiter"='%fielddelim%'} ^
"results.csv" && type "results.csv" && del "results.csv"
PS C:\>aws s3api select-object-content `
--bucket test-bucket ` --key MOCK_DATA.txt.gz `
--expression-type SQL ` --expression "select * from s3object limit 10" `
--input-serialization ` "CSV={FieldDelimiter=`'`t`'},CompressionType='GZIP'" `
--output-serialization "CSV={FieldDelimiter=`'`t`'}" results.csv `
; if($?) {cat "results.csv"} `
; if($?) {del "results.csv"}
ubuntu:~$ aws s3api select-object-content \
--bucket test-bucket \
--key MOCK_DATA.txt.gz \
--expression-type SQL \
--expression "select * from s3object limit 10" \
--input-serialization \
"CSV={FieldDelimiter=$(echo -e "'\t'")},CompressionType='GZIP'" \ --output-serialization "CSV={FieldDelimiter=$(echo -e "'\t'")}" \ /dev/stdout 2>&1
aws s3api select-object-content ^
--bucket test-bucket ^
--key MOCK_DATA.txt.gz ^
--expression-type SQL ^
--expression "select _1,_3,_4 from s3object where _4 = 'Male' and _1 LIKE 'Ja%'" ^
--input-serialization CSV={"FieldDelimiter"='%fielddelim%'},"CompressionType"="GZIP" ^
--output-serialization CSV={"FieldDelimiter"='%fielddelim%'} ^
"results.csv" && type "results.csv" && del "results.csv"
aws s3api select-object-content `
--bucket test-bucket `
--key MOCK_DATA.txt.gz `
--expression-type SQL `
--expression "select _1,_3,_4 from s3object where _4 = 'Male' and _1 LIKE 'Ja%'" `
--input-serialization `
"CSV={FieldDelimiter=`'`t`'},CompressionType='GZIP'" `
--output-serialization "CSV={FieldDelimiter=`'`t`'}" results.csv `
; if($?) {cat "results.csv"} `
; if($?) {del "results.csv"}
ubuntu:~$ aws s3api select-object-content \
--bucket test-bucket \
--key MOCK_DATA.txt.gz \
--expression-type SQL \
--expression "select _1,_3,_4 from s3object where _4 = 'Male' and _1 LIKE 'Ja%'" \
--input-serialization \
"CSV={FieldDelimiter=$(echo -e "'\t'")},CompressionType='GZIP'" \
--output-serialization "CSV={FieldDelimiter=$(echo -e "'\t'")}" \
/dev/stdout 2>&1
_1 = first_name 
_2 = last_name
_3 = email
_4 = gender
_5 = ip_address

Further Reading

Amazon S3 Select and Glacier Select queries currently do not support subqueries or joins

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store