Skip to content

End-to-end example on how to parse user-agent and write to snowflake table

Notifications You must be signed in to change notification settings

swidvey/user-agent-parse

Repository files navigation

Snowflake ETL


USERSTACK

______________________________________________________________________ ______________________________________________________________________

Summary:

This repo is an end-to-end example on how to parse user agent strings
using the USERSTACK API, automation through Cron and AWS EC2, and data insertion to 
a Snowflake table. This code also shows a basic email function for status updates. 


About USERSTACK:
USERSTACK.com is a great API if you are looking to parse out user agent
information. Userstack offers a real-time, easy-to-use REST API interface 
capable of parsing User-Agent strings to accurately detect device, browser 
and operating system information.

Documetation can be found: https://userstack.com/documentation




Limitations: 
The process requires user to create an USERSTACK account. Paid account For large parse 
This example requires snowflake access. 

Special note:
Paths were removed. To utilize this code in production you will need to point
to your cron path

It is assumed you have a data table with User agents already avaliable. 
As a test you can use the User Agent data example given by USERSTACK

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"



Design

Snowflake data -> ec2 -> API call -> clean parsed data in ec2 -> load to SNowflake 



Files Name Descriptions:

user-agent-parse 
    config.yaml:
        This is a yaml configuration file

    initiate_app.sh:
        Preps program and creates venv and installs needed packages in EC2.
    
    main.py:
        This is the main parse script that turns data to dataframe and writes to snowflake
    
    requirements.txt:
        python lib requieremnts
    
    user_agent_bash.sh:
        This script is what is called in the cron task, and preps venv and runs main.py
    
    log_file.log:
        error log file

snowflake folder
    ddl.sql:
        Shows how to create a table in snowflake and defines table values    

cron folder:
    crontab.txt:
        Shows cron schedule and path to run .sh program


Steps:

1. git pull into EC2
2. Run in initiate_app.sh
3. set cron schedule


About

End-to-end example on how to parse user-agent and write to snowflake table

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published