Skip to content

A data pipeline that extracts crypto market data from a REST API, transforms it by parsing relevant fields, and loads the cleaned dataset into a SQL database for further analysis or storage.

License

Notifications You must be signed in to change notification settings

r-o-bo/StockSync

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

StockSync

A simple ETL data (Extract–Transform–Load) pipeline built in Python. It fetches Ethereum (ETH) price data from the Yahoo Finance API (via RapidAPI), transforms it using pandas, and loads it into a MySQL database for analysis.

Pipeline Diagram


📌 Features

  • Extract real-time crypto price data from the Yahoo Finance RapidAPI.
  • Transform raw API JSON into a clean, structured Pandas DataFrame.
  • Load the transformed data into a MySQL table (eth_prices).
  • Easy to extend for other cryptocurrencies or stock tickers.

⚙️ Tech Stack

  • Python 3.10+

  • MySQL (local or remote, tested with MySQL Workbench)

  • Libraries:

    • requests — API calls
    • pandas — data management
    • mysql-connector-python — database connection
    • python-dotenv — load secrets from .env

📂 Project Structure

ETLpy/
│── assets/
│   ├── ETLarch.png 
│── src/
│   ├── extract.py       # Fetch data from API
│   ├── transform.py     # Clean & format data
│   ├── load.py          # Create table + insert into MySQL
│   ├── pipeline.py      # Orchestrates the ETL flow
│── config/
│   └── database.py      # Database connection setup
│   └── api_config.py    # API configuration
│── .env                 
│── requirements.txt     
│── README.md
│── LICENSE
           

🔑 Get an API Key

  1. Sign up or log in at Yahoo Finance RapidAPI.
  2. Go to the price endpoint and copy the key.

Environment Variables

Set up a .env file in the project root:

API_KEY=your_rapidapi_key_here
HOST=localhost
MYSQL_USERNAME=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=ur_db

🚀 How to Run

  1. Clone the repo

    git clone https://github.com/r-o-bo/StockSync
    cd ETLpy
  2. Install dependencies

    pip install -r requirements.txt
  3. Run the pipeline

    python -m src.pipeline
  4. Check your MySQL table

    USE crypto_eth;
    SELECT * FROM eth_prices;
  5. Analyze data as you prefer
    Example queries:

    -- Maximum Price recorded
    SELECT MAX(price) AS max_price
    FROM eth_prices;
    
    -- Average price across all rows
    SELECT AVG(price) AS avg_price
    FROM eth_prices;
    
    -- Price changes greater than 4.1%
    SELECT symbol, price_change, percent_change, transformed_at
    FROM eth_prices
    WHERE price_change > 4.1
    ORDER BY transformed_at DESC;
    
    

📊 Schema

id (PK) symbol name price price_change percent_change volume market_cap week_high week_low logo last_updated transformed_at

📜 License

This project is licensed under the MIT License.


About

A data pipeline that extracts crypto market data from a REST API, transforms it by parsing relevant fields, and loads the cleaned dataset into a SQL database for further analysis or storage.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages