ETL Pipeline for Cryptocurrency Market Data on Google Cloud

Use Case

Cryptocurrency markets generate large volumes of data around the clock, including price fluctuations, trading volumes, and market capitalization figures, which accumulate into long historical series. Investors and analysts need clean, structured, and up-to-date data to make informed decisions. This project builds an automated ETL pipeline that collects, transforms, and stores crypto market data for analytics and visualization.

Objective

  • Automate data extraction from cryptocurrency APIs.
  • Perform data transformation, including currency conversion and statistical calculations.
  • Store the processed data in a scalable cloud-based warehouse.
  • Create real-time dashboards for monitoring market trends.

Steps Taken

1. Data Used

  • Source: Market data retrieved from the CoinGecko or CoinMarketCap API.
  • Data Type: Time-series cryptocurrency data (prices, volume, market cap, etc.).
  • Preprocessing:
    • Normalized prices into common quote currencies (e.g., USD, EUR, BTC).
    • Calculated moving averages to identify trends.
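
The two preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the exchange rates are made-up placeholder values, and the function names are hypothetical. In a Pandas-based pipeline, a rolling mean on a Series would likely replace the hand-written loop.

```python
from collections import deque

# Hypothetical exchange rates: units of each quote currency per 1 USD.
# Real values would come from the market-data API at run time.
EXAMPLE_RATES_PER_USD = {"USD": 1.0, "EUR": 0.9, "BTC": 0.00002}


def convert_price(price_usd, rates_per_usd):
    """Express one USD price in every quote currency listed in rates_per_usd."""
    return {currency: price_usd * rate for currency, rate in rates_per_usd.items()}


def moving_average(values, window):
    """Return a simple moving average; positions without a full window are None."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    recent = deque(maxlen=window)  # holds at most the last `window` values
    running_sum = 0.0
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # drop the value about to be evicted
        recent.append(value)
        running_sum += value
        averages.append(running_sum / window if len(recent) == window else None)
    return averages
```

For example, moving_average([1, 2, 3, 4, 5], window=3) returns [None, None, 2.0, 3.0, 4.0]. With one price per day, window=7 and window=30 would give the 7-day and 30-day averages.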

2. Technology Stack

  • Python – Used for data extraction and transformation.
  • Google Cloud Platform (GCP):
    • BigQuery – Stores structured crypto market data for analysis.
    • Cloud SQL – Alternative storage option for structured datasets.
    • Cloud Functions – Automates API calls and ETL processes.
  • Looker Studio (formerly Google Data Studio) – Interactive visualization dashboard.
  • Pandas & SQL – Data cleaning (Pandas) and querying (SQL).
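
To show how these pieces could fit together, here is a rough sketch of an HTTP-style entry point that runs the extract, transform, and load stages in order. All names are hypothetical, and the three stages are passed in as plain callables so the wiring can be tried without any cloud services. The (body, status) return shape follows the Flask convention that HTTP-triggered Python functions commonly use; treat that as an assumption to check against the Cloud Functions documentation.

```python
import json


def run_pipeline(extract, transform, load):
    """Run the three ETL stages in order and return the number of rows loaded."""
    raw_records = extract()
    clean_rows = transform(raw_records)
    load(clean_rows)
    return len(clean_rows)


def make_entry_point(extract, transform, load):
    """Build an HTTP-style handler that takes a request and returns (body, status)."""

    def entry_point(request):
        # The request object is unused here; a scheduler would simply call the URL.
        try:
            rows_loaded = run_pipeline(extract, transform, load)
        except Exception as error:  # report the failure to the caller
            return json.dumps({"status": "error", "detail": str(error)}), 500
        return json.dumps({"status": "ok", "rows_loaded": rows_loaded}), 200

    return entry_point
```

In a real deployment, extract would call the market-data API, transform would apply the cleaning steps, and load would write to BigQuery or Cloud SQL. Keeping the stages injectable also makes each one easy to test on its own.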

3. Implementation Steps

  • Data Extraction:
    • Automated API requests to fetch real-time crypto market data.
    • Handled rate limits and API authentication.
  • Data Transformation:
    • Converted price values to multiple currencies.
    • Calculated 7-day and 30-day moving averages for trend analysis.
    • Removed duplicate and inconsistent records.
  • Data Loading:
    • Stored structured data in BigQuery for efficient querying.
    • Offered Cloud SQL as an alternative store for relational database needs.
  • Visualization:
    • Built real-time dashboards in Looker Studio.
    • Enabled filtering and historical analysis of cryptocurrency trends.
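
The extraction and cleaning steps above might look roughly like the sketch below. It assumes the API signals rate limiting with HTTP 429 and that an "inconsistent" record is one with no price; neither detail is stated in this write-up. The field names (coin_id, timestamp, price_usd) are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json_with_retry(url, max_attempts=4, base_delay=1.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """GET a JSON document, backing off exponentially when the API answers HTTP 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == max_attempts - 1
            if error.code != 429 or is_last_attempt:
                raise  # not a rate-limit error, or retries are exhausted
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying


def drop_duplicates(records, key_fields=("coin_id", "timestamp")):
    """Keep the first record seen for each key and skip records with no price."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key in seen or record.get("price_usd") is None:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```

The opener and sleep parameters exist only so the retry logic can be exercised without network access or real waiting. An API that requires a key would also need an authentication header or query parameter, which is omitted here.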

4. Results & Impact

  • Provided a fully automated ETL pipeline for crypto data collection.
  • Enabled real-time analytics on price trends and market behavior.

5. Challenges & Learnings

  • API Rate Limits: Cached API responses to reduce the number of calls and stay within rate limits.
  • Missing Data: Filled gaps using interpolation and rolling averages.
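
Both ideas can be illustrated with a short sketch. The cache below is a simple time-to-live (TTL) cache and the gap filler uses linear interpolation. These are assumptions about how caching and interpolation could be done, not a record of the project's code, and the names TTLCache and interpolate_gaps are hypothetical.

```python
import time


class TTLCache:
    """Cache values for ttl_seconds so repeated requests skip the API call."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no API call needed
        value = fetch()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


def interpolate_gaps(values):
    """Fill interior None gaps linearly; leading and trailing gaps stay None."""
    filled = list(values)
    last_known = None  # index of the most recent non-missing value
    for index, value in enumerate(values):
        if value is None:
            continue
        if last_known is not None and index - last_known > 1:
            step = (value - values[last_known]) / (index - last_known)
            for gap in range(last_known + 1, index):
                filled[gap] = values[last_known] + step * (gap - last_known)
        last_known = index
    return filled
```

For example, interpolate_gaps([1.0, None, None, 4.0, None]) returns [1.0, 2.0, 3.0, 4.0, None]. The trailing gap is left unfilled because there is no later value to interpolate toward.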