HUSSAIN MOHAMMAD
DATA ENGINEER
PROFILE
Data Engineer with 5+ years of experience building scalable data solutions
using Python, SQL, AWS and GCP. Specialized in designing ETL pipelines,
implementing data governance, and optimizing data infrastructures that
reduced processing time by 40% and improved data quality by 30%.
Master's degree in Business Analytics with expertise in transforming
complex data architectures into streamlined pipelines that deliver
actionable business insights.

CONTACT
+1 9723797412
[email protected]
Dallas, TX - United States

SKILLS
Python
SQL (MySQL, Hive SQL, PostgreSQL)
AWS (Redshift, S3, MWAA)
GCP (Pub/Sub, BigQuery, Dataflow)
Apache Kafka
Apache Spark
Apache Airflow
Pandas & NumPy
TensorFlow
ETL Pipeline Design
DAG Optimization
Data Modeling
Salesforce SOQL
Tableau
LSTM & Random Forest
Data Warehousing
Collibra Data Governance
Machine Learning
Real-time Analytics
Generative AI
Data Quality Management

WORK EXPERIENCE

EDGIO (Contract - Knowac IT) FEB 2024 - PRESENT
United States
Data Engineer
Built real-time analytics platform using Apache Kafka and Spark, processing 500GB+ daily data with 99.9% uptime.
Architected data standardization framework that unified 7 disparate systems, reducing reporting time by 65%.
Optimized SQL queries for data extraction, improving performance by 40% and enabling near real-time KPI reporting.
Designed scalable data warehouse on AWS Redshift that supports 200+ concurrent users while maintaining sub-second query response.
Automated data cleaning workflows with Python and Pandas, reducing manual processing by 85% and improving data quality by 40%.
Engineered Tableau dashboards using complex data models that increased stakeholder decision-making speed by 30%.

USAA (Contract - Knowac IT) OCT 2023 - FEB 2024
United States
Data Engineer
Implemented Collibra data governance framework that improved data lineage tracking by 75%.
Engineered the New Business Sales Extract pipeline, consolidating 12 data sources into a unified S3 data lake.
Developed automated extract system for real-time policy data, increasing sales team efficiency by 28%.
Optimized Hive SQL queries that processed 50TB datasets, reducing execution time from hours to minutes.
Built Python automation framework that eliminated 120+ hours of manual data extraction work monthly.

Kaléo (Remote, Contract - Knowac IT) MAY 2023 - OCT 2023
United States
Data Engineer
Architected end-to-end data pipelines using Apache Airflow and Redshift, maintaining 98.2% uptime for business-critical workflows.
Integrated Salesforce data with AWS using custom SalesForcetoS3Operator, processing 2M+ daily records with zero data loss.
Engineered incremental data loading system that reduced processing time by 65% while ensuring data consistency.
Built automated data quality monitoring that identified and resolved 95% of data issues before they affected downstream systems.
Optimized database performance through schema redesign and query tuning, improving report generation speed by 40%.
Created compatible database schemas that increased system integration efficiency by 35% across multiple platforms.

DataFactZ JAN 2020 - OCT 2020
Big Data Engineer
Developed GCP Dataflow pipelines that processed 10TB+ data daily from 15+ sources with 99.8% reliability.
Enhanced ML model accuracy from 87% to 93% by implementing BigQuery ML with optimized feature engineering.
Built real-time data pipeline on GCP that reduced dashboard refresh latency from 4 hours to 5 minutes.
Created architecture comparison framework that accelerated cloud service selection by 70%, saving $45K in annual costs.
Integrated generative AI capabilities into data processing workflows, enabling automated anomaly detection and pattern recognition.

Digital Lync OCT 2018 - DEC 2019
Spark Engineer
Engineered data processing workflows handling 200M+ daily records, reducing errors by 85%.
Built ML pipeline with multiple algorithms, improving prediction accuracy by 32% for client's churn model.
Transformed legacy Cron schedules to NiFi-based orchestration, reducing job failures by 75%.
Delivered LSTM and Random Forest hybrid models, improving project completion time predictions by 42%.

iNeuron AUG 2017 - OCT 2018
Data Analyst
Built network analysis system improving deployment efficiency by 40% through data-driven insights.
Engineered data pipeline that reduced VoLTE drop ratio from 1% to 0.16%.
Standardized 300+ POCs through data integration and quality control measures.
Developed data infrastructure for Voice over Wi-Fi feature supporting 330K day-one users.

LANGUAGES
English (Fluent)
Hindi

ACADEMIC PROJECTS

Driver Drowsiness System
Real-time Python pipeline (30fps)
94% drowsiness detection accuracy
NLP model for speech pattern alerts
Reduced accident rates by 78%

IKEA Data Warehouse
Integrated 8 department databases
Automated ETL processes
SQL-intensive pipelines
85% faster reporting time
40% improved data consistency

EDUCATION
Master of Science (Business Analytics) 2021-2023
Texas A&M University-Commerce - United States
GPA: 3.87 / 4.0