Abhinay
Sr. Data Engineer
Phone: +1 (469) 734-4139
E-Mail: abhinay3402@[Link]
PROFESSIONAL SUMMARY:
Around 10 years of hands-on experience as a Data Engineer with deep expertise in building scalable and
efficient data solutions across cloud platforms like Azure and AWS.
Strong proficiency in designing, developing, and optimizing end-to-end ETL/ELT pipelines using Azure Data
Factory, Azure Databricks, and Spark (PySpark, Scala).
Extensive experience working with Azure services such as Data Lake Storage, Synapse Analytics, Cosmos DB,
Event Hubs, HDInsight, Azure Functions, and Azure SQL.
Successfully implemented real-time data processing architectures using Azure Event Hubs, Stream Analytics,
and Kafka to enable business-critical analytics and reporting.
Built data lakes and enterprise data warehouses on Azure cloud, enabling centralized storage, access
control, and analytics across structured and unstructured datasets.
Proficient in developing and managing Slowly Changing Dimensions (SCD Type 1 and Type 2) for tracking
historical changes in dimension tables (a representative SCD Type 2 merge pattern is sketched at the end of this summary).
Designed and deployed modern data architecture frameworks supporting data ingestion, transformation,
and consumption for healthcare, banking, and financial sectors.
Worked on Azure Synapse Analytics notebooks and pipeline orchestration to build highly parallel and cost-
optimized data workflows.
Integrated Snowflake with Azure ecosystem to support high-performance data warehousing, data modeling,
and analytical workloads.
Built cloud-native solutions using Snowflake, Azure Blob Storage, and Azure Data Factory for seamless data
movement across multiple systems.
Strong experience in SQL, T-SQL, and complex query optimization across Azure SQL Database, PostgreSQL,
Teradata, and Snowflake.
Hands-on experience with Python for data manipulation, validation scripts, and automation within Azure
Data Factory and Databricks.
Designed scalable data models using star schema and snowflake schema for supporting OLAP and BI tools
like Power BI and Tableau.
Developed interactive and dynamic dashboards using Power BI, Tableau, and QuickSight to help
stakeholders with business insights and KPI tracking.
Strong understanding of data governance, data quality frameworks, and secure data design principles,
including encryption, masking, and access control.
Integrated Azure Cosmos DB with other services for storing semi-structured data and enabling scalable
NoSQL access for analytics.
Experience in batch and streaming data ingestion using Apache Kafka, Spark Streaming, and Azure IoT Hub
for real-time decision-making.
Built and maintained CI/CD pipelines using Azure DevOps, GitHub, and Jenkins for continuous deployment
and integration of data workflows.
Automated infrastructure provisioning and deployment using Terraform, ARM templates, and Azure DevOps
pipelines.
Worked closely with cross-functional teams including analysts, data scientists, and product owners to gather
requirements and translate them into technical solutions.
Experience in job orchestration using Apache Airflow and Control-M to manage, monitor, and schedule
complex data workflows.
Conducted performance tuning and capacity planning for Spark, Hive, and Synapse jobs to reduce execution
time and improve cost efficiency.
Strong experience in data migration projects involving Hadoop to Azure cloud, and PostgreSQL to
Oracle/SQL Server, ensuring zero data loss and integrity.
Applied advanced data wrangling and feature engineering techniques using Pandas, NumPy, and Spark SQL
to prepare data for analytics and ML.
Demonstrated ability to work in Agile environments with regular participation in sprint planning,
retrospectives, and backlog grooming to align technical goals with business outcomes.
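The following is a minimal sketch of the SCD Type 2 pattern referenced above, assuming a Databricks environment with Delta Lake; the tables (stg_customer, dim_customer) and columns are hypothetical and shown only to illustrate the expire-then-insert approach.

```python
# Minimal SCD Type 2 sketch (assumptions: Databricks with Delta Lake; the
# staging and dimension tables and their columns are illustrative only, and
# the staging layout is assumed to match the dimension's tracked columns).
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

stg = spark.table("stg_customer")                 # hypothetical staging table
dim = DeltaTable.forName(spark, "dim_customer")   # hypothetical dimension table

# 1) Expire current rows whose tracked attributes changed.
(dim.alias("d").merge(
        stg.alias("s"),
        "d.customer_id = s.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.address <> s.address OR d.segment <> s.segment",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# 2) Insert new versions (changed keys) and brand-new keys as current rows.
new_rows = (stg.join(
                spark.table("dim_customer").filter("is_current = true"),
                "customer_id", "left_anti")
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```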
TECHNICAL SKILLS:
Big Data Tools: Hadoop, Hive, Apache Spark, Apache Kylin, Apache NiFi, PySpark, HBase, Kafka, YARN, Sqoop, Impala, Oozie, Pig, MapReduce, ZooKeeper, and Flume
Hadoop Distributions: Amazon EMR, Cloudera, and Hortonworks
Cloud Services: Azure – Azure Data Lake, Databricks, Azure Data Factory, Azure Monitor, Active Directory, Azure Synapse Analytics, Key Vault, Azure SQL, HDInsight, and Azure DevOps
AWS – EC2, S3, EMR, RDS, Glue, Presto, Lambda, Redshift, Athena, CloudFormation, CloudWatch, DynamoDB, Kinesis, Step Functions, IAM, VPC, Route 53, SNS, SQS, and CloudTrail
BI, Visualization, and ETL Tools: Azure Synapse Analytics, Amazon QuickSight, Informatica, SSIS, Talend, Tableau, and Power BI
Relational Databases: Oracle, SQL Server, Teradata, MySQL, PostgreSQL, and Netezza
NoSQL Databases: HBase, Cassandra, and MongoDB
Programming Languages: Scala, Python, Golang, R, Java, and GraphQL (query language)
Scripting: Python and Shell scripting
Build Tools: Apache Maven, SBT, Jenkins, and Cloud Build
Version Control: Git, SVN, and Bitbucket
Operating Systems: Unix, Linux, macOS, CentOS, Ubuntu, and Windows
Tools: PuTTY, PuTTYgen, Eclipse, IntelliJ, and Toad
CERTIFICATIONS:
• Microsoft Certified: Azure Data Engineer Associate (L527F3-18D329)
• Microsoft Certified: Power BI Data Analyst Associate (52C001-CSAF3C)
PROFESSIONAL EXPERIENCE:
Client: Green Dot Corporation, Austin, TX Aug 2024 to Present
Role: Sr. Data Engineer
Responsibilities:
Designed and implemented end-to-end ETL pipelines to ingest, transform, and load structured and semi-
structured data across staging, ODS, and EDW layers.
Built scalable workflows in AWS and Informatica to process high-volume datasets, ensuring performance
and reliability.
Partnered with business and analytics teams to gather requirements and translate them into technical ETL
and data integration solutions.
Developed SQL and Python scripts for data validation, anomaly detection, and reconciliation between source
and target systems.
Optimized Redshift queries, schemas, and load strategies using distribution keys, sort keys, and compression
techniques.
Created incremental and full-load workflows to efficiently handle large-scale datasets and maintain
historical data (a watermark-based incremental pattern is sketched at the end of this project).
Designed and deployed data pipelines supporting both batch and streaming data, enabling near real-time
analytics.
Built reusable templates, naming standards, and workflow patterns to keep ETL development consistent
across projects.
Conducted performance tuning of ETL jobs, reducing execution times and improving overall cost efficiency.
Implemented data quality checks, error handling, and logging frameworks to ensure accuracy and reliability
of pipelines.
Collaborated with data modelers and architects to align ETL structures with conceptual, logical, and physical
models.
Integrated data from multiple heterogeneous sources such as APIs, flat files, relational databases, and cloud
platforms.
Created lineage and mapping documentation for audit, reconciliation, and compliance reporting.
Supported dashboard and reporting teams by delivering curated and trusted datasets for Tableau and Power
BI.
Automated pipeline deployments using GitHub and Azure DevOps CI/CD processes for smooth promotion across
environments.
Applied encryption and security best practices for sensitive data, ensuring compliance with governance and
regulatory standards.
Partnered with cross-functional stakeholders in Agile ceremonies to deliver sprint goals on time.
Mentored junior engineers in best practices for SQL tuning, ETL design, and cloud-based pipeline
development.
Actively contributed to long-term architecture discussions, focusing on scalability, automation, and cloud
adoption.
Played a key role in multiple enterprise initiatives, delivering end-to-end ETL solutions from requirement
gathering through deployment.
Environment: AWS Redshift, AWS S3, AWS Glue, AWS Lambda, Informatica PowerCenter, SQL, Python, Excel,
Tableau, Power BI, GitHub, Azure DevOps, OLTP, OLAP, PostgreSQL, MySQL, Snowflake, DBT, Kafka.
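Below is a minimal sketch of the watermark-based incremental load referenced above, assuming a PySpark job with JDBC access to a hypothetical "orders" source table and an S3 landing zone; the connection string, credentials, bucket, and column names are placeholders, not project specifics.

```python
# Watermark-based incremental extract sketch (assumptions: PySpark with JDBC
# access to a hypothetical "orders" table; all connection details, paths, and
# the persisted watermark value are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_incremental").getOrCreate()

# Last successfully loaded watermark, e.g. persisted in a small control table.
last_watermark = "2024-08-01 00:00:00"  # placeholder value

incremental = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")   # placeholder
    .option("dbtable",
            f"(SELECT * FROM orders WHERE updated_at > '{last_watermark}') AS src")
    .option("user", "<user>").option("password", "<password>")
    .load())

# Land the delta as date-partitioned Parquet; a downstream Redshift COPY or
# Glue job can pick it up as part of the full/incremental workflow.
(incremental
    .withColumn("load_date", F.to_date("updated_at"))
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("s3://<bucket>/landing/orders/"))             # placeholder bucket

new_watermark = incremental.agg(F.max("updated_at")).first()[0]
# Persist new_watermark back to the control table for the next run (not shown).
```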
Client: Fifth Third Bank, Dallas, TX Jan 2023 to July 2024
Role: Sr. Data Engineer
Responsibilities:
Designed and deployed scalable ETL pipelines using Azure Data Factory and Azure Databricks to extract,
transform, and load large volumes of structured and semi-structured data into Azure Data Lake and Synapse
Analytics.
Built real-time data processing workflows using Azure Event Hubs, Azure Stream Analytics, and Databricks
streaming to enable timely fraud detection and risk analysis (a minimal streaming-ingest sketch appears at the end of this project).
Created and managed Slowly Changing Dimension (SCD) Type 1 and Type 2 pipelines to maintain both
current and historical views of business data.
Developed dimensional data models with fact and dimension tables to support analytical workloads across
reporting platforms like Power BI and Tableau.
Optimized T-SQL queries and indexing strategies in Azure SQL Database to improve data retrieval
performance and database efficiency.
Leveraged Azure Synapse Analytics with notebooks, pipelines, and triggers to orchestrate end-to-end data
workflows supporting advanced analytics.
Integrated data from diverse sources including SQL Server, S3, PostgreSQL, Teradata, and Excel using
PySpark transformations and business rule validations.
Established a centralized Azure-based data lake architecture enabling advanced machine learning initiatives
and data science exploration.
Configured and tuned Azure Cosmos DB for performance by setting indexing policies, consistency levels, and
throughput to support high-volume application demands.
Implemented CI/CD pipelines using Azure DevOps for version-controlled and automated deployment of data
solutions across environments.
Built monitoring solutions using Azure Monitor and Log Analytics to proactively troubleshoot and ensure
data pipeline reliability and availability.
Collaborated with Power BI developers and business teams to deliver actionable insights through optimized
SQL queries and curated datasets.
Participated in Agile ceremonies including sprint planning, stand-ups, and retrospectives to ensure on-time
delivery of data engineering tasks.
Applied query tuning techniques in Synapse Analytics that reduced execution time and cut compute costs by
up to 10%.
Worked closely with cross-functional stakeholders to translate complex business requirements into technical
data models and reporting pipelines.
Environment: Azure Cloud, Azure Data Lake, Azure Data Factory, Azure Services, Azure SQL Server, Azure Data
Warehouse, Azure DevOps, Snowflake, ETL, Kafka, Power BI, SQL Database, GitHub, Azure Databricks, Azure Cosmos
DB, SSIS, DBT, Golang, Python, MySQL, PostgreSQL, Erwin and Postman.
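The sketch below illustrates the streaming-ingest pattern referenced above, assuming Spark Structured Streaming with the standard Kafka source (Azure Event Hubs can also be consumed through its Kafka-compatible endpoint); the broker, topic, schema, table, and checkpoint path are placeholders.

```python
# Minimal Structured Streaming read sketch (assumptions: Databricks/Spark with
# the Kafka source available; broker, topic, event schema, checkpoint path, and
# target table are placeholders).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

txn_schema = StructType([                      # illustrative event schema
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "<broker>:9093")   # placeholder
       .option("subscribe", "transactions")                  # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", txn_schema).alias("e"))
          .select("e.*"))

# Write the parsed stream to a Delta table for downstream fraud/risk analytics.
(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/transactions")  # placeholder
    .outputMode("append")
    .toTable("bronze_transactions"))                                # placeholder table
```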
Client: Vanguard, Malvern, PA Jan 2020 to Dec 2022
Role: Data Engineer
Responsibilities:
Developed and maintained scalable ETL pipelines using Azure Data Factory and Azure Databricks to process
large datasets and support enterprise analytics initiatives.
Integrated Snowflake with Azure services like Blob Storage, Cosmos DB, and SQL Database to build a cloud-
native, high-performance data warehouse.
Designed and optimized data models in Azure SQL Database and Snowflake, supporting real-time analytics
and reporting for critical business operations.
Performed complex data transformations in Azure Synapse Analytics using Notebooks, Pipelines, and
trigger-based workflows to enable robust data processing.
Created CI/CD pipelines in Azure DevOps for automating data pipeline deployments and improving reliability
of data engineering processes.
Ingested and transformed data from diverse sources including Azure SQL, Data Lake, and Cosmos DB into
Snowflake using ADF and Databricks (a connector-level Snowflake write is sketched at the end of this project).
Built source-to-target mapping documents and dimensional data models (Star, Snowflake, transactional)
aligned with compliance and regulatory standards.
Applied T-SQL techniques for CRUD operations, query tuning, indexing, and partitioning to optimize
performance in Azure SQL-based workloads.
Enabled secure Snowflake access via Azure Active Directory integration for centralized user authentication
and role-based access control.
Implemented monitoring and performance tracking for Snowflake pipelines using Azure Monitor and
Snowflake’s native tools to ensure pipeline health.
Collaborated with infrastructure teams to improve network performance and cost optimization for
Snowflake workloads in Azure cloud.
Mentored junior engineers and provided best practices for Snowflake integration and Azure-based data
engineering solutions.
Scheduled and orchestrated pipelines using Apache Airflow, ensuring data workflows ran reliably and were
monitored effectively.
Implemented infrastructure automation and monitoring tools including Terraform, Jenkins, Docker, and
Ansible.
Participated in Agile ceremonies and used tools like Confluence and Jira for sprint planning, documentation,
and cross-team collaboration.
Environment: Azure Cloud, Azure Data Lake, Azure Data Factory, Azure Cosmos DB, Azure Services, Azure SQL
Server, Azure Data Warehouse, Azure DevOps, ETL, Spark, PySpark, Scala, Snowflake.
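A minimal sketch of the Databricks-to-Snowflake write referenced above, assuming the Spark–Snowflake connector is installed; the account URL, credentials, warehouse, source dataset, and target table are placeholders.

```python
# Snowflake write sketch (assumptions: Databricks with the Spark-Snowflake
# connector installed; every connection option below is a placeholder).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",   # placeholder account URL
    "sfUser": "<user>",
    "sfPassword": "<password>",                    # or token/key-pair auth
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

curated = spark.table("curated_positions")         # hypothetical curated dataset

(curated.write
    .format("snowflake")        # "net.snowflake.spark.snowflake" outside Databricks
    .options(**sf_options)
    .option("dbtable", "POSITIONS")                # placeholder target table
    .mode("overwrite")
    .save())
```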
Client: CVS Health, Buffalo Grove, IL Apr 2017 to Dec 2019
Role: Data Engineer
Responsibilities:
Designed and implemented scalable Azure-based data architecture tailored to healthcare payer claims,
ensuring compliance with HIPAA and privacy standards.
Built robust ETL pipelines using Azure Data Factory and Databricks to ingest, transform, and load high-
volume structured and semi-structured healthcare data.
Utilized Azure Data Lake Storage and Azure Blob Storage for secure, cost-efficient storage, implementing
partitioning and lifecycle policies to manage claim data effectively.
Developed real-time data ingestion pipelines using Azure Event Hubs and IoT Hub to collect and analyze
streaming data from medical devices and healthcare systems.
Optimized large-scale data processing workflows in Azure Databricks using Apache Spark for distributed
computing, ML preparation, and data cleansing.
Applied Azure Stream Analytics for real-time monitoring of critical patient vitals and hospital performance
metrics, enabling proactive decision-making.
Designed data models and schemas in Azure Synapse Analytics to support complex analytical queries,
reporting needs, and maintain data integrity.
Created insightful and interactive Power BI dashboards to visualize clinical trends, operational metrics, and
patient outcomes for stakeholders.
Integrated Power BI with Azure Analysis Services to enhance data modeling, improve report performance,
and support ad hoc analytics.
Developed secure pipelines with Azure Key Vault and Active Directory, enforcing data encryption, credential
management, and role-based access control.
Automated infrastructure provisioning and deployment of Azure resources using Azure DevOps, ARM
templates, PowerShell, and Python.
Contributed to real-time analytics pipelines using Spark Streaming with Kafka, applying backpressure
handling for stable message ingestion.
Collaborated with cross-functional teams—including data scientists, BI developers, and clinicians—to
translate healthcare business needs into scalable technical solutions.
Played a key role in data migration projects using GitHub and Jenkins integrations to move and validate
legacy healthcare data onto Azure platforms.
Developed and validated custom Python scripts for file integrity checks and transformation logic in
Databricks to support data quality workflows (a representative check is sketched at the end of this project).
Environment: Azure Blob Storage, Azure Data Factory, Azure Services, Azure SQL Server, Azure Data Warehouse,
MySQL, ETL, Kafka, Power BI, SQL Database, T-SQL, U-SQL, GitHub, Azure Data Lake, Azure Databricks, SSIS.
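The following sketches the kind of file-integrity check referenced above, assuming files landed on DBFS-mounted storage and a hypothetical JSON control file delivered with each claims feed; the paths, file names, and control fields are illustrative only.

```python
# File-integrity / reconciliation sketch (assumptions: Databricks with files
# mounted at a hypothetical DBFS path; checksums and expected row counts come
# from a hypothetical JSON control file supplied with each feed).
import hashlib
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def md5_of_file(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the MD5 of a landed file in manageable chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical control metadata delivered alongside the data file.
control = json.load(open("/dbfs/mnt/claims/landing/claims_20190101.ctl"))
data_path = "/dbfs/mnt/claims/landing/claims_20190101.csv"

checks = {
    "checksum_ok": md5_of_file(data_path) == control["md5"],
    "row_count_ok": spark.read.option("header", True)
                         .csv(data_path.replace("/dbfs", "dbfs:"))
                         .count() == control["row_count"],
}

if not all(checks.values()):
    raise ValueError(f"Data quality checks failed: {checks}")
```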
Client: Star TV Network, India Jun 2015 to Mar 2017
Role: Big Data Developer
Responsibilities:
Collaborated with business analysts and product owners to gather and translate project requirements into
scalable Big Data solutions using Hadoop and AWS technologies.
Built data pipelines using Spark, PySpark, and Hive to transform raw data from Hive tables into clean
datasets used for reporting and analytics.
Developed and scheduled Sqoop jobs to extract data from Oracle, SQL Server, and flat files into the
enterprise Hadoop Data Lake.
Conducted performance tuning of Spark and Hive jobs by analyzing execution plans, YARN logs, and DAGs,
leading to significant runtime improvements.
Designed reusable Python modules and Jupyter notebooks for exploratory data analysis and efficient ETL
prototyping.
Migrated on-premises Hadoop clusters and ETL workflows to AWS Cloud, reducing infrastructure overhead
and improving scalability.
Created and optimized AWS Glue ETL scripts for transforming, flattening, and enriching semi-structured data
formats like JSON and Parquet (a flattening pattern is sketched at the end of this project).
Developed event-driven data pipelines using AWS Glue and Lambda to automate ingestion, transformation,
and data freshness monitoring.
Built external tables with partitions in Hive, Athena, and Redshift to improve query performance and enable
parallel data access.
Established AWS RDS and S3 environments, implemented backup strategies, and configured Redshift for
large-scale analytics workloads.
Designed and deployed Oozie workflows and Control-M schedules for orchestration, integrating job
execution with Airflow and AWS Glue.
Integrated Teradata with AWS EMR and built shell scripts to load data from Teradata staging to AWS data
marts, optimizing queries for performance.
Prototyped CI/CD pipelines using GitLab on Kubernetes and Docker for streamlined build, test, and
deployment of data engineering applications.
Created Lambda functions in Scala and Python to process source data, perform validation checks, and
manage AWS Glue Catalog metadata.
Documented technical processes, architecture diagrams, and data flow designs using Confluence for internal
knowledge sharing and project tracking.
Environment: AWS Redshift, AWS EMR, ELK, EC2, ETL, AWS Glue, S3, Athena, Hive, Impala, Lambda, Python,
PySpark, Spark SQL, Oracle 11g/12c, Teradata, Jira, Bitbucket, Power BI, Control-M, Airflow, Talend, CI/CD pipelines,
GKE, SAS.
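Below is a minimal sketch of the JSON-flattening pattern referenced above, shown in plain PySpark for clarity (in a Glue job the same logic would run inside a GlueContext script); the bucket, paths, and field names are placeholders.

```python
# JSON-flattening sketch (assumptions: plain PySpark shown for clarity; bucket,
# paths, and the nested field names are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.json("s3://<bucket>/raw/viewership/")        # placeholder path

# Explode the nested array of program views and pull nested struct fields up
# to top-level columns so the result queries cleanly from Athena, Hive, or Redshift.
flat = (raw
    .withColumn("view", F.explode("views"))                   # hypothetical array field
    .select(
        "user_id",
        "region",
        F.col("view.program_id").alias("program_id"),
        F.col("view.watched_seconds").alias("watched_seconds"),
        F.to_date("view.start_time").alias("view_date"),
    ))

(flat.write.mode("overwrite")
     .partitionBy("view_date")
     .parquet("s3://<bucket>/curated/viewership/"))           # placeholder path
```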
EDUCATION:
Bachelor’s degree in Computer Science and Engineering, BGIT, India. 2010 to 2014