LEKHYA JANA

SR. DATA ENGINEER


EMAIL: [email protected] CONTACT: +17378008233
LINKEDIN: https://www.linkedin.com/in/lekhya-j/

PROFESSIONAL SUMMARY:

• Demonstrated 8+ years of professional experience as a Data Engineer in the design, development, implementation, and maintenance of cloud, Big Data, Spark, Scala, and Hadoop data pipelines.
• Strong experience in Data Modeling, Data Migration, Design, Data Warehousing, Data Ingestion, Data Integration, Data Consumption, Data Delivery, and integrated Reporting.
• Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
• Knowledge about model training, optimization, and evaluation methods using TensorFlow and PyTorch.
• In-depth knowledge in OLAP, OLTP, Business Intelligence and Data Warehousing concepts with emphasis on ETL and
Business Reporting needs.
• Experience in Extract, Transform, and Load (ETL) processes using Informatica PowerCenter, DBT, and SSIS, extracting data from sequential files, XML files, and CSV files and transforming and loading it into the target Data Warehouse.
• Fluent in utilizing cloud-native technologies such as AWS and Azure to build scalable and secure data platforms.
• Proficiency in utilizing Spark transformations and actions to clean, transform, and pre-process data, ensuring data quality and consistency for downstream analysis (a minimal PySpark sketch follows this summary).
• Experience in designing and implementing Kafka-based data pipelines, enabling real-time ingestion of vast data volumes from diverse sources such as databases, log files, and IoT devices.
• Experience in Microsoft Azure cloud services: Azure SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, Azure Data Factory, and Azure DevOps.
• Knowledge of managing Snowflake accounts, warehouses, and user roles, ensuring proper resource allocation, monitoring, and maintenance of Snowflake environments.
• Experience in implementing data replication and backup strategies in Snowflake to ensure data availability and disaster
recovery preparedness.
• Actively collaborated with data architects and engineers to establish data governance for MDM and security (Key Vault, network security, schema-level and row-level security), resource groups, integration runtime settings, integration patterns, and aggregate functions for Databricks development.
• Extensive experience in data storage and documentation using MongoDB and HBase (NoSQL) as well as Snowflake.
• Proficient in PL/SQL development with hands-on experience in Oracle 19, adept at writing efficient and optimized SQL
queries, stored procedures, functions, and triggers.
• Hands-on experience in converting complex RDBMS (Oracle, MySQL & Teradata) queries into Hive Query Language.
• Developed Spark applications for both batch and real-time data processing scenarios, accommodating various data velocity requirements. Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
• Followed Data Governance best practices by documenting data sources, metadata, and data lineage within Tableau to
maintain data transparency and lineage tracking.
• Hands-on experience with Amazon EC2, S3, Redshift, RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, and other services of the AWS family. Proficient in data mining tools like R, SAS, Python, and SQLAlchemy.
• Experience in integrating Talend to transform and enrich data as it moves through the ETL pipeline, ensuring data
quality and compatibility with downstream systems.
• Hands-on experience in setting up workflows using the Apache Airflow and Oozie workflow engines for managing and scheduling Hadoop jobs. Experience in transforming and retrieving data using Hive, Spark, and MapReduce.
• Extensive exposure to containerization technologies, including Docker and Kubernetes, and proficient in container orchestration using Kubernetes.
• Imported and exported data using Sqoop and Apache Flume between HDFS, relational database systems, HBase, and web applications. Strong in writing SQL queries and optimizing Teradata, Oracle, and SQL Server queries.
• Strong experience in working with UNIX environments and writing Shell Scripts for file system management, process
control, user administration, and networking tasks.
• Experience in managing and maintaining high-performance Kafka clusters, ensuring data availability, reliability, and
low-latency ingestion for mission-critical applications.
• Extensive experience in working with SQL, databases, ETL, and cloud platforms to achieve business objectives.
• Experience in visualizing data using BI services & tools: Power BI, Tableau, Plotly, SSRS, and Matplotlib.
• Proven ability to effectively engage with business users and stakeholders to gather data requirements and explain design decisions.
• Knowledge in developing comprehensive UAT test cases based on business requirements, functional specifications, and user stories. Maintained awareness of industry trends and emerging technologies in data engineering and cloud computing, and actively pursued opportunities for skill development.
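
Below is a minimal PySpark sketch of the kind of cleaning and pre-processing transformations referenced in this summary; the S3 paths and column names (event_id, event_ts, amount) are illustrative assumptions, not actual project artifacts.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative only: paths and column names below are assumptions.
spark = SparkSession.builder.appName("clean_raw_events").getOrCreate()

raw_df = spark.read.option("header", "true").csv("s3://example-bucket/raw_events/")

clean_df = (
    raw_df
    .dropDuplicates(["event_id"])                          # remove duplicate records
    .filter(F.col("event_ts").isNotNull())                 # drop rows missing the event timestamp
    .withColumn("amount", F.col("amount").cast("double"))  # enforce a numeric type
    .fillna({"amount": 0.0})                               # default missing amounts
    .withColumn("event_date", F.to_date("event_ts"))       # derive a partition column
)

# Write partitioned Parquet for downstream analysis.
clean_df.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)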
TECHNICAL SKILLS:

Cloud Services: AWS Glue, S3, Redshift, EC2, EMR, DynamoDB, Data Lake, AWS Lambda, GCP, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure SQL Data Warehouse
Big Data Technologies: Hadoop, Spark, Kafka
Visualization Tools: Power BI, Tableau, Excel, Pivot Tables, VLOOKUP, SSRS
ETL/Data Warehouse Tools: Informatica PowerCenter, Talend, DBT, and SSIS
Database Management: Backup and Recovery, Snowflake, SQL databases
Version Control & Containerization Tools: Git, Bitbucket, Docker, and Jenkins
Real-Time Data Processing: Kafka, Apache NiFi
Programming Languages: SQL, NoSQL, PL/SQL, Python, Java, PySpark, Pig, HiveQL, Scala, R, UNIX Shell scripting
Databases: MySQL, DB2, Oracle, PostgreSQL, MongoDB, Cassandra, and Cosmos DB
Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE:

Client: AgFirst, Columbia, SC Aug 2020 – Present


Role: Sr Data Engineer
Responsibilities:
• Analyzed employee insurance claims data to identify trends, patterns, and anomalies, providing insights into claim approval rates, common medical procedures, and cost distributions.
• Worked in an Agile environment, using Jira to track and manage user stories, epics, and tasks throughout the development lifecycle, and actively took part in daily stand-ups, backlog grooming, and sprint planning.
• Designed, developed, and maintained AWS-based data solutions that are scalable, reliable, and cost-effective.
• Wrote various Spark programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats, and stored the refined data in partitioned tables in the EDW.
• Applied AWS Glue for preprocessing data before transformations in DBT, employing dynamic frame transformations with PySpark and SparkSQL to flexibly handle semi-structured and structured data (see the Glue sketch after this list).
• Integrated Data Build Tool (DBT) into the AWS data pipeline architecture to orchestrate data transformations and
modeling tasks, leveraging its modular approach and SQL-based workflows.
• Developed custom migration scripts and ETL workflows using AWS Glue and Apache Spark to extract, transform, and
load data from Teradata into Snowflake, ensuring data integrity and consistency.
• Implemented metadata management processes to capture and maintain metadata, enhancing data governance and
traceability, while also optimizing PySpark jobs to run on Kubernetes Cluster for faster data processing.
• Integrated AWS services with on-premises data sources and third-party applications, creating seamless data flows
between cloud and on-premises environments using AWS DataSync and AWS Direct Connect.
• Implemented IAM policies to define granular permissions and access controls ensuring secure authentication and
authorization for AWS resources and services.
• Developed and optimized SnowSQL queries and procedures, as well as user-defined functions (UDFs) in JavaScript, Python, and SQL, to ensure efficient data processing in Snowflake.
• Seamlessly integrated EMR with data lakes to process various data types and formats within the AWS ecosystem. Additionally, connected NoSQL databases with big data tools such as Hadoop and Spark for smooth data processing and analysis.
• Set up the CI/CD pipelines using Maven, GitHub, and AWS. Worked extensively with importing metadata into Hive
using Python and migrated existing tables and applications to work on AWS cloud (S3).
• Successfully implemented ETL solutions between an OLTP and OLAP database in support of Decision Support
Systems with expertise in all phases of SDLC.
• Developed and maintained APIs for accessing and manipulating data using FastAPI and the Django REST Framework.
• Utilized data analysis techniques to identify potential instances of insurance fraud within employee claims, conducting
anomaly detection and pattern recognition to flag suspicious activities.
• Developed and maintained data processing solutions that leverage serverless architectures using AWS Lambda.
• Developed a data pipeline using Spark, Hive, Impala, and HBase to analyze customer behavioral data and financial histories in the Hadoop cluster.
• Created scripts to append data from temporary HBase tables to target HBase tables in Spark, wrote Spark programs in Scala, and ran Spark jobs on YARN.
• Performed the DB2 bind process for various DB2 database groups across LPARs for all applications.

• Implemented real-time data streaming solutions using Amazon Kinesis, including Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics, to ingest, process, and analyze large volumes of streaming data in real time.
• Processed different kinds of data, including unstructured (logs, clickstreams, shares, likes, and topics), semi-structured (XML, JSON), and structured (RDBMS) data.
• Created, tested, and deployed Teradata FastLoad, MultiLoad, and BTEQ scripts, along with DML and DDL operations, and diagnosed and resolved Teradata system issues promptly to minimize downtime.
• Monitored and managed ElastiCache clusters using CloudWatch metrics, alarms, and logs for performance
optimization, capacity planning, and troubleshooting.
• Designed and implemented event-driven architectures using Amazon SNS (Simple Notification Service) and Amazon
SQS (Simple Queue Service) to enable scalable and decoupled communication between microservices and applications.
• Developed serverless workflows and orchestrated complex business processes using AWS Step Functions.
• Implemented fault-tolerant mechanisms to ensure the resilience of NoSQL databases in the face of hardware failures or
network issues.
• Implemented data modeling techniques in Tableau to create optimized data models for efficient data analysis and
visualization. Also leveraged data modeling tool Erwin for effective metadata management and documentation.
• Experienced in data architecture best practices, integration, and data governance solutions (Data Catalog, data governance frameworks, metadata, and data quality).
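
A hedged sketch of the Glue pre-processing step described above: read a dynamic frame from the Glue Data Catalog, normalize the schema with PySpark, and stage the result in S3 for downstream DBT models. The catalog database, table, and bucket names are placeholders rather than the client's actual objects.

from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

# Placeholder database, table, and bucket names -- illustrative only.
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

claims_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="claims_db", table_name="raw_claims"
)

# Standardize column names and types before handing the data to DBT models.
mapped_dyf = ApplyMapping.apply(
    frame=claims_dyf,
    mappings=[
        ("claim_id", "string", "claim_id", "string"),
        ("claim_amount", "string", "claim_amount", "double"),
        ("service_date", "string", "service_date", "date"),
    ],
)

# Use SparkSQL for a simple quality filter, then write Parquet back to S3.
mapped_dyf.toDF().createOrReplaceTempView("claims")
curated_df = spark.sql("SELECT * FROM claims WHERE claim_amount > 0")
curated_df.write.mode("overwrite").parquet("s3://example-curated-bucket/claims/")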

Client: Ascena Retail Group, Mahwah, NJ Nov 2018 – Jul 2020


Role: Data Engineer
Responsibilities:
• Designed and implemented distributed systems using Azure cloud technologies. Developed monitoring and alerting
systems to detect and resolve issues with data platforms and pipelines.
• Built and maintained efficient ETL processes for ingesting data from external sources into Azure systems using a combination of Azure Data Factory, T-SQL, and U-SQL (Azure Data Lake Analytics).
• Created and maintained Azure SQL databases and Azure Data Lake Storage accounts, ensuring data is organized,
optimized, and secured. Also worked on Azure Data Lake Storage for efficient data storage and retrieval.
• Designed and implemented data pipelines using Azure Data Factory to move and transform data from various sources to
Azure storage solutions. Also conducted code reviews and unit tests to ensure quality and reliability of data processing
applications and pipelines.
• Utilized Azure DevOps for version control, continuous integration, and continuous deployment to streamline the
development and deployment of data solutions.
• Wrote Spark applications in Scala on Azure Databricks to interact with the MySQL database using the SparkSQL context and accessed Hive tables using the Hive context (an illustrative PySpark equivalent follows this list).
• Developed and optimized T-SQL queries and stored procedures in Azure Synapse SQL Pools for complex data
transformations, aggregations, and analytics.
• Designed and built reusable REST API frameworks to help with data consumption from and push into MongoDB,
increasing data accessibility and integration possibilities.
• Involved in a POC to compare the efficiency of Spark applications on a Mesos cluster versus a Hadoop YARN cluster.
• Developed Spark scripts and created data pipelines with DataFrames to migrate data to the Hadoop platform.
• Designed and implemented data security measures to protect sensitive data and prevent unauthorized access.
• Developed & maintained data pipelines to move & transform data across systems and platforms.
• Utilized Snowflake's data sharing capabilities to share data with external organizations and partners, fostering
collaboration and data-driven decision-making.
• Implemented effective indexing strategies to improve query performance and enhance overall database efficiency.
• Integrated SQL databases with other data storage systems and tools, data lakes and NoSQL databases, to facilitate
seamless data access and data flow within the data ecosystem.
• Optimized Snowflake for performance by leveraging features like automatic clustering, materialized views, and query
optimization, resulting in efficient data processing.
• Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between different sources and sinks such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
• Proficiently used Azure DevOps Server & Azure DevOps Service for version control, automation, & release
management.
• Utilized Python for data cleaning, handling missing values, outlier detection, and feature engineering for data analysis
and data modeling.
• Worked on designing, implementing and maintaining MySQL, Oracle, PostgreSQL, and MongoDB databases.
• Provided training and support to business users, ensuring they could navigate and utilize Tableau dashboards and
reports effectively to derive insights.
• Developed data quality checks and data validation processes to ensure accuracy, completeness, and consistency of data.
• Developed automated reporting solutions in Tableau Server to deliver scheduled and on-demand reports to stakeholders,
improving data accessibility and timely decision-making.
• Implemented data governance policies and procedures to ensure compliance with data privacy regulations and company
policies.
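
An illustrative PySpark equivalent of the Databricks job described above (the original applications were written in Scala); the JDBC URL, credentials, and table names are placeholders.

from pyspark.sql import SparkSession

# On Databricks a SparkSession is provided; getOrCreate() keeps the sketch self-contained.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Placeholder JDBC connection details -- illustrative only.
orders_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://example-host:3306/sales")
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)

# Hive table assumed to be registered in the metastore.
customers_df = spark.sql("SELECT customer_id, segment FROM curated.customers")

# Join and aggregate with DataFrame operations backed by SparkSQL.
summary_df = orders_df.join(customers_df, "customer_id").groupBy("segment").count()
summary_df.show()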

Client: Micron, Hyderabad, India Mar 2017 – Jul 2018


Role: Data Analyst
Responsibilities:
• Implemented complex business rules by creating robust Mappings, Mapplets, Sessions, and Workflows using Informatica PowerCenter. Mapped complex data structures successfully, enabling straightforward data integration procedures.
• Extensive experience in Extraction, Transformation, and Loading (ETL) data from various data sources into Data
Warehouse and Data Marts using Informatica PowerCenter tools (Repository Manager, Designer, Workflow Manager,
Workflow Monitor, and Informatica Administration Console).
• Proficiently used SQL to extract, transform, and load (ETL) data from various sources, including relational databases,
data warehouses, and external APIs, ensuring data quality and consistency.
• Conducted performance tuning in PL/SQL by optimizing SQL queries, indexing, and query execution plans, resulting in
enhanced database performance and faster data retrieval.
• Implemented version control system Git for data analysis projects, ensuring efficient tracking of changes, collaboration
with team members, and maintaining a historical record of data transformations and analyses.
• Integrated Python applications with RESTful APIs to retrieve and update data from external sources and services,
enabling real-time data synchronization.
• Enforced data governance policies to ensure compliance with data quality standards, security, and privacy regulations.
• Designed and executed a comprehensive migration strategy to transition Informatica-based data integration processes
and workloads to AWS, ensuring a seamless transition while minimizing downtime and mitigating potential risks.
• Designed staging jobs to load data files from S3 into the Redshift staging layer.
• Worked with the Informatica ETL tool to develop SCD Type I and Type II mappings in AWS for a POC to build source-to-target mappings. Also created Python scripts to start and stop EC2 instances in AWS (see the boto3 sketch after this list).
• Designed transformation jobs from staging tables to Redshift target tables as per the requirements for Type I and Type II dimensions.
• Developed Excel macros and VBA (Visual Basic for Applications) scripts to automate repetitive tasks, enhancing
efficiency in data processing and reporting workflows.
• Utilized VLOOKUP to perform data validation and cleansing tasks, identifying and handling discrepancies, duplicates, and missing data, contributing to data quality enhancement.
• Leveraged Python libraries Matplotlib, Seaborn, and Plotly to create data visualizations and reports for data analysis and
presentation to stakeholders.
• Developed data models in Power BI to enable efficient data exploration, aggregation, and filtering, enhancing the
overall performance of reports and dashboards.
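
A minimal boto3 sketch of the EC2 start/stop script mentioned above; the region and instance IDs are placeholders.

import boto3

# Placeholder region and instance IDs -- illustrative only.
ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_IDS = ["i-0123456789abcdef0"]

def start_instances():
    # Bring up the ETL worker instances before a scheduled load.
    ec2.start_instances(InstanceIds=INSTANCE_IDS)

def stop_instances():
    # Shut them down once the load window closes to save cost.
    ec2.stop_instances(InstanceIds=INSTANCE_IDS)

if __name__ == "__main__":
    start_instances()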

Client: Merck Pharma, Mumbai, India May 2015 – Feb 2017


Role: Data Analyst
Responsibilities:
• Developed predictive models for healthcare applications, such as disease prediction and patient risk stratification, using
machine learning frameworks scikit-learn and TensorFlow, and evaluated model performance.
• Implemented Agile principles and methodologies for data analysis projects, ensuring flexibility, collaboration, and
responsiveness to changing business needs.
• Proficiently used SQL to extract and retrieve relevant healthcare data from complex databases, including Electronic
Health Records (EHR) and Health Information Systems (HIS).
• Designed, developed, and maintained ETL (Extract, Transform, Load) packages in SSIS to efficiently extract data from
various sources, transform it to meet business requirements, and load it into SQL Server databases.
• Utilized SQL to clean and preprocess healthcare datasets, ensuring data accuracy and consistency and preparing them for analysis. Additionally, migrated Oracle database systems to SQL Server.
• Designed and maintained SQL stored procedures to automate complex data processing tasks, such as data aggregation.
• Conducted in-depth data analysis using SQL, including creating complex queries and aggregations to identify trends,
patterns, and anomalies in patient demographics, clinical outcomes, and medical procedures.
• Monitored and managed EDI transactions, ensuring timely and accurate data exchange between systems, and promptly
resolved any issues or discrepancies.
• Created Entity-Relationship Diagrams (ERDs) to visually represent the structure of databases and their relationships.
• Applied normalization techniques to ensure data integrity and denormalization for optimizing query performance.
• Proficiently used the Python library Pandas to clean, reshape, and preprocess healthcare data, handling missing values and converting data into suitable formats for analysis (a brief pandas sketch follows this list).
• Designed and built interactive dashboards in Excel to visualize data trends and patterns, making complex information
more accessible and actionable for stakeholders.
• Forecasted sales patterns and demand for pharmaceutical products using statistical models and machine learning
approaches.
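
A brief pandas sketch of the cleaning and reshaping step referenced above; the file name and column names (patient_id, diagnosis_code, claim_amount) are assumptions for illustration only.

import pandas as pd

# Placeholder file and column names -- illustrative only.
claims = pd.read_csv("ehr_extract.csv", parse_dates=["admission_date"])

# Handle missing values and enforce consistent types before analysis.
claims["age"] = pd.to_numeric(claims["age"], errors="coerce")
claims = claims.dropna(subset=["patient_id"])
claims["diagnosis_code"] = claims["diagnosis_code"].str.strip().str.upper()
claims["length_of_stay"] = claims["length_of_stay"].fillna(claims["length_of_stay"].median())

# Reshape: total claim amount per patient per admission year.
summary = (
    claims.assign(year=claims["admission_date"].dt.year)
          .pivot_table(index="patient_id", columns="year",
                       values="claim_amount", aggfunc="sum", fill_value=0)
)
print(summary.head())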
EDUCATION DETAILS: Bachelor of Technology in Computer Science, Jawaharlal Nehru Technological University, Hyderabad, India (July 2011 – May 2015).
