Big Data Engineer Resume Overview

Vishwa has over 7 years of experience as a big data engineer with expertise in technologies like Apache Kafka, Spark, Hadoop, Hive, Cassandra, and Python. He has worked on building streaming applications, data pipelines, and data integration projects for clients in various industries. His technical skills include Apache Spark, Scala, Python, data warehousing, machine learning, and containerization tools like Docker.


Vishwa | Sr. Big Data Engineer | vishwas.bigdata@gmail.com | Phone: 469-567-0045

Overall 7+ years of experience in software application development, including analysis, design, development, integration, testing, and maintenance of various big data applications using Scala and Python. Experienced in developing big data applications on cloud and on-premises platforms.

Technical Summary

• Experienced in building streaming applications using Apache Kafka, Spark Streaming, and other streaming platforms (a minimal sketch follows this list).
• Experienced in building highly scalable big data solutions on Hadoop across multiple distributions (Cloudera and Hortonworks) and NoSQL platforms (HBase and Cassandra).
• Expertise in big data architecture with the Hadoop file system and its ecosystem tools: MapReduce, HBase, Hive, Pig, ZooKeeper, Oozie, Flume, Avro, Impala, Apache Spark, Spark Streaming, and Spark SQL.
• Hands-on experience in Apache Sqoop, Apache Storm, and Apache Hive integration.
• Experience with multi-cloud environments, including Azure and Amazon Web Services (AWS).
• Experience with different file formats such as Parquet, JSON, Avro, and ORC for Hive querying and processing.
• Developed Spark applications using Scala and Python for a wide range of ETL operations and machine learning algorithms.
• Experience in building end-to-end continuous integration and deployment pipelines using Jenkins.
• Familiarity with containerization and virtualization tools such as Docker and Kubernetes.
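
As a minimal illustration of the Kafka plus Spark streaming work listed above, the sketch below reads a Kafka topic with Spark Structured Streaming in PySpark. The broker address and topic name are placeholders, and a production job would write to HDFS or Hive rather than the console.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Requires the spark-sql-kafka connector on the classpath.
spark = (SparkSession.builder
         .appName("kafka-streaming-sketch")
         .getOrCreate())

# Read the raw Kafka stream; key and value arrive as binary columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "events")                     # placeholder topic
          .option("startingOffsets", "latest")
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

# Console sink for demonstration; a real job would write to HDFS, Hive, or a database.
query = (events.writeStream
         .outputMode("append")
         .format("console")
         .option("truncate", "false")
         .start())

query.awaitTermination()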

Experience Summary

Client: Entegris Inc | Designation: Sr. Data Engineer

Location: Minneapolis, MN | Duration: Feb 21 – Present

• Responsible for the digital transformation from the legacy BW system to a cloud data warehouse.
• Built out use cases for data that had been maintained manually in different manufacturing data warehouses.
• Core member of the Cloud Analytics Selection Team, evaluating cloud platforms for suitability across many of the organization's use cases.
• Created Dataproc and Dataflow clusters on GCP for running computations in the cloud.
• Developed CI/CD pipelines using Python, Spark, and Spark SQL for data extraction, transformation, pivoting, and aggregation into the formats specified by business requirements.
• Loaded data into BigQuery incrementally every 15 minutes using Google Dataproc, PySpark, gsutil, and shell scripts.
• Used Google Cloud Composer (Airflow) to automate data pipelines from Cloud Storage to BigQuery (see the sketch after this list).
• Created queries on BigQuery over different datasets and integrated them with Power BI for dashboarding.
• Used REST APIs with Python to ingest data from external sites into BigQuery.
• Created dashboards in Power BI for data visualization and for quarterly and monthly reporting.
• Worked on a POC for integrating Snowflake with AWS for one of our data use cases: built a demo pipeline using an AWS S3 bucket, Glue, and Python transformations, and ingested the curated dataset into Snowflake.
• Worked on a POC with AWS SageMaker for one of our ML/AI use cases and with Vertex AI on GCP to compare the two platforms.
• Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via the GCP monitoring agent.
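
As a hedged illustration of the Cloud Composer/Airflow automation from Cloud Storage to BigQuery mentioned above, the sketch below defines a DAG around the GCSToBigQueryOperator; the bucket, object prefix, dataset, table, and schedule are placeholder values, not details from this engagement.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bigquery_incremental",          # hypothetical DAG name
    start_date=datetime(2021, 2, 1),
    schedule_interval="*/15 * * * *",              # every 15 minutes, matching the load cadence above
    catchup=False,
) as dag:
    load_to_bq = GCSToBigQueryOperator(
        task_id="load_landing_files",
        bucket="example-landing-bucket",                       # placeholder bucket
        source_objects=["manufacturing/*.parquet"],            # placeholder prefix
        destination_project_dataset_table="example-project.analytics.manufacturing_facts",
        source_format="PARQUET",
        write_disposition="WRITE_APPEND",                      # append each incremental batch
        autodetect=True,
    )
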
Client: Target Corporation Inc | Designation: Sr. Data Engineer

Location: Minneapolis, MN | Duration: Dec 19 – Present

• Developed external vendor file export pipelines using Spark, Hive, Python, Scala, and shell scripting.
• Implemented optimized Spark/Scala data pipelines for aggregating large volumes of data.
• Worked on generating business reports from a custom vendor data platform.
• Developed integration pipelines for SFTP and cloud storage such as S3 and GCS.
• Developed Spark applications using Spark SQL for data extraction, transformation, and aggregation into specified formats, analyzing the data to uncover insights in customer-requested formats.
• Experienced in SQL, data transformations, statistical analysis, and troubleshooting across multiple database platforms (MySQL, PostgreSQL, Teradata, and Azure SQL Data Warehouse).
• Migrated existing data pipelines from Hortonworks Data Platform 2 to Hortonworks Data Platform 3.
• Implemented data pipeline automation using Oozie and internal open-source tools such as an automation portal.
• Implemented a reporting layer on top of Apache Druid for incremental updates to business reports.
• Built optimized Hive queries over large volumes of data in different data formats.
• Developed a continuous deployment process using container-based tools such as Drone.
• Implemented Docker pipelines for testing and validation in the integration and deployment process.
• Developed end-to-end unit and integration tests for data pipelines using PySpark (see the sketch after this list).
• Developed daily metrics pipelines and exposed them through a Grafana dashboard with alerting.
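
The following is a minimal sketch of the PySpark unit-testing approach referenced above, using pytest with a local SparkSession; the transformation under test and all table and column names are stand-ins, not code from the actual pipelines.

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Local SparkSession shared across the test session.
    return SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()


def aggregate_by_vendor(df):
    # Stand-in for a pipeline transformation: total quantity per vendor.
    return df.groupBy("vendor_id").agg(F.sum("quantity").alias("total_quantity"))


def test_aggregate_by_vendor(spark):
    source = spark.createDataFrame(
        [("v1", 2), ("v1", 3), ("v2", 5)],
        ["vendor_id", "quantity"],
    )
    result = {row["vendor_id"]: row["total_quantity"]
              for row in aggregate_by_vendor(source).collect()}
    assert result == {"v1": 5, "v2": 5}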

Client: Samsung Electronics America | Designation: Sr. Data Engineer

Location: Plano, TX | Duration: Mar 18 – Dec 19

• Implemented real-time data pipelines for streaming analytics using Kafka and Spark Streaming with Scala.
• Worked on migrating on-premises cluster data into Azure to implement real-time features.
• Created custom dashboards using Application Insights and the Application Insights query language, processing metrics sent to Application Insights and building dashboards on top of them in Azure.
• Created real-time streaming dashboards in Power BI using Stream Analytics to push datasets to Power BI.
• Developed a custom message consumer to consume data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure components); a sketch follows this list.
• Implemented Spark ETL jobs in Azure HDInsight for ETL operations in the cloud.
• Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment using Jenkins.
• Implemented a data platform in a Hive data warehouse for on-premises use and archival purposes.
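
A sketch of a consumer along the lines of the one described above, reading from Kafka with kafka-python and forwarding each record to an Azure Event Hub with azure-eventhub; the topic, broker, connection string, and hub name are placeholders.

from kafka import KafkaConsumer                          # kafka-python
from azure.eventhub import EventData, EventHubProducerClient

consumer = KafkaConsumer(
    "device-telemetry",                                  # placeholder topic
    bootstrap_servers=["broker:9092"],                   # placeholder broker
    group_id="eventhub-forwarder",
    value_deserializer=lambda v: v.decode("utf-8"),
)

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUB_CONNECTION_STRING>",            # placeholder connection string
    eventhub_name="telemetry",                           # placeholder hub name
)

# Forward each Kafka record to the Event Hub as a single-event batch.
for record in consumer:
    batch = producer.create_batch()
    batch.add(EventData(record.value))
    producer.send_batch(batch)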

Client: United Airlines | Designation: Sr. Big Data Engineer

Location: Chicago, IL | Duration: Oct 16 – Feb 18

• Extracted data from Teradata and MySQL into HDFS using Sqoop export/import.
• Developed Sqoop jobs with incremental loads to populate Hive external tables.
• Used design patterns in MapReduce to convert business data into custom formats.
• Handled different compression codecs such as LZO, GZIP, and Snappy.
• Optimized Hive performance using partitioning and bucketing (see the sketch after this list).
• Worked with Hive dynamic partitions to overcome the Hive locking mechanism.
• Developed UDFs in Java as needed for use in Hive queries.
• Developed crontab entries for scheduling and orchestrating the ETL process.
• Involved in indexing Hive data using Solr and preparing custom tokenizer formats for querying.
• Involved in designing a real-time computation engine using Kafka.
• Worked on a POC to stream data into Solr with Spark Streaming and perform indexing on it.
• Wrote build jobs using Maven and integrated them with Jenkins.
• Ingested third-party data from AWS cloud buckets.
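
As an illustration of the partitioning and bucketing optimizations noted above, the sketch below writes a partitioned, bucketed Hive table with the PySpark DataFrame API; the table and column names are illustrative, not taken from the actual project.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

flights = spark.table("staging.flights")                 # placeholder source table

# Partition by date so queries prune whole directories, and bucket on a
# high-cardinality join key so bucketed joins avoid a full shuffle.
(flights.write
 .partitionBy("flight_date")
 .bucketBy(32, "tail_number")
 .sortBy("tail_number")
 .mode("overwrite")
 .saveAsTable("analytics.flights_optimized"))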

Client: BCBS | Designation: Big Data Engineer

Location: Baltimore, MD | Duration: Jul 15 – Sep 16

• Developed Oozie automations using custom MapReduce, Pig, Hive, and Sqoop.
• Built reusable Hive UDF libraries for the business, enabling users to reuse them across queries.
• Performed performance tuning on Hive queries, joins, and configuration parameters to improve query response time.
• Created partitions and buckets based on state for further processing with bucketed Hive joins.
• Used Cassandra CQL with the Java API to retrieve data from Cassandra tables (an equivalent Python sketch follows this list).
• Developed applications on Spark as part of the next-generation platform implementation.
• Implemented real-time data ingestion using Kafka.
• Developed a data pipeline using Kafka and Storm to store data in HDFS.
• Used Apache Maven extensively while developing MapReduce programs.
• Worked extensively on Pig scripts and Pig UDFs to perform ETL activities.
• Developed Spark scripts using Python.
• Developed workflows in Oozie to automate tasks.
• Collected log data from web servers and loaded it into HDFS using Flume.
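
The Cassandra lookups above used the Java driver; purely as an illustration, the sketch below performs an equivalent keyed read with the Python cassandra-driver, with the contact point, keyspace, table, and column names all placeholders.

from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-host"])                    # placeholder contact point
session = cluster.connect("claims_keyspace")             # placeholder keyspace

# Prepared statement for a keyed lookup against a placeholder table.
lookup = session.prepare(
    "SELECT member_id, claim_total FROM member_claims WHERE member_id = ?"
)
row = session.execute(lookup, ["M12345"]).one()
if row is not None:
    print(row.member_id, row.claim_total)

cluster.shutdown()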

Client: Health Integrated | Designation: Hadoop Developer

Location: Tampa, FL | Duration: Nov 14 – Jun 15

• Gathered the exact reporting requirements from the business groups and users.
• Imported trading and derivatives data into the Hadoop Distributed File System using the ecosystem components MapReduce, Pig, Hive, and Sqoop.
• Responsible for writing Hive queries and Pig scripts for data processing.
• Ran Sqoop to import data from Oracle and other databases.
• Created shell scripts to collect raw logs from different machines.
• Created Hive tables with static and dynamic partitions (see the sketch after this list).
• Optimized scripts using ILLUSTRATE and EXPLAIN and used parameterized Pig scripts.
• Defined Pig UDFs for functions such as swap, hedging, speculation, and arbitrage.
• Processed unstructured log files using MapReduce programs.
• Imported and exported data between HDFS and Hive using Sqoop.
• Involved in configuring HA, resolving Kerberos security issues, and performing NameNode failure restoration activities from time to time as part of maintaining zero downtime.
• Developed JUnit test cases for application unit testing.
• Used SVN as version control to check in code, create branches, and tag releases.
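
The dynamic-partition loads above were done directly in Hive; the sketch below shows the equivalent statements issued through PySpark's Hive support, with table and column names chosen only for illustration.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-dynamic-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow fully dynamic partitions for the insert below.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# The partition column (trade_date) comes last in the SELECT, so each row
# lands in the partition derived from its own value.
spark.sql("""
    INSERT OVERWRITE TABLE analytics.trades PARTITION (trade_date)
    SELECT trade_id, instrument, notional, trade_date
    FROM staging.raw_trades
""")
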
Client: AdvanSoft International | Designation: Java/J2EE Developer

Location: India | Duration: Jun 13 – May 14

• Developed modules based on the Struts MVC architecture.
• Developed business components using core Java concepts and classes, including inheritance, polymorphism, collections, serialization, and multithreading.
• Developed the web interface using Servlets, JavaServer Pages, HTML, and CSS.
• Developed DAO objects using JDBC.
• Used the Spring Framework for dependency injection and integrated it with the Struts Framework and Hibernate.
• Used Log4j to capture logs, including runtime exceptions; monitored error logs and fixed the problems.
• Performed unit testing, system testing, and integration testing.
• Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing fixes.
