
Shamee K Sharma - IR

The document is an internship summary report detailing a two-month virtual internship focused on data engineering using AWS. It outlines the objectives, tasks, skills acquired, and project deliverables, emphasizing the design and implementation of data pipelines and the use of various AWS services. The report concludes with reflections on the overall internship experience, highlighting the technical growth and industry insights gained.



SCHOOL OF COMPUTING SCIENCE AND ENGINEERING


GREATER NOIDA, UTTAR PRADESH
2024 – 2025

INDUSTRY INTERNSHIP
SUMMARY REPORT

AWS Data Engineering Virtual Internship Report

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE AND ENGINEERING

Submitted by

Shamee K Sharma (22SCSE1012596)

Vth Sem III Year


CERTIFICATE

I hereby certify that the work presented in the internship project report entitled
“AWS Data Engineering Virtual Internship Report”, in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in the School of
Computing Science and Engineering of Galgotias University, Greater Noida, is an
authentic record of my own work carried out in the industry.
To the best of my knowledge, the matter embodied in the project report has not been
submitted to any other University/Institute for the award of any Degree.

Shamee K Sharma (22SCSE1012596)

This is to certify that the above statement made by the candidate is correct and
true to the best of my knowledge.

Signature of Internship Reviewer Signature of Dean (SCSE)


TABLE OF CONTENTS

CHAPTER TITLE

Abstract
List of Figures & List of Tables
List of Abbreviations

1 Introduction
1.1 Objective of the Internship Project
1.2 Problem statement and research objectives of this Internship
1.3 Description of Internship Domain and brief introduction about an internship organization

2 Internship Activities
2.1 Detailed description of tasks and responsibilities.
2.2 Daily/Weekly progress (students can provide a log or journal of activities).
2.3 Skills or tools used (e.g., programming languages, frameworks, software, etc.).

3 Learning Outcomes
3.1 Skills acquired (technical and soft skills).
3.2 Knowledge gained about the industry/domain.
3.3 Problem-solving or challenges faced during the internship and how they were addressed.

4 Project/Work Deliverables
4.1 Details of the main project(s) or tasks completed.
4.2 Outcomes or results of the work done.
4.3 Links or attachments to work products (if applicable, e.g., reports, presentations, or code).

5 Conclusion
5.1 Reflections on the overall internship experience.
5.2 Internship certificate.


ABSTRACT

This report details the experiences and outcomes of a two-month virtual internship focused
on data engineering using Amazon Web Services (AWS). The internship encompassed the
design and implementation of data pipelines, data modelling, and the use of various
AWS services to manage and process large datasets. Key deliverables included the
development of scalable data solutions and the application of best practices in data
engineering.

The primary goal of the internship was to design, implement, and optimize data pipelines
capable of handling large and complex datasets. This included tasks such as data ingestion,
transformation, and storage, which are essential for enabling data-driven decision-making
in modern organizations. Leveraging AWS services such as S3 for storage, Redshift for data
warehousing, Glue for ETL processes, and Lambda for automation, the internship
emphasized building scalable and efficient data solutions.

A key aspect of the program was understanding and applying data modelling techniques to
ensure data integrity and efficiency. Participants were introduced to industry-standard
practices, including schema design, data partitioning, and query optimization. These
practices were implemented to address real-world challenges such as performance
bottlenecks and data security concerns.

The internship also highlighted the importance of adopting best practices in data
engineering, such as using IAM roles for secure access, employing serverless computing for
cost-effectiveness, and optimizing Spark jobs for large-scale data processing. The
deliverables included functional data pipelines and documentation that showcased a deep
understanding of the AWS ecosystem and its applications in solving business challenges.

By the end of the internship, participants had gained technical proficiency in AWS tools
as well as insight into the broader domain of data engineering. This experience equipped
them with the skills to build reliable, scalable, and efficient data systems for
cloud-based data management. The report summarizes this experience, emphasizing the
practical applications of AWS technologies and the lessons learned during the program.


LIST OF FIGURES

S. NO FIG. NO TITLE PAGE. NO

1 1 Tools and Technologies Used 6

2 2 Daily/Weekly Progress Summary 8

3 3 Skills Acquired During the Internship 10

4 4 Project Deliverables Overview 12


LIST OF ABBREVIATIONS

Abbreviation    Definition

AWS             Amazon Web Services
EMR             Elastic MapReduce
RDS             Relational Database Service
S3              Simple Storage Service
SQL             Structured Query Language
NoSQL           Not Only SQL (non-relational databases)
ETL             Extract, Transform, Load
BI              Business Intelligence

CHAPTER 1

INTRODUCTION

1.1 Objective of the Internship Project

• The primary objective of this internship was to gain practical experience in data
engineering by designing and implementing data pipelines using AWS services. This
involved understanding data warehousing concepts, data modelling, and the
deployment of scalable data solutions in a cloud environment.

1.2 Problem Statement and Research Objectives

• With the increasing volume of data generated by businesses, there is a pressing need
for efficient data processing and analysis tools. The internship aimed to address this
challenge by developing data pipelines capable of handling large datasets, ensuring
data integrity, and enabling data-driven decision-making.

1.3 Description of Internship Domain and Organization

• The internship was conducted under the AWS Data Engineering Virtual Internship
program, facilitated by EduSkills Foundation in collaboration with AICTE. The
program focused on cloud-based data engineering, providing exposure to AWS tools
and services essential for building data infrastructure.

CHAPTER 2

INTERNSHIP ACTIVITIES

2.1 Tasks and Responsibilities

• Designed and implemented analytical data platform solutions to facilitate data-driven
decisions and insights.
• Developed data schemas and managed internal data warehouses and SQL/NoSQL
database systems.
• Collaborated with cross-functional teams to extract, transform, and load data from
diverse sources using AWS big data technologies.
• Engaged in data model design, architecture discussions, and optimizations to enhance
data processing efficiency.
• Explored and utilized AWS services such as S3, Redshift, Lambda, and Glue to build
and maintain data pipelines.
• Participated in mentoring sessions conducted by industry experts to gain insights into
real-world data engineering challenges.
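
The storage step of such a pipeline can be illustrated with a small sketch. It is not the code used during the internship; it only assumes a common layout in which each batch of records is written to S3 under a date-partitioned key. The function names, the bucket and dataset names, and the JSON Lines format are all illustrative assumptions. The `s3_client` argument is expected to behave like a boto3 S3 client, which provides `put_object(Bucket=..., Key=..., Body=...)`.

```python
import json
from datetime import datetime


def partitioned_key(dataset: str, ts: datetime, filename: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=)."""
    return (
        f"{dataset}/year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/{filename}"
    )


def upload_records(s3_client, bucket: str, dataset: str, records: list, ts: datetime) -> str:
    """Serialize records as JSON Lines and store them under a dated key.

    s3_client is any object exposing put_object(Bucket=, Key=, Body=),
    for example a boto3 S3 client (assumption: not taken from the report).
    """
    key = partitioned_key(dataset, ts, f"batch-{ts:%H%M%S}.jsonl")
    body = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Partitioning keys by date in this way lets query engines such as those behind Glue or Redshift Spectrum skip objects outside the requested date range, which is one reason the layout is widely used.
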

2.2 Daily/Weekly Progress


• One module was completed each week so that the required output was produced on time.
• Weekly progress was recorded and reviewed to keep the work on track.

2.3 Skills or Tools Used


• Programming Languages: Python, SQL
• AWS Services: S3, Redshift, EMR, RDS, Lambda, Glue
• Data Processing Frameworks: Apache Spark, Hive
• Data Modelling Tools: ERD tools
• Version Control: Git

CHAPTER 3

LEARNING OUTCOMES

3.1 Skills Acquired

• Proficiency in designing and implementing data pipelines using AWS services.
• Enhanced understanding of data warehousing concepts and data modelling techniques.
• Improved programming skills in Python and SQL for data processing tasks.
• Experience with big data technologies and frameworks such as Apache Spark and
Hive.
• Development of soft skills including teamwork, communication, and problem-solving.

3.2 Knowledge Gained

• In-depth understanding of AWS cloud services and their applications in data
engineering.
• In-depth understanding of AWS data warehousing and data modelling.
• Working knowledge of SQL and Python.
• Deep understanding of cloud-based data engineering concepts.
• Insight into data lifecycle management, including ingestion, transformation, and
storage.
• Practical experience in optimizing cloud-based data solutions for scalability.
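
The three lifecycle stages named above (ingestion, transformation, storage) can be shown end to end in a few lines. The following is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for a warehouse such as Redshift, and the `orders` table, its columns, and the sample data are invented for illustration only.

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list:
    """Ingestion: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list) -> list:
    """Transformation: drop rows without an amount, normalise types and case."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().upper(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows: list) -> None:
    """Storage: write the cleaned rows into a table (SQLite as a stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical sample input: one row has a missing amount, one has untidy text.
RAW = "order_id,region,amount\n1,north,10.5\n2,south,\n3,North ,4.5\n"

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('NORTH', 15.0)]
```

Keeping the three stages as separate functions makes each one testable on its own, and the same structure carries over when the source becomes an S3 object and the target a warehouse table.
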

CHAPTER 4

PROJECT/WORK DELIVERABLES

4.1 Details of the main project(s) or tasks completed.


• Developed an API extraction system to pull data from a website at regular
intervals.
• Built a robust system to authenticate, send requests, and parse the API
response into structured formats (e.g., JSON, CSV).
• Automated the data extraction process and scheduled periodic API calls to
update the data.
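
The steps listed above (authenticate, send a request, parse the response into a structured format) can be sketched as follows. This is an illustrative outline using only the Python standard library, not the code from the project repository. The bearer-token scheme, the `items` key in the response, and the example URL are assumptions made for the sketch.

```python
import csv
import io
import json
import urllib.request


def fetch_json(url: str, token: str, timeout: float = 10.0):
    """Send an authenticated GET request and decode the JSON response."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_csv(records: list, fields: list) -> str:
    """Flatten a list of JSON objects into CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells; unexpected fields are dropped.
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


def run_once(fetch, url: str, token: str, fields: list) -> str:
    """One extraction cycle: fetch, parse, and return CSV text.

    The "items" key is a hypothetical response shape, not the real API's.
    """
    payload = fetch(url, token)
    return records_to_csv(payload["items"], fields)
```

A call such as `run_once(fetch_json, "https://api.example.com/v1/items", token, ["id", "name"])` performs one cycle. Scheduling is left to the environment: a cron entry or a scheduled trigger that invokes a Lambda function are two common options.
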
4.2 Outcomes or results of the work done.
• Improved data retrieval efficiency, reducing manual effort and increasing the
frequency of data updates.
• Delivered real-time insights from the extracted data to support decision-
making processes.
• Scalable and Reliable Solutions:
The API extraction process was designed for scalability, so that it can
accommodate growth in the data volume and complexity of the website's API
over time.
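
One common way to make a periodic extraction job reliable is to retry failed calls with exponential backoff. The report does not state which mechanism the project used, so the sketch below is an assumption offered for illustration; the function name and default values are hypothetical.

```python
import time


def call_with_retries(func, attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds and try again.

    The sleep function is injectable so the behaviour can be tested
    without actually waiting.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

With the defaults, a call that keeps failing is attempted four times, with waits of 1, 2, and 4 seconds between attempts, before the final error is raised to the caller.
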
4.3 Links or attachments to work products (if applicable, e.g., reports, presentations, or
code).
• Documentation outlining the architecture, setup process, and data extraction
methodology.
• Presentation:
A concise presentation summarizing the project's objectives, implementation
strategy, results, and future scalability potential. This was shared with
stakeholders to demonstrate the value of the automated API extraction
solution.
• Repository with API extraction scripts and configuration files
(https://github.com/shamee12312/porject_aicte/tree/main)

CHAPTER 5

CONCLUSION

5.1 Reflections on the overall internship experience.


• The AWS Data Engineering Virtual Internship provided a comprehensive
learning experience in cloud-based data engineering. It not only enhanced
technical proficiency in AWS tools but also fostered problem-solving and
analytical skills. The opportunity to work on real-world challenges has been
instrumental in preparing for a career in data engineering.
• Technical Growth
The internship allowed hands-on exposure to various AWS services like S3,
Redshift, Glue, Lambda, and EMR, which are foundational for modern data
engineering workflows. The ability to work with tools like Apache Spark and
Python further enhanced my capacity to manage, process, and analyze large
datasets efficiently. Designing and optimizing ETL processes, a core part of
the program, helped me understand the intricacies of data ingestion,
transformation, and storage.
• Industry Insights
Through this internship, I gained valuable insights into the data engineering
domain and the best practices followed in the industry. I learned about the
significance of data-driven decision-making and the role of robust data
pipelines in achieving business objectives. Understanding how large
organizations use cloud platforms to scale and secure their data infrastructure
was an eye-opener.
• Overall Reflection
The AWS Data Engineering Virtual Internship was more than just a learning
opportunity; it was an experience that bridged the gap between academic
concepts and industry practices. By tackling real-world problems and
delivering tangible results, I have grown both professionally and personally.
This journey has solidified my interest in data engineering and affirmed my
commitment to contributing to the field.
