Big_Data_Presentation

The project report on Big Data by Computer Science students at Satyam College outlines the definition, technologies, and tools related to Big Data, emphasizing its key characteristics known as the 5 Vs. It discusses the Hadoop framework, Hive for data warehousing, and the role of Scala and Apache Spark in data processing and analytics, particularly in the context of COVID-19 data analysis. The report concludes with the benefits, applications, and challenges of Big Data, along with acknowledgments for institutional support.

BIG DATA

A PROJECT REPORT BY COMPUTER SCIENCE STUDENTS


SATYAM COLLEGE OF ENGINEERING AND TECHNOLOGY
SUPERVISED BY INFOSYS
INTRODUCTION TO BIG DATA

• Definition: Large, complex datasets, together with the software utilities used to store and process them.
• Technologies: AI, IoT, Machine Learning, Deep Learning.
• Tools: Hadoop, Spark, NoSQL Databases.
KEY CHARACTERISTICS OF BIG DATA (5 VS)
• Volume: Massive data size.
• Velocity: Speed at which data is generated and must be processed, often in real time.
• Variety: Structured & unstructured data types.
• Veracity: Reliability and accuracy of the data.
• Value: Insights & business advantages.
HADOOP FRAMEWORK

• Core Components: HDFS (storage), YARN (resource management).
• Modules: Hive (SQL-like queries), Pig (scripting that compiles to MapReduce jobs), HBase (NoSQL database).
• Use Cases: Data warehousing, analytics, machine learning.
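
The MapReduce model mentioned above can be sketched in a few lines. This is a minimal single-machine illustration in plain Python, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase) and the sample input are assumptions made for the example. On a real cluster the framework runs the map and reduce steps in parallel across nodes and reads its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input lines, standing in for file blocks stored on HDFS.
sample_lines = ["big data needs big tools", "data tools"]
counts = word_count(sample_lines)
```

Each phase is a pure function of its input, which is what lets a framework distribute the work across machines.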
HIVE

• Data warehouse on Hadoop.
• Runs SQL-like queries in HiveQL (HQL).
• Features: Scalability, indexing, compressed data compatibility.
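
To give a feel for the SQL-like style of query, here is a small sketch. It does not run Hive: Python's built-in sqlite3 module stands in so the example works without a Hadoop cluster. The table name covid_stats, its columns, and the rows are made-up placeholders. The SELECT ... GROUP BY statement uses only basic syntax that HiveQL shares with standard SQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_stats (continent TEXT, country TEXT, new_cases INTEGER)"
)

# Made-up placeholder rows, not real COVID-19 figures.
conn.executemany(
    "INSERT INTO covid_stats VALUES (?, ?, ?)",
    [
        ("Asia", "Country A", 10),
        ("Asia", "Country B", 5),
        ("Europe", "Country C", 7),
    ],
)

# Aggregate cases per continent with a SQL-like query.
query = """
    SELECT continent, SUM(new_cases) AS total_cases
    FROM covid_stats
    GROUP BY continent
    ORDER BY continent
"""
totals = conn.execute(query).fetchall()
conn.close()
```

In Hive the same kind of statement would be compiled into distributed jobs over data stored in HDFS rather than executed in a local process.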
SCALA PROGRAMMING

• Multi-paradigm language supporting object-oriented and functional programming.
• Key Roles in Big Data: Apache Spark, Kafka Streams, Akka.
• Well suited to distributed systems and ETL workflows.
APACHE SPARK OVERVIEW

• Unified analytics engine for large datasets.
• Components:
• - Spark Core
• - Spark SQL
• - Spark Streaming
• - MLlib
• - GraphX
SYSTEM SPECIFICATIONS

• Software Requirements: MySQL, Hadoop, Hive, Sqoop, Spark, JDK.
• Hardware Requirements: i5/i7 processor, 16 GB RAM, 500 GB storage.
IMPLEMENTATION OVERVIEW

• Problem Statement: COVID-19 data analysis for insights on infections, deaths, and vaccinations.
• Tools: Hadoop ecosystem, PySpark for data preprocessing and analytics.
IMPLEMENTATION TASKS

• Insights generated using PySpark:
• - Total infections & deaths by continent.
• - Vaccination stats.
• - Date extractions & averages.
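
The date extraction and averaging task can be sketched as follows. This is plain Python rather than the report's PySpark code, which is not shown in the slides. The record layout, the helper month_key, and every number are hypothetical placeholders. In PySpark the same idea would typically be expressed as a grouped aggregation over a DataFrame.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up placeholder rows: (ISO date string, continent, new_cases).
records = [
    ("2021-01-05", "Asia", 10),
    ("2021-01-20", "Asia", 20),
    ("2021-02-03", "Asia", 30),
    ("2021-02-10", "Europe", 50),
]


def month_key(iso_date):
    """Extract (year, month) from an ISO-formatted date string."""
    parsed = date.fromisoformat(iso_date)
    return parsed.year, parsed.month


# Collect the case counts that fall in each month.
cases_by_month = defaultdict(list)
for iso_date, _continent, new_cases in records:
    cases_by_month[month_key(iso_date)].append(new_cases)

# Average the collected counts per month.
monthly_average = {
    month: mean(values) for month, values in cases_by_month.items()
}
```

Grouping on the extracted (year, month) pair rather than the raw date string is what turns daily rows into monthly statistics.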
KEY ANALYSIS RESULTS

• Examples of queries executed:
• - Highest vaccinations per continent.
• - Monthly vaccination trends (January 2021).
• - Maximum deaths by region (Asia, Europe).
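
The three example queries above can be illustrated with a short sketch. It is written in plain Python so that it runs without Spark. The row layout, the function names, and all values are hypothetical placeholders, so the outputs are not the report's actual results.

```python
from datetime import date

# Made-up placeholder rows: (date, continent, country, vaccinations, deaths).
rows = [
    (date(2021, 1, 10), "Asia", "Country A", 100, 4),
    (date(2021, 1, 25), "Asia", "Country B", 300, 9),
    (date(2021, 2, 2), "Europe", "Country C", 250, 7),
    (date(2021, 1, 15), "Europe", "Country D", 150, 12),
]


def highest_vaccinations_per_continent(rows):
    """Return {continent: (country, vaccinations)} for the largest value seen."""
    best = {}
    for _day, continent, country, vaccinations, _deaths in rows:
        if continent not in best or vaccinations > best[continent][1]:
            best[continent] = (country, vaccinations)
    return best


def vaccinations_in_month(rows, year, month):
    """Sum vaccinations recorded in the given calendar month."""
    return sum(
        vaccinations
        for day, _continent, _country, vaccinations, _deaths in rows
        if (day.year, day.month) == (year, month)
    )


def max_deaths(rows, regions):
    """Return the largest single deaths value recorded for each listed region."""
    return {
        region: max(
            deaths
            for _day, continent, _country, _vaccinations, deaths in rows
            if continent == region
        )
        for region in regions
    }


top = highest_vaccinations_per_continent(rows)
january_total = vaccinations_in_month(rows, 2021, 1)
peak_deaths = max_deaths(rows, ["Asia", "Europe"])
```

Each function corresponds to one bullet: a per-group maximum, a filter on month followed by a sum, and a maximum restricted to selected regions.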
CONCLUSION

• Benefits of Big Data: Real-time processing, scalability, actionable insights.
• Applications: Healthcare, finance, marketing, e-commerce.
• Challenges: Security, privacy, data quality, ethical use.
ACKNOWLEDGMENTS

• Institutional Support: Anna University, Infosys.
• Team Contribution: Internal and external examiners, department faculty.