pypark_scala_spark

Seekho BigData Institute offers a comprehensive training program focused on Spark, Scala, and Python, emphasizing coding skills and placement opportunities. The curriculum includes live sessions, personalized mentorship, and over 400 coding problems across various topics. Key features also include recorded classes, practical exercises, and guidance on job application strategies.

Uploaded by

Rakesh Prasad

SEEKHO BIGDATA INSTITUTE
KAL KI SOCH AAJ KI SHIKSHA (Tomorrow's thinking, today's education)

DATA IS THE NEW OIL

Call us at: 9989454737
SPARK: THE GAME CHANGER
RDD Spark

DataFrame

WWW.SEEKHOBIGDATA.COM
Feedback
SPARK
Scala Spark, PySpark, and Spark SQL

400+ CODING PROBLEMS

www.seekhobigdata.com
"We don't believe in just teaching. We believe in teaching with results."
KEY FEATURES
1. LIVE WHITEBOARD SESSIONS
2. ONE-TO-ONE PERSONALISED MENTORSHIP IN CASE OF DOUBTS
3. 400+ WIDE VARIETY OF SPARK CODING PROBLEMS
4. EVERY LIVE CLASS IS RECORDED AND KEPT IN THE PORTAL
5. PLACEMENT-ORIENTED INSTITUTE
6. OUR FOCUS IS PRIMARILY ON CODING
LAB SETUP
1. IntelliJ for Scala Spark
2. PyCharm for PySpark
3. Databricks for Scala Spark, PySpark, and Spark SQL
4. sbt as a Build Tool
Scala
1. Data Types in Scala
2. Conditional Statements in Scala
3. Loops in Scala
4. Data Structures in Scala
5. Arrays
6. Map
7. Set
8. Range
9. List
10. Tuple
11. Number/string manipulations
Functional Programming

1. What Are Functions
2. First-Class Functions
3. Higher-Order Functions
4. Anonymous Functions
5. Curried Functions
6. Partially Applied Functions
7. What Are Implicits
8. Closures
9. Scala Type System
10. Null, Nil, Nothing, None
11. 50+ Practical Problems
Scala OOP
1. Class and Object
2. Constructor
3. Polymorphism
4. Encapsulation
5. Abstract Class
6. Access Modifiers
7. Design Patterns in Scala
8. Traits
9. Diamond Problem
10. Case Classes
11. Sealed Trait
12. Method Overloading and Overriding
13. Singleton Object
14. Companion Classes
15. 40-50 Practical Problems
Python
1. Variables and Data Types
2. Control Structures and Loops
3. Operators
4. Exception Handling
5. Python Built-in Functions
6. Lists
7. Tuples
8. Sets
9. Dictionaries
10. Classes
11. Objects
12. Inheritance
13. Encapsulation
14. Polymorphism
15. Opening files
16. Prime Number
17. Reverse a Number
18. Palindrome
19. Square Root of a Number
20. Divisibility Rules
21. Missing Number
22. Reading files
23. Writing files
24. Closing files
25. Exception handling
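
As an illustration of the kind of practice problems named in this list (Prime Number, Reverse a Number, Palindrome, Missing Number), here is a minimal sketch in plain Python. The function names and signatures are assumptions made for this example, not taken from the course material, and the solutions shown are one possible approach among several.

```python
from typing import List


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def reverse_number(n: int) -> int:
    """Reverse the decimal digits of n, keeping the sign."""
    sign = -1 if n < 0 else 1
    return sign * int(str(abs(n))[::-1])


def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]


def missing_number(nums: List[int], n: int) -> int:
    """Given n - 1 distinct values from 1..n, return the one that is absent."""
    # The sum of 1..n minus the sum of the given values leaves the missing one.
    return n * (n + 1) // 2 - sum(nums)
```

Each function is small enough to trace by hand, which is usually the point of such exercises; for example, `missing_number([1, 2, 4, 5], 5)` subtracts 12 from 15.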
Spark-core
1. What Is Apache SPARK?
2. What Is RDD?
3. MapReduce vs Apache Spark
4. How Data Is Stored In Spark?
5. What Is Immutability Of RDD?
6. What Does Resilient Mean?
7. SparkSession and SparkContext
8. Parallelize()
9. Read CSV, TextFile
10. Evaluation
11. What is DAG
12. What is Lineage Graph
13. Map()
14. Filter()
15. Reduce() ReduceByKey()
16. GroupByKey
17. GroupByKey VS ReduceByKey
18. Repartition & Coalesce
19. SortByKey()
20. FlatMap()
21. Split
22. Mean()
23. Filter()
24. Joins in RDD
25. Filter()
26. Contains()
27. Parallelize
28. Spark Architecture
29. Broadcast Variables
30. Accumulators
31. Problems on RDD
DataFrames
1. Dataframes
2. Datasets
3. DataFrame vs Dataset
4. show()
5. Reader API
6. Read Modes
7. Writer API
8. Write Modes
9. inferSchema
10. Explicit Schema
11. Data Types in Spark
12. Conditional Statements in Spark
13. When and Otherwise Filter
14. String Manipulation Functions
15. Number Manipulation Functions
16. Count()
18. Min()
19. Avg
20. Sum
21. Count
22. GroupBy Aggregations
23. Window Aggregations
24. Different Kinds of Joins
25. Different Kinds of Join Strategies
26. Log4j Mechanism
27. Different Ways of Debugging
28. Lead and Lag Related Problems
29. Spark-SQL
30. Date Manipulation Functions
31. Number Manipulation Functions
32. Regular Expressions
33. Cluster Calculations
34. Practicals on Every Concept
35. 400+ Coding Exercises
Optimizations
1. Serialization
2. API selection
3. Using Broadcast Variables
4. Cache and Persist
5. ByKey Operation
6. Predicate PushDown
7. Broadcast Join
8. Partition and Bucket By
9. Garbage Collection Tuning
10. Level of Parallelism
Spark-issues
1. Out Of Memory Exceptions
2. Missing Data
3. Data Skewness
4. Spark Job Repeatedly Fails
5. inferSchema Issue
6. Slow Performance Issues
7. Memory Contention
8. Disk Contention
9. Broadcasting Large Data
10. Serialization Issue
11. Version Incompatibility Issue
12. Cluster Instability Issues
13. Small File Issue
14. Result Exceeds Driver Memory
15. Too Small And Large Partitions
Spark Deployment
1. What Are Build Tools
2. What Is the SBT Build Tool
3. What Is the Gradle Build Tool
4. What Is the Maven Build Tool
5. What Is JFrog
6. What Is the JIRA Tool
7. Bitbucket
8. GitHub
9. Git Commands
10. How to build a Jar
11. Spark-submit
12. Parameters of Spark-submit
Data Quality Checks
1. Check for duplicates
2. Check for unique values in columns
3. Check for missing values
4. Find outliers
5. Schema validation
6. Correlations
7. Cross-field validation
8. Dependency check
9. Text pattern analysis
10. Categorical value distributions
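
To make the first few checks in this list concrete (duplicates, unique values, missing values), here is a minimal sketch. It uses plain Python on a list of dictionaries so that it runs without a Spark installation; the course itself presumably applies these checks to Spark DataFrames. The sample rows, column names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical sample records; the column names are illustrative only.
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": None, "amount": 80.0},
    {"id": 2, "city": None, "amount": 80.0},  # exact duplicate of the previous row
    {"id": 3, "city": "Delhi", "amount": None},
]


def find_duplicates(records):
    """Return the rows that occur more than once, each listed a single time."""
    # Sort each row's items by column name so equal rows produce equal keys.
    counts = Counter(
        tuple(sorted(r.items(), key=lambda kv: kv[0])) for r in records
    )
    return [dict(key) for key, count in counts.items() if count > 1]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = Counter()
    for r in records:
        for column, value in r.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def unique_values(records, column):
    """Return the set of distinct non-missing values in one column."""
    return {r[column] for r in records if r[column] is not None}
```

On the sample rows, `find_duplicates(rows)` reports the repeated `id` 2 record, and `missing_counts(rows)` shows which columns contain gaps.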
SPARK UNIT-TESTING
1. Spark Unit Testing with FlatSpec
2. How to Build Unit Test Cases
3. Testing DataFrame Functionality
4. Code Coverage Explained
5. 20+ Practical Exercises on Building Unit Test Cases, Testing, and Understanding Code Coverage
For Grabbing Opportunities

1. LinkedIn Optimization Techniques
2. Naukri Optimization Techniques
3. Resume Building
Thanks
