Similar Topics
Web Technologies (37.4K+ articles)
DSA (22.8K+ articles)
Python (21.3K+ articles)
Experiences (16.6K+ articles)
AI-ML-DS (4.7K+ articles)
Machine Learning (2.9K+ articles)
python (1.2K+ articles)
ML-Clustering (25+ articles)
Apache-spark (11+ articles)
Apache Spark (1+ articles)
Python-Pyspark (179+ posts)
Recent Articles
How to introduce the schema in a Row in Spark?
Last Updated: 28 April 2025
A schema is a structured definition of a dataset that specifies the field names and field types (data types) of a table. In Spark, a row's structure in a data fra...
Data Science
Picked
python
Python-Pyspark
Query Hive table in PySpark
Last Updated: 28 April 2025
Hadoop Distributed File System (HDFS) is a distributed file system that provides high-throughput access to application data. In this article, we will learn how to create a...
Python
Picked
Python-Pyspark
PySpark UDF of MapType
Last Updated: 28 April 2025
Consider a scenario where we have a PySpark DataFrame column of type MapType. Keys are strings and values can be of different types (integer, string, boolean, etc.). I n...
Python
Picked
Python-Pyspark
Identify corrupted records in a dataset using pyspark
Last Updated: 07 April 2025
Some datasets contain corrupt records, which are records that don't follow the data-specific rules that valid records follow. For example, a corrupt record may have...
Python
Apache Spark
Apache-spark
Python-Pyspark
Sorting an array of a complex data type in Spark
Last Updated: 28 April 2025
We can use the sort() or orderBy() function to sort a Spark array, but these functions might not work if the array holds a complex data type. For such complex data...
Python
Picked
Geeks Premier League
Python-Pyspark
Geeks Premier League 2023
Ranking Duplicate Values of a Column in Incremental Order in PySpark
Last Updated: 11 July 2024
In data processing, it is often necessary to rank or order the values within a column, especially when dealing with duplicate values. Ranking duplicate values in ...
Picked
Blogathon
Python-Pyspark
Data Analysis
AI-ML-DS
AI-ML-DS With Python
Data Science Blogathon 2024
How to Check PySpark Version
Last Updated: 10 July 2024
Knowing the version of PySpark you're working with is crucial for compatibility and troubleshooting purposes. In this article, we will walk through the steps to check the ...
Python
Picked
Python-Pyspark
How to Fix "Could Not Import pypandoc - Required to Package PySpark"
Last Updated: 05 July 2024
When working with PySpark, especially during packaging and distribution, we might encounter an error related to the pypandoc library. This error can hinder the developm...
Python
Picked
Python-Pyspark
Python How-to-fix
How to Create Delta Table in Databricks Using PySpark
Last Updated: 23 July 2024
Delta Lake is an open-source storage layer that brings scalability, performance, and reliability to data lakes. It offers a transactional layer on top of cloud storage and le...
Python
Picked
Python-Pyspark
How to use Is Not in PySpark
Last Updated: 29 July 2024
Null values are undefined or empty data present in a DataFrame. They may be introduced by errors in data transfer or by technical glitches. We should identif...
Python
Picked
Python-Pyspark
How to Install PySpark in Jupyter Notebook
Last Updated: 31 July 2024
PySpark is a Python library for Apache Spark, a powerful framework for big data processing and analytics. Integrating PySpark with Jupyter Notebook provides an interactive...
Python
Picked
Python-Pyspark
Jupyter-notebook
Pivot String column on PySpark DataFrame
Last Updated: 15 September 2024
Pivoting in data analysis refers to the transformation of data from a long format to a wide format by rotating rows into columns. In PySpark, pivoting is used to restructu...
Python
Picked
Python-Pyspark
Python PySpark sum() Function
Last Updated: 20 September 2024
PySpark, the Python API for Apache Spark, is a powerful tool for big data processing and analytics. One of its essential functions is sum(), which is part of the pyspark.s...
Python
Picked
Python-Pyspark
Python PySpark pivot() Function
Last Updated: 24 September 2024
The pivot() function in PySpark is a powerful method used to reshape a DataFrame by transforming unique values from one column into multiple columns in a new DataFrame, wh...
Python
Picked
Python-Pyspark
How to Install PySpark in Kaggle
Last Updated: 10 October 2024
PySpark is the Python API for Apache Spark, a powerful distributed computing framework. Its major use cases include big data processing...
Picked
Installation Guide
Python-Pyspark
Kaggle