How to Add Multiple Columns in PySpark DataFrames

Last Updated: 30 Jun, 2021

In this article, we will see different ways of adding multiple columns to a PySpark DataFrame. Let's create a sample DataFrame for demonstration.

Dataset used: Cricket_data_set_odi

Python3

    # import SparkSession from the pyspark.sql module
    from pyspark.sql import SparkSession

    # create a SparkSession and give the app a name
    spark = SparkSession.builder.appName('sparkdf').getOrCreate()

    # create the DataFrame from the CSV file; the first row holds the column names.
    # inferSchema asks Spark to detect numeric columns; without it,
    # every CSV column is read as a string
    df = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("Cricket_data_set_odi.csv"))

    # display the schema
    df.printSchema()

    # show the DataFrame
    df.show()

Method 1: Using withColumn()

withColumn() adds a new column to a DataFrame, or replaces an existing column that has the same name. To add several columns, chain one withColumn() call per column.

Syntax: df.withColumn(colName, col)

Returns: A new DataFrame with the column added, or with the existing column of the same name replaced.

Code:

Python3

    # chain withColumn() calls to add two derived columns
    df.withColumn('Avg_runs', df.Runs / df.Matches) \
      .withColumn('wkt+10', df.Wickets + 10) \
      .show()

Method 2: Using select()

You can also add multiple columns in a single select() call: pass '*' to keep every existing column, followed by one aliased expression per new column.

Syntax: df.select(*cols)

Code:

Python3

    # use select() to add multiple columns
    df.select('*',
              (df.Runs / df.Matches).alias('Avg_runs'),
              (df.Wickets + 10).alias('wkt+10')).show()

Method 3: Adding Multiple Constant Columns Using withColumn() and select()

The lit() function from pyspark.sql.functions creates a column that holds a constant (literal) value. The code below adds two constant columns: Sport through select() and Fitness through withColumn().

Python3

    from pyspark.sql.functions import lit

    # add one constant column with select() and another with withColumn()
    df.select('*', lit("Cricket").alias("Sport")) \
      .withColumn("Fitness", lit("Good")) \
      .show()
Author: kg_code