How to change dataframe column names in PySpark?
Last Updated: 15 Feb, 2022

In this article, we are going to see how to change the column names of a PySpark DataFrame. Let's create a DataFrame for demonstration:

Python3

# Importing necessary libraries
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName('pyspark - rename columns').getOrCreate()

# Data for the dataframe
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]

# Column names of the dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the Spark dataframe
df = spark.createDataFrame(data=data, schema=columns)

# Print the dataframe
df.show()

Output:

Method 1: Using withColumnRenamed()

We will use the withColumnRenamed() method to change the column names of the PySpark DataFrame.

Syntax: DataFrame.withColumnRenamed(existing, new)

Parameters:
existing (str): existing column name of the DataFrame to rename.
new (str): new column name.

Return type: returns a new DataFrame with the existing column renamed.

Example 1: Renaming a single column

Here we rename the column 'DOB' to 'DateOfBirth'.

Python3

# Rename the column DOB to DateOfBirth
# and print the dataframe
df.withColumnRenamed("DOB", "DateOfBirth").show()

Output:

Example 2: Renaming multiple columns

Python3

# Rename the column 'Gender' to 'Sex', then on the
# returned dataframe rename 'salary' to 'Amount'
df.withColumnRenamed("Gender", "Sex") \
  .withColumnRenamed("salary", "Amount") \
  .show()

Output:

Method 2: Using selectExpr()

We can also rename columns by passing SQL expressions to the selectExpr() method.

Syntax: DataFrame.selectExpr(*expr)

Parameters:
expr: one or more SQL expressions as strings.

Here we rename the column 'Name' to 'name'.

Python3

# Select 'Name' as 'name'
# Select the remaining columns under their original names
data = df.selectExpr("Name as name", "DOB", "Gender", "salary")

# Print the dataframe
data.show()

Output:

Method 3: Using select()

Syntax: DataFrame.select(*cols)

Parameters:
cols: column names (as strings) or Column expressions.

Return type: selects the given columns and returns a new DataFrame.

Here we rename the column 'salary' to 'Amount' by aliasing it.

Python3

# Import the col function from pyspark.sql.functions
from pyspark.sql.functions import col

# Select 'salary' as 'Amount' using an alias
# Select the remaining columns under their original names
data = df.select(col("Name"), col("DOB"),
                 col("Gender"), col("salary").alias('Amount'))

# Print the dataframe
data.show()

Output:

Method 4: Using toDF()

This function returns a new DataFrame with the specified column names.

Syntax: DataFrame.toDF(*cols)

where cols are the new column names.

In this example, we create an ordered list of new column names and pass it to the toDF() function.

Python3

# New column names, in the same order as the existing columns
Data_list = ["Emp Name", "Date of Birth", "Gender-m/f", "Paid salary"]

# Rename all columns at once
new_df = df.toDF(*Data_list)
new_df.show()

Output:
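As a small extension of the methods above, the snippet below is a minimal sketch of renaming several columns in one pass by building the select() expressions from a plain Python dict. The rename_map name and its contents are only illustrative, and the snippet assumes the df DataFrame created at the start of this article.

Python3

from pyspark.sql.functions import col

# Illustrative mapping of old column names to new ones (hypothetical names)
rename_map = {"DOB": "DateOfBirth", "salary": "Amount"}

# Build one alias() expression per column: renamed if it appears in the
# mapping, otherwise kept under its original name
renamed_cols = [col(c).alias(rename_map.get(c, c)) for c in df.columns]

# select() returns a new DataFrame with the renamed columns
renamed_df = df.select(*renamed_cols)
renamed_df.show()

Because select() works on all columns in a single transformation, this approach renames any number of columns at once instead of chaining one withColumnRenamed() call per column.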