pandas: How to use astype() to cast dtype of DataFrame
`pandas.Series` has a single data type (`dtype`), while `pandas.DataFrame` can have a different data type for each column.

You can specify `dtype` in various contexts, such as when creating a new object with a constructor or when reading from a CSV file. Additionally, you can cast an existing object to a different `dtype` using the `astype()` method.

See the following article on how to extract columns by `dtype`.

See the following article about `dtype` and `astype()` in NumPy.

Please note that the sample code in this article is based on pandas version `2.0.3`, and behavior may vary with different versions.
import pandas as pd
import numpy as np
print(pd.__version__)
# 2.0.3
List of basic data types (dtype) in pandas

The following is a list of basic data types (`dtype`) in pandas.
dtype | character code | description
---|---|---
int8 | i1 | 8-bit signed integer
int16 | i2 | 16-bit signed integer
int32 | i4 | 32-bit signed integer
int64 | i8 | 64-bit signed integer
uint8 | u1 | 8-bit unsigned integer
uint16 | u2 | 16-bit unsigned integer
uint32 | u4 | 32-bit unsigned integer
uint64 | u8 | 64-bit unsigned integer
float16 | f2 | 16-bit floating-point number
float32 | f4 | 32-bit floating-point number
float64 | f8 | 64-bit floating-point number
float128 | f16 | 128-bit floating-point number
complex64 | c8 | 64-bit complex floating-point number
complex128 | c16 | 128-bit complex floating-point number
complex256 | c32 | 256-bit complex floating-point number
bool | ? | Boolean (True or False)
unicode | U | Unicode string
object | O | Python objects
Note that the numbers in `dtype` names represent bits, whereas those in character codes represent bytes. The character code for the `bool` type is `?`. It does not mean unknown; the character `?` is literally assigned.

You can specify `dtype` in various ways. For example, any of the following representations can be used for `float64`:

- np.float64
- 'float64'
- 'f8'
s = pd.Series([0, 1, 2], dtype=np.float64)
print(s.dtype)
# float64
s = pd.Series([0, 1, 2], dtype='float64')
print(s.dtype)
# float64
s = pd.Series([0, 1, 2], dtype='f8')
print(s.dtype)
# float64
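The equivalence of these spellings can be checked directly; `np.dtype()` normalizes any of them to the same dtype object (a quick check, not part of the original example):

```python
import numpy as np

# All three spellings resolve to the same NumPy dtype object.
print(np.dtype(np.float64) == np.dtype('float64') == np.dtype('f8'))
# True
```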
You can also specify data types using Python types like `int`, `float`, or `str`, without specifying bit precision. In such cases, the type is converted to the equivalent `dtype`. The following examples are for a 64-bit Python 3 environment. Although `uint` is not a native Python type, it is included in the table for convenience.
Python type | Example of equivalent dtype
---|---
int | int64
float | float64
str | object (each element is str)
(uint) | uint64
You can specify these either as types like `int` and `float`, or as the strings `'int'` and `'float'`. However, since `uint` is not a native Python type, it can only be specified as the string `'uint'`, not as a bare name.
s = pd.Series([0, 1, 2], dtype='float')
print(s.dtype)
# float64
s = pd.Series([0, 1, 2], dtype=float)
print(s.dtype)
# float64
s = pd.Series([0, 1, 2], dtype='uint')
print(s.dtype)
# uint64
You can check the range of possible values (minimum and maximum) for integer and floating-point types with `np.iinfo()` and `np.finfo()`.
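For example (a minimal sketch using NumPy directly):

```python
import numpy as np

# iinfo() reports the range of an integer dtype, finfo() of a float dtype.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
print(np.finfo(np.float16).max)                        # 65504.0
```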
The data types discussed here are primarily based on NumPy, but pandas has also added some extension types of its own.
object type and strings

This section explains the `object` type and strings (`str`).

Note that `StringDtype` was introduced in pandas version `1.0.0` as a dedicated data type for strings. It may become the standard string type in the future, but it is not covered here; see the official documentation for details.
The special data type: object

The `object` type is a special data type that can store references to arbitrary Python objects. Each element may be of a different type.

The data type of a `Series` or of `DataFrame` columns containing strings is `object`. However, each element keeps its own distinct type, so not all elements are necessarily strings.

Here are some examples. The built-in function `type()` is applied to each element using the `map()` method to check its type. `np.nan` represents a missing value.

- pandas: Apply functions to values, rows, columns with map(), apply()
- Get and check the type of an object in Python: type(), isinstance()
- Missing values in pandas (nan, None, pd.NA)
s_object = pd.Series([0, 'abcde', np.nan])
print(s_object)
# 0 0
# 1 abcde
# 2 NaN
# dtype: object
print(s_object.map(type))
# 0 <class 'int'>
# 1 <class 'str'>
# 2 <class 'float'>
# dtype: object
If `str` is specified in the `astype()` method (see below for details), all elements, including `NaN`, are converted to `str`. The `dtype` remains `object`.
s_str_astype = s_object.astype(str)
print(s_str_astype)
# 0 0
# 1 abcde
# 2 nan
# dtype: object
print(s_str_astype.map(type))
# 0 <class 'str'>
# 1 <class 'str'>
# 2 <class 'str'>
# dtype: object
If `str` is specified in the `dtype` argument of the constructor, `NaN` remains `float`. Note that in version `0.22.0`, `NaN` was converted to `str`.
s_str_constructor = pd.Series([0, 'abcde', np.nan], dtype=str)
print(s_str_constructor)
# 0 0
# 1 abcde
# 2 NaN
# dtype: object
print(s_str_constructor.map(type))
# 0 <class 'str'>
# 1 <class 'str'>
# 2 <class 'float'>
# dtype: object
Note: String methods

Note that even when the `dtype` is `object`, the result of string methods (accessed via the `str` accessor) can differ depending on the type of each element.

For example, applying `str.len()`, which returns the number of characters, yields `NaN` for elements of numeric type.
s_object = pd.Series([0, 'abcde', np.nan])
print(s_object)
# 0 0
# 1 abcde
# 2 NaN
# dtype: object
print(s_object.str.len())
# 0 NaN
# 1 5.0
# 2 NaN
# dtype: float64
If the result of a string method includes `NaN`, it may indicate that not all elements are of type `str`, even though the column's data type is `object`. In such cases, you can apply `astype(str)` before using the string method.
s_str_astype = s_object.astype(str)
print(s_str_astype)
# 0 0
# 1 abcde
# 2 nan
# dtype: object
print(s_str_astype.str.len())
# 0 1
# 1 5
# 2 3
# dtype: int64
See also the following articles for string methods.
- pandas: Handle strings (replace, strip, case conversion, etc.)
- pandas: Extract rows that contain specific strings from a DataFrame
- pandas: Split string columns by delimiters or regular expressions
Note: NaN

You can detect the missing value `NaN` with `isnull()` or remove it with `dropna()`.
- pandas: Detect and count NaN (missing values) with isnull(), isna()
- pandas: Remove NaN (missing values) with dropna()
s_object = pd.Series([0, 'abcde', np.nan])
print(s_object)
# 0 0
# 1 abcde
# 2 NaN
# dtype: object
print(s_object.map(type))
# 0 <class 'int'>
# 1 <class 'str'>
# 2 <class 'float'>
# dtype: object
print(s_object.isnull())
# 0 False
# 1 False
# 2 True
# dtype: bool
print(s_object.dropna())
# 0 0
# 1 abcde
# dtype: object
Note that when cast to `str`, `NaN` becomes the string `'nan'` and is no longer treated as a missing value.
s_str_astype = s_object.astype(str)
print(s_str_astype)
# 0 0
# 1 abcde
# 2 nan
# dtype: object
print(s_str_astype.map(type))
# 0 <class 'str'>
# 1 <class 'str'>
# 2 <class 'str'>
# dtype: object
print(s_str_astype.isnull())
# 0 False
# 1 False
# 2 False
# dtype: bool
print(s_str_astype.dropna())
# 0 0
# 1 abcde
# 2 nan
# dtype: object
You can handle missing values before casting, or replace the string `'nan'` with `NaN` afterward using `replace()`.
s_str_astype_nan = s_str_astype.replace('nan', np.nan)
print(s_str_astype_nan)
# 0 0
# 1 abcde
# 2 NaN
# dtype: object
print(s_str_astype_nan.map(type))
# 0 <class 'str'>
# 1 <class 'str'>
# 2 <class 'float'>
# dtype: object
print(s_str_astype_nan.isnull())
# 0 False
# 1 False
# 2 True
# dtype: bool
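As a sketch of the first approach, dropping missing values before casting avoids producing the string `'nan'` at all:

```python
import pandas as pd
import numpy as np

s_object = pd.Series([0, 'abcde', np.nan])

# Drop NaN first, then cast; no 'nan' string is ever created.
s_clean = s_object.dropna().astype(str)
print(s_clean)
# 0        0
# 1    abcde
# dtype: object
print(s_clean.isnull().any())
# False
```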
Cast data type (dtype) with astype()

You can cast the data type (`dtype`) with the `astype()` method of `DataFrame` and `Series`.

- pandas.DataFrame.astype — pandas 2.0.3 documentation
- pandas.Series.astype — pandas 2.0.3 documentation

`astype()` returns a new `Series` or `DataFrame` with the specified `dtype`; the original object is not changed.

Cast data type of pandas.Series

Specify the desired data type (`dtype`) as the argument of `astype()`.
s = pd.Series([1, 2, 3])
print(s)
# 0 1
# 1 2
# 2 3
# dtype: int64
s_f = s.astype('float64')
print(s_f)
# 0 1.0
# 1 2.0
# 2 3.0
# dtype: float64
As mentioned above, you can specify `dtype` in various forms.
s_f = s.astype('float')
print(s_f.dtype)
# float64
s_f = s.astype(float)
print(s_f.dtype)
# float64
s_f = s.astype('f8')
print(s_f.dtype)
# float64
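As noted above, `astype()` returns a new object and leaves the original unchanged, which can be confirmed directly:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
s_f = s.astype('float64')

# The original Series keeps its dtype; only the returned object is float64.
print(s.dtype)
# int64
print(s_f.dtype)
# float64
```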
Cast data type of all columns of pandas.DataFrame

A `DataFrame` has a data type (`dtype`) for each column. You can check them all at once with the `dtypes` attribute.
df = pd.DataFrame({'a': [11, 21, 31], 'b': [12, 22, 32], 'c': [13, 23, 33]})
print(df)
# a b c
# 0 11 12 13
# 1 21 22 23
# 2 31 32 33
print(df.dtypes)
# a int64
# b int64
# c int64
# dtype: object
If you pass a single data type (`dtype`) to `astype()`, the data types of all columns are changed to it.
df_f = df.astype('float64')
print(df_f)
# a b c
# 0 11.0 12.0 13.0
# 1 21.0 22.0 23.0
# 2 31.0 32.0 33.0
print(df_f.dtypes)
# a float64
# b float64
# c float64
# dtype: object
Cast data type of any column of pandas.DataFrame individually

You can change the data type (`dtype`) of individual columns by passing a dictionary of {column name: data type} to `astype()`.
df = pd.DataFrame({'a': [11, 21, 31], 'b': [12, 22, 32], 'c': [13, 23, 33]})
print(df)
# a b c
# 0 11 12 13
# 1 21 22 23
# 2 31 32 33
print(df.dtypes)
# a int64
# b int64
# c int64
# dtype: object
df_fcol = df.astype({'a': float})
print(df_fcol)
# a b c
# 0 11.0 12 13
# 1 21.0 22 23
# 2 31.0 32 33
print(df_fcol.dtypes)
# a float64
# b int64
# c int64
# dtype: object
df_fcol2 = df.astype({'a': 'float32', 'c': 'int8'})
print(df_fcol2)
# a b c
# 0 11.0 12 13
# 1 21.0 22 23
# 2 31.0 32 33
print(df_fcol2.dtypes)
# a float32
# b int64
# c int8
# dtype: object
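Columns not listed in the dictionary keep their original dtype. One caution worth noting (this is the behavior of the underlying NumPy cast, so it may warn or differ in other versions): casting to a narrower integer type does not check for overflow, and out-of-range values wrap around silently.

```python
import pandas as pd

# 200 does not fit in int8 (-128 to 127); the cast wraps around silently.
s_overflow = pd.Series([100, 200]).astype('int8')
print(s_overflow)
# 0    100
# 1    -56
# dtype: int8
```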
Specify data type (dtype) when reading CSV files with read_csv()

In pandas, `pd.read_csv()` is used to read CSV files, and you can set column data types with its `dtype` argument.

The following CSV file is used as an example.
,a,b,c,d
ONE,1,"001",100,x
TWO,2,"020",,y
THREE,3,"300",300,z
If the `dtype` argument is omitted, a data type is automatically inferred for each column.
df = pd.read_csv('data/src/sample_header_index_dtype.csv', index_col=0)
print(df)
# a b c d
# ONE 1 1 100.0 x
# TWO 2 20 NaN y
# THREE 3 300 300.0 z
print(df.dtypes)
# a int64
# b int64
# c float64
# d object
# dtype: object
Specify the same data type (dtype) for all columns

If you pass a single data type to the `dtype` argument, all columns are converted to that type. If a column cannot be converted to the specified data type, an error is raised.
# pd.read_csv('data/src/sample_header_index_dtype.csv',
# index_col=0, dtype=float)
# ValueError: could not convert string to float: 'ONE'
If you set `dtype=str`, all columns are converted to strings. However, in this case, missing values (`NaN`) remain of type `float`.
df_str = pd.read_csv('data/src/sample_header_index_dtype.csv',
index_col=0, dtype=str)
print(df_str)
# a b c d
# ONE 1 001 100 x
# TWO 2 020 NaN y
# THREE 3 300 300 z
print(df_str.dtypes)
# a object
# b object
# c object
# d object
# dtype: object
print(df_str.applymap(type))
# a b c d
# ONE <class 'str'> <class 'str'> <class 'str'> <class 'str'>
# TWO <class 'str'> <class 'str'> <class 'float'> <class 'str'>
# THREE <class 'str'> <class 'str'> <class 'str'> <class 'str'>
If you read the file without specifying `dtype` and then cast it to `str` with `astype()`, `NaN` values are also converted to the string `'nan'`.
df = pd.read_csv('data/src/sample_header_index_dtype.csv', index_col=0)
print(df.astype(str))
# a b c d
# ONE 1 1 100.0 x
# TWO 2 20 nan y
# THREE 3 300 300.0 z
print(df.astype(str).applymap(type))
# a b c d
# ONE <class 'str'> <class 'str'> <class 'str'> <class 'str'>
# TWO <class 'str'> <class 'str'> <class 'str'> <class 'str'>
# THREE <class 'str'> <class 'str'> <class 'str'> <class 'str'>
Specify data type (dtype) for each column

As with `astype()`, you can use a dictionary to specify the data type for each column in `read_csv()`.
df_col = pd.read_csv('data/src/sample_header_index_dtype.csv',
index_col=0, dtype={'a': float, 'b': str})
print(df_col)
# a b c d
# ONE 1.0 001 100.0 x
# TWO 2.0 020 NaN y
# THREE 3.0 300 300.0 z
print(df_col.dtypes)
# a float64
# b object
# c float64
# d object
# dtype: object
The dictionary keys can also be column numbers. Be careful: if you are using an index column, the column numbers must be counted including the index column.
df_col = pd.read_csv('data/src/sample_header_index_dtype.csv',
index_col=0, dtype={1: float, 2: str})
print(df_col)
# a b c d
# ONE 1.0 001 100.0 x
# TWO 2.0 020 NaN y
# THREE 3.0 300 300.0 z
print(df_col.dtypes)
# a float64
# b object
# c float64
# d object
# dtype: object
Implicit type conversions

In addition to explicit type conversions with `astype()`, data types may also be converted implicitly by certain operations.

As an example, consider a `DataFrame` with an integer (`int`) column and a floating-point (`float`) column.
df_mix = pd.DataFrame({'col_int': [0, 1, 2], 'col_float': [0.0, 0.1, 0.2]}, index=['A', 'B', 'C'])
print(df_mix)
# col_int col_float
# A 0 0.0
# B 1 0.1
# C 2 0.2
print(df_mix.dtypes)
# col_int int64
# col_float float64
# dtype: object
Implicit type conversion by arithmetic operations

For example, adding an `int` column to a `float` column with the `+` operator produces a `float` result.
print(df_mix['col_int'] + df_mix['col_float'])
# A 0.0
# B 1.1
# C 2.2
# dtype: float64
Similarly, operations with scalar values implicitly convert the data type. The result of division with the `/` operator is always `float`.
print(df_mix / 1)
# col_int col_float
# A 0.0 0.0
# B 1.0 0.1
# C 2.0 0.2
print((df_mix / 1).dtypes)
# col_int float64
# col_float float64
# dtype: object
For arithmetic operations like `+`, `-`, `*`, `//`, and `**`, operations involving only integers return `int`, while those involving at least one floating-point number return `float`. This matches the implicit type conversion rules of the NumPy array `ndarray`.
print(df_mix * 1)
# col_int col_float
# A 0 0.0
# B 1 0.1
# C 2 0.2
print((df_mix * 1).dtypes)
# col_int int64
# col_float float64
# dtype: object
print(df_mix * 1.0)
# col_int col_float
# A 0.0 0.0
# B 1.0 0.1
# C 2.0 0.2
print((df_mix * 1.0).dtypes)
# col_int float64
# col_float float64
# dtype: object
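For instance, floor division between integer values keeps `int`, while true division always produces `float` (a quick check of the rule above):

```python
import pandas as pd

df_mix = pd.DataFrame({'col_int': [0, 1, 2], 'col_float': [0.0, 0.1, 0.2]},
                      index=['A', 'B', 'C'])

# Integer-only floor division stays int64; true division is always float64.
print((df_mix['col_int'] // 2).dtype)
# int64
print((df_mix['col_int'] / 2).dtype)
# float64
```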
Implicit type conversion by transposition, etc.

The data type may change when you select a row as a `Series` with `loc` or `iloc`, or when you transpose a `DataFrame` with `T` or `transpose()`, because values from columns with different dtypes are combined into a single Series with a common dtype.
print(df_mix.loc['A'])
# col_int 0.0
# col_float 0.0
# Name: A, dtype: float64
print(df_mix.T)
# A B C
# col_int 0.0 1.0 2.0
# col_float 0.0 0.1 0.2
print(df_mix.T.dtypes)
# A float64
# B float64
# C float64
# dtype: object
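This upcasting happens because a row (or a transposed column) mixes values from columns of different dtypes. When all columns already share a dtype, no conversion occurs (a sketch to illustrate the contrast):

```python
import pandas as pd

# All columns are int64, so selecting a row or transposing keeps int64.
df_int = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['X', 'Y'])
print(df_int.loc['X'].dtype)
# int64
print(df_int.T.dtypes)
# X    int64
# Y    int64
# dtype: object
```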
Implicit type conversion by assignment to elements

The data type may also be converted implicitly when assigning a value to an element.

For example, assigning a `float` value to an element of an `int` column converts that column to `float`, while assigning an `int` value to an element of a `float` column keeps the column as `float` (the assigned value is stored as a float).
df_mix.at['A', 'col_int'] = 10.1
df_mix.at['A', 'col_float'] = 10
print(df_mix)
# col_int col_float
# A 10.1 10.0
# B 1.0 0.1
# C 2.0 0.2
print(df_mix.dtypes)
# col_int float64
# col_float float64
# dtype: object
When a string value is assigned to an element of a numeric column, the data type of the column is cast to `object`.
df_mix.at['A', 'col_float'] = 'abc'
print(df_mix)
# col_int col_float
# A 10.1 abc
# B 1.0 0.1
# C 2.0 0.2
print(df_mix.dtypes)
# col_int float64
# col_float object
# dtype: object
print(df_mix.applymap(type))
# col_int col_float
# A <class 'float'> <class 'str'>
# B <class 'float'> <class 'float'>
# C <class 'float'> <class 'float'>
The sample code above is based on version `2.0.3`. In version `0.22.0`, the column's type remained unchanged after assigning an element of a different type; instead, the type of the assigned element itself was converted. Note that the behavior may differ depending on the version.