First, create a DataFrame:
import pandas as pd
import numpy as np
df_obj = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
print(df_obj)
Output:
          a         b         c         d
0 -1.642584  0.647451  0.559574  0.501999
1  1.363831  2.692271  0.569345  0.698085
2 -0.171346  0.528494 -2.877623  0.225033
3 -0.284563 -0.946625  0.148989  0.627597
4  2.797140 -0.841167  1.037480  0.947025
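Because np.random.randn is not seeded, every run prints different numbers. A minimal sketch of one way to make the data repeatable, assuming NumPy's Generator API is available; the seed value 0 and the names rng, df_seeded and df_again are illustrative choices, not part of the original example:

```python
import numpy as np
import pandas as pd

# Seeded generator; the seed 0 is an arbitrary value chosen for illustration.
rng = np.random.default_rng(0)
df_seeded = pd.DataFrame(rng.standard_normal((5, 4)), columns=['a', 'b', 'c', 'd'])

# A second generator with the same seed yields exactly the same values.
rng_again = np.random.default_rng(0)
df_again = pd.DataFrame(rng_again.standard_normal((5, 4)), columns=['a', 'b', 'c', 'd'])

print(df_seeded.equals(df_again))  # True
```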
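Common statistical functions:
sum, mean, max, min, …
axis=0 computes the statistic down each column (one result per column); axis=1 computes it across each row (one result per row). The default is axis=0.
skipna excludes missing values (NaN) from the calculation; it defaults to True.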
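The effect of axis is easier to check on fixed numbers than on random ones. A small sketch, using a hypothetical two-column frame df_small that is not part of the original example:

```python
import pandas as pd

# Hypothetical fixed data so the results can be verified by hand.
df_small = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})

# axis=0: one result per column (a -> 6.0, b -> 60.0).
col_sums = df_small.sum(axis=0)

# axis=1: one result per row (0 -> 11.0, 1 -> 22.0, 2 -> 33.0).
row_sums = df_small.sum(axis=1)

print(col_sums)
print(row_sums)
```

The result of axis=0 is indexed by column name, while the result of axis=1 is indexed by row label.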
# coding:utf-8
import pandas as pd
import numpy as np

df_obj = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
print(df_obj)
print("*" * 100)
print(df_obj.max())                      # maximum of each column (axis=0 by default)
print(df_obj.min(axis=1, skipna=False))  # minimum of each row, NaN not skipped
print(df_obj.sum())                      # sum of each column
Output:
          a         b         c         d
0  2.005623  0.761594 -0.548926 -1.201357
1  0.407529 -0.218784  0.930699 -0.823741
2  0.641325 -2.037026 -0.518321  0.597472
3  1.112061  0.133388  1.968800 -1.153320
4 -0.032120 -0.774064 -0.467220  1.095355
****************************************************************************************************
a 2.005623
b 0.761594
c 1.968800
d 1.095355
dtype: float64
0 -1.201357
1 -0.823741
2 -2.037026
3 -1.153320
4 -0.774064
dtype: float64
a 4.134417
b -2.134893
c 1.365031
d -1.485592
dtype: float64
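The random data above contains no missing values, so skipna=False makes no visible difference there. A minimal sketch of what changes when a NaN is present, using a hypothetical frame df_nan that is not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column 'a' (row 1).
df_nan = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

# Default skipna=True: the NaN is ignored, so a -> 4.0 and b -> 15.0.
sum_skip = df_nan.sum()

# skipna=False: any column containing NaN yields NaN, so a -> NaN and b -> 15.0.
sum_keep = df_nan.sum(skipna=False)

# Row-wise minimum without skipping: row 1 contains NaN, so its result is NaN.
min_rows = df_nan.min(axis=1, skipna=False)

print(sum_skip)
print(sum_keep)
print(min_rows)
```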
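Common descriptive statistics: describe() reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.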
# coding:utf-8
import pandas as pd
import numpy as np
df_obj = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
print(df_obj)
print(df_obj.describe())  # summary statistics for each column
Output:
          a         b         c         d
0  1.394588 -0.047070 -0.327120  0.218114
1  0.159974  0.667859  0.614309  1.634314
2 -0.372147 -0.966839  0.443205 -1.086333
3  0.026549 -0.959392  0.406259  0.684068
4 -0.838770  2.605669 -1.477656 -1.420096
              a         b         c         d
count  5.000000  5.000000  5.000000  5.000000
mean   0.074039  0.260045 -0.068201  0.006013
std    0.834535  1.479430  0.866901  1.263241
min   -0.838770 -0.966839 -1.477656 -1.420096
25%   -0.372147 -0.959392 -0.327120 -1.086333
50%    0.026549 -0.047070  0.406259  0.218114
75%    0.159974  0.667859  0.443205  0.684068
max    1.394588  2.605669  0.614309  1.634314
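Each row of the describe() table corresponds to a statistic that can also be computed on its own. A small sketch on a hypothetical one-column frame df_demo (not part of the original example), chosen so the values can be checked by hand:

```python
import pandas as pd

# Hypothetical fixed data: 1.0 through 5.0.
df_demo = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0, 5.0]})
summary = df_demo.describe()

# Look up individual rows of the summary by label.
mean_a = summary.loc['mean', 'a']  # 3.0, same as df_demo['a'].mean()
std_a = summary.loc['std', 'a']    # sample standard deviation, same as df_demo['a'].std()
q25_a = summary.loc['25%', 'a']    # 2.0, same as df_demo['a'].quantile(0.25)

print(summary)
print(mean_a, std_a, q25_a)
```

With five rows the quartiles fall exactly on data points, which is why the 25%, 50% and 75% rows in the output above repeat values that appear in the DataFrame.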