Foreword:
This is the 5th and final article in a series analyzing data from every World Cup. If you want to read the earlier articles, follow the links below:
[Data Analysis Example 1] Analyzing every World Cup with python-pandas and plotting with matplotlib
https://blog.csdn.net/m0_59541412/article/details/130864289
[Data Analysis Example 2] Analyzing every World Cup with python-pandas and plotting with matplotlib: packed with practical content, bookmark it and start learning!
https://blog.csdn.net/m0_59541412/article/details/130884091?spm=1001.2014.3001.5502
[Data Analysis Example 3] Analyzing every World Cup with python-pandas and plotting with matplotlib: packed with practical content, bookmark it and start learning!
https://blog.csdn.net/m0_59541412/article/details/130931933
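1. Analyzing the World Cup match table
# Import pandas and load the data
import pandas as pd
# Raw string, so the backslashes in the Windows path are not read as escape sequences
matches = pd.read_csv(r'd:\data\WorldCupMatches.csv')
matches
# Matches played by China: away team name is 'China PR' or home team name is 'China PR'
matches[(matches['Away Team Name'] == 'China PR') | (matches['Home Team Name'] == 'China PR')]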
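The same two-sided filter applies to any team, so it can be wrapped in a small helper. The sketch below is only an illustration: `team_matches` is a hypothetical name that does not appear in the original notebook, and the three sample rows are copied from the match table shown later in section 2 so that the snippet runs without the CSV file.

```python
import pandas as pd

# Sample rows copied from the match table shown in section 2 of this article
sample = pd.DataFrame({
    'Home Team Name': ['France', 'USA', 'Argentina'],
    'Away Team Name': ['Mexico', 'Belgium', 'France'],
    'Home Team Goals': [4, 3, 1],
    'Away Team Goals': [1, 0, 0],
})

def team_matches(df, team):
    """Return every row in which `team` is the home or the away side (hypothetical helper)."""
    mask = (df['Home Team Name'] == team) | (df['Away Team Name'] == team)
    return df[mask]

france = team_matches(sample, 'France')
print(france)
```

With the full data set, `team_matches(matches, 'China PR')` would select the same rows as the expression above.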
2. Data preprocessing
Before any analysis, the data needs some cleaning and a few fields need special handling.
# Treat 'Germany FR' (West Germany) and 'Germany' as the same team
matches = matches.replace(['Germany FR'], 'Germany')
# Type conversion: cast the goal columns to integers (int)
matches['Home Team Goals'] = matches['Home Team Goals'].astype(int)
matches['Away Team Goals'] = matches['Away Team Goals'].astype(int)
# Format the match result as a string such as 3-2
matches['result'] = matches['Home Team Goals'].astype(str) + '-' + matches['Away Team Goals'].astype(str)
matches
| | Year | Datetime | Stage | Stadium | City | Home Team Name | Home Team Goals | Away Team Goals | Away Team Name | Attendance | Half-time Home Goals | Half-time Away Goals | Referee | Assistant 1 | Assistant 2 | RoundID | MatchID | Home Team Initials | Away Team Initials | result |
| 0 | 1930.0 | 13 Jul 1930 - 15:00 | Group 1 | Pocitos | Montevideo | France | 4 | 1 | Mexico | 4444.0 | 3.0 | 0.0 | LOMBARDI Domingo (URU) | CRISTOPHE Henry (BEL) | REGO Gilberto (BRA) | 201.0 | 1096.0 | FRA | MEX | 4-1 |
| 1 | 1930.0 | 13 Jul 1930 - 15:00 | Group 4 | Parque Central | Montevideo | USA | 3 | 0 | Belgium | 18346.0 | 2.0 | 0.0 | MACIAS Jose (ARG) | MATEUCCI Francisco (URU) | WARNKEN Alberto (CHI) | 201.0 | 1090.0 | USA | BEL | 3-0 |
| 2 | 1930.0 | 14 Jul 1930 - 12:45 | Group 2 | Parque Central | Montevideo | Yugoslavia | 2 | 1 | Brazil | 24059.0 | 2.0 | 0.0 | TEJADA Anibal (URU) | VALLARINO Ricardo (URU) | BALWAY Thomas (FRA) | 201.0 | 1093.0 | YUG | BRA | 2-1 |
| 3 | 1930.0 | 14 Jul 1930 - 14:50 | Group 3 | Pocitos | Montevideo | Romania | 3 | 1 | Peru | 2549.0 | 1.0 | 0.0 | WARNKEN Alberto (CHI) | LANGENUS Jean (BEL) | MATEUCCI Francisco (URU) | 201.0 | 1098.0 | ROU | PER | 3-1 |
| 4 | 1930.0 | 15 Jul 1930 - 16:00 | Group 1 | Parque Central | Montevideo | Argentina | 1 | 0 | France | 23409.0 | 0.0 | 0.0 | REGO Gilberto (BRA) | SAUCEDO Ulises (BOL) | RADULESCU Constantin (ROU) | 201.0 | 1085.0 | ARG | FRA | 1-0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 847 | 2014.0 | 05 Jul 2014 - 17:00 | Quarter-finals | Arena Fonte Nova | Salvador | Netherlands | 0 | 0 | Costa Rica | 51179.0 | 0.0 | 0.0 | Ravshan IRMATOV (UZB) | RASULOV Abduxamidullo (UZB) | KOCHKAROV Bakhadyr (KGZ) | 255953.0 | 300186488.0 | NED | CRC | 0-0 |
| 848 | 2014.0 | 08 Jul 2014 - 17:00 | Semi-finals | Estadio Mineirao | Belo Horizonte | Brazil | 1 | 7 | Germany | 58141.0 | 0.0 | 5.0 | RODRIGUEZ Marco (MEX) | TORRENTERA Marvin (MEX) | QUINTERO Marcos (MEX) | 255955.0 | 300186474.0 | BRA | GER | 1-7 |
| 849 | 2014.0 | 09 Jul 2014 - 17:00 | Semi-finals | Arena de Sao Paulo | Sao Paulo | Netherlands | 0 | 0 | Argentina | 63267.0 | 0.0 | 0.0 | Cüneyt ÇAKIR (TUR) | DURAN Bahattin (TUR) | ONGUN Tarik (TUR) | 255955.0 | 300186490.0 | NED | ARG | 0-0 |
| 850 | 2014.0 | 12 Jul 2014 - 17:00 | Play-off for third place | Estadio Nacional | Brasilia | Brazil | 0 | 3 | Netherlands | 68034.0 | 0.0 | 2.0 | HAIMOUDI Djamel (ALG) | ACHIK Redouane (MAR) | ETCHIALI Abdelhak (ALG) | 255957.0 | 300186502.0 | BRA | NED | 0-3 |
| 851 | 2014.0 | 13 Jul 2014 - 16:00 | Final | Estadio do Maracana | Rio De Janeiro | Germany | 1 | 0 | Argentina | 74738.0 | 0.0 | 0.0 | Nicola RIZZOLI (ITA) | Renato FAVERANI (ITA) | Andrea STEFANI (ITA) | 255959.0 | 300186501.0 | GER | ARG | 1-0 |
852 rows × 20 columns
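One caveat about the integer conversion in this section: `astype(int)` raises an error when a column contains missing values. If the raw CSV contains completely empty rows (an assumption; the article does not show this step), they have to be dropped before the conversion. A minimal sketch on a toy frame, using two rows from the table above plus one artificial empty row:

```python
import numpy as np
import pandas as pd

# Two real rows from the table above plus one empty row (the empty row is an assumption)
raw = pd.DataFrame({
    'Home Team Name': ['France', 'USA', np.nan],
    'Home Team Goals': [4.0, 3.0, np.nan],
    'Away Team Goals': [1.0, 0.0, np.nan],
    'Away Team Name': ['Mexico', 'Belgium', np.nan],
})

# Drop rows in which every column is missing
clean = raw.dropna(how='all')

# Convert both goal columns to integers in one call
clean = clean.astype({'Home Team Goals': int, 'Away Team Goals': int})

# Build the "3-2" style result string
clean['result'] = clean['Home Team Goals'].astype(str) + '-' + clean['Away Team Goals'].astype(str)
print(clean)
```

Dropping only rows where every column is missing keeps matches that merely lack a single field such as attendance.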
With preprocessing done, we can start the analysis. Based on the World Cup match table, we will look at the following questions:
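3. Match attendance analysis
When analyzing the World Cup summary table, we visualized the total attendance of each tournament. Next, let's look at the attendance of individual matches and pick out the most popular ones: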
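The code that produced this ranking is not reproduced in the article. The following is a minimal sketch of one way to obtain it, assuming the `Attendance` column shown in the table in section 2: `nlargest` returns the rows with the highest values in that column. The sample rows are copied from that table, so the numbers printed here are not the all-time top five.

```python
import pandas as pd

# Sample rows copied from the match table in section 2 (not the full data set)
sample = pd.DataFrame({
    'Year': [1930, 1930, 2014, 2014, 2014, 2014],
    'Stadium': ['Pocitos', 'Parque Central', 'Estadio Mineirao',
                'Arena de Sao Paulo', 'Estadio Nacional', 'Estadio do Maracana'],
    'Home Team Name': ['France', 'USA', 'Brazil', 'Netherlands', 'Brazil', 'Germany'],
    'Away Team Name': ['Mexico', 'Belgium', 'Germany', 'Argentina', 'Netherlands', 'Argentina'],
    'Attendance': [4444.0, 18346.0, 58141.0, 63267.0, 68034.0, 74738.0],
})

# Five best-attended matches, highest attendance first
columns = ['Year', 'Stadium', 'Home Team Name', 'Away Team Name', 'Attendance']
top5 = sample.nlargest(5, 'Attendance')[columns]
print(top5)
```

On the full table the equivalent call would be `matches.nlargest(5, 'Attendance')`.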
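The data above shows which five matches drew the largest crowds and how many spectators attended each.
Remarkably, the top four of these five matches all come from the 1950 World Cup in Brazil, which shows how passionate Brazilians are about football and underlines Brazil's reputation as the kingdom of football.
Looking closer, all four of those matches were played at Brazil's Maracanã Stadium.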
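Maracanã Stadium, in Rio de Janeiro, is one of the best-known football stadiums in Brazil and the world, and a symbol of Brazilian football. It holds a famous place in football history: it hosted the deciding match of the 1950 World Cup and the final of the 2014 World Cup, along with many other international matches and major sporting events.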
4. Goals per match analysis
Nothing excites fans more than goals, so let's find the matches with the most goals in World Cup history:
import matplotlib.pyplot as plt
import seaborn as sns
# Total goals per match and a "Home VS Away" label
matches['total_goals'] = matches['Home Team Goals'] + matches['Away Team Goals']
matches['VS'] = matches['Home Team Name'] + ' VS ' + matches['Away Team Name']
# The 10 matches with the most goals; copy() so that columns can be added safely
top10_goals = matches.sort_values(by='total_goals', ascending=False).head(10).copy()
top10_goals['total_goals_str'] = top10_goals['total_goals'].astype(str) + ' goals scored'
# Rebuild the "3-2" style result string from integer goal counts
top10_goals['Home Team Goals'] = top10_goals['Home Team Goals'].astype(int)
top10_goals['Away Team Goals'] = top10_goals['Away Team Goals'].astype(int)
top10_goals['result'] = top10_goals['Home Team Goals'].astype(str) + '-' + top10_goals['Away Team Goals'].astype(str)
# Horizontal bar chart: one bar per match
plt.figure(figsize=(12, 10))
ax = sns.barplot(y=top10_goals['VS'], x=top10_goals['total_goals'])
sns.despine(right=True)
plt.ylabel('Match')
plt.xlabel('Goals')
plt.yticks(size=12)
plt.xticks(size=12)
plt.title('Top 10 matches by goals scored', size=20)
# Annotate each bar with stadium, date, goal count and result
for i, s in enumerate('Stadium: ' + top10_goals['Stadium'] + ', Date: ' + top10_goals['Datetime'] +
                      '\n' + top10_goals['total_goals_str'] + ', Result: ' + top10_goals['result']):
    ax.text(2, i, s, fontsize=14, color='white', va='center')
plt.show()
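5. Total goals by team
Now let's see which countries have scored the most goals in World Cup history. Try guessing first: will the top spots go to the three football powers Brazil, Germany and Italy?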
# Import the analysis packages: numpy (scientific computing), pandas (data frames) and matplotlib/seaborn (visualization)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data (raw string, so the backslashes in the path are not read as escape sequences)
matches = pd.read_csv(r'd:\data\WorldCupMatches.csv')
# Treat 'Germany FR' (West Germany) and 'Germany' as the same team
matches = matches.replace(['Germany FR'], 'Germany')
list_countries = matches['Home Team Name'].unique().tolist()
# Count goals separately for the home side and the away side
lista_home = []
lista_away = []
for i in list_countries:
    goals_home = matches.loc[matches['Home Team Name'] == i, 'Home Team Goals'].sum()
    lista_home.append(goals_home)
    goals_away = matches.loc[matches['Away Team Name'] == i, 'Away Team Goals'].sum()
    lista_away.append(goals_away)
df = pd.DataFrame({'country': list_countries, 'total_home_goals': lista_home, 'total_away_goals': lista_away})
df['total_goals'] = df['total_home_goals'] + df['total_away_goals']
# Ten teams with the most goals overall
most_goals = df.sort_values(by='total_goals', ascending=False)[:10]
most_goals
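| | country | total_home_goals | total_away_goals | total_goals |
| 13 | Germany | 168 | 67 | 235 |
| 7 | Brazil | 180 | 45 | 225 |
| 4 | Argentina | 111 | 22 | 133 |
| 15 | Italy | 99 | 29 | 128 |
| 0 | France | 68 | 40 | 108 |
| 14 | Spain | 50 | 42 | 92 |
| 34 | Netherlands | 51 | 40 | 91 |
| 10 | Hungary | 73 | 14 | 87 |
| 6 | Uruguay | 62 | 18 | 80 |
| 18 | England | 54 | 25 | 79 |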
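The result is close to what we guessed. The teams with the most goals overall are Germany, Brazil, Argentina and Italy. As the home side, the top scorers are Brazil, Germany, Argentina and Italy. Among the ten teams listed, the away-goal ranking is Germany, Brazil and Spain, followed by France and the Netherlands tied at 40.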
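The per-country loop in this section can also be written without explicit iteration. The following is a sketch of an alternative, not the author's code: it sums goals per team with `groupby` and aligns the home and away totals on the team name. Unlike the loop, it also keeps teams that only ever appear in the away column. The sample rows are copied from the match table in section 2.

```python
import pandas as pd

# Sample rows copied from the match table in section 2 (not the full data set)
sample = pd.DataFrame({
    'Home Team Name': ['France', 'Argentina', 'Brazil', 'Netherlands', 'Brazil', 'Germany'],
    'Home Team Goals': [4, 1, 1, 0, 0, 1],
    'Away Team Goals': [1, 0, 7, 0, 3, 0],
    'Away Team Name': ['Mexico', 'France', 'Germany', 'Argentina', 'Netherlands', 'Argentina'],
})

# Goals scored as the home side and as the away side, indexed by team name
home = sample.groupby('Home Team Name')['Home Team Goals'].sum()
away = sample.groupby('Away Team Name')['Away Team Goals'].sum()

# Align both series on the team name; a team missing on one side gets 0
totals = pd.DataFrame({'total_home_goals': home, 'total_away_goals': away})
totals = totals.fillna(0).astype(int).rename_axis('country')
totals['total_goals'] = totals['total_home_goals'] + totals['total_away_goals']

# Highest-scoring teams first
totals = totals.sort_values(by='total_goals', ascending=False)
print(totals)
```

Applied to the full `matches` table, `totals.head(10)` would play the role of `most_goals`.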
That wraps up this series on World Cup data analysis. I hope it has been helpful to everyone interested in data analysis.