DS Final Project PDF

Uploaded by

Maikura

Data Gathering

This is where we upload the data to the notebook. In our case, we will work with
Spotify's Top 10,000 Streamed Songs dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from google.colab import files

uploaded = files.upload()

<IPython.core.display.HTML object>

Saving Spotify_final_dataset.csv to Spotify_final_dataset.csv

df = pd.read_csv("Spotify_final_dataset.csv", sep=",")
df.head()

   Position   Artist Name                                  Song Name  Days  \
0         1   Post Malone  Sunflower SpiderMan: Into the SpiderVerse  1506
1         2    Juice WRLD                               Lucid Dreams  1673
2         3  Lil Uzi Vert                              XO TOUR Llif3  1853
3         4       J. Cole                             No Role Modelz  2547
4         5   Post Malone                                   rockstar  1223

   Top 10 (xTimes)  Peak Position Peak Position (xTimes)  Peak Streams  \
0            302.0              1                  (x29)       2118242
1            178.0              1                  (x20)       2127668
2            212.0              1                   (x4)       1660502
3              6.0              7                      0        659366
4            186.0              1                 (x124)       2905678

   Total Streams
0      883369738
1      864832399
2      781153024
3      734857487
4      718865961

Cleansing Data
Here we check that every column has an appropriate data type. We also drop the
columns that will not be used later.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11084 entries, 0 to 11083
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Position 11084 non-null int64
1 Artist Name 11084 non-null object
2 Song Name 11080 non-null object
3 Days 11084 non-null int64
4 Top 10 (xTimes) 11084 non-null float64
5 Peak Position 11084 non-null int64
6 Peak Position (xTimes) 11084 non-null object
7 Peak Streams 11084 non-null int64
8 Total Streams 11084 non-null int64
dtypes: float64(1), int64(5), object(3)
memory usage: 779.5+ KB

Dropping the irrelevant columns

df.drop(['Top 10 (xTimes)', 'Peak Position', 'Peak Position (xTimes)',
         'Peak Streams'], axis=1, inplace=True)
df

       Position        Artist Name                                  Song Name  Days  Total Streams
0             1        Post Malone  Sunflower SpiderMan: Into the SpiderVerse  1506      883369738
1             2         Juice WRLD                               Lucid Dreams  1673      864832399
2             3       Lil Uzi Vert                              XO TOUR Llif3  1853      781153024
3             4            J. Cole                             No Role Modelz  2547      734857487
4             5        Post Malone                                   rockstar  1223      718865961
...         ...                ...                                        ...   ...            ...
11079     11080     The Band Perry                             If I Die Young     1          51321
11080     11081  Justin Timberlake                            Not a Bad Thing     1          49512
11081     11082     Mike WiLL Made                                      It 23     1          46547
11082     11083          The Vamps                            Somebody To You     1          44962
11083     11084                JAY                               Z Holy Grail     1          44323

[11084 rows x 5 columns]

Checking for null values

null_rows = df[df.isnull().any(axis=1)]
null_rows

Position Artist Name Song Name Days Total Streams

5506 5507 Jenny Duncan NaN 1 1737605
6217 6218 Dj Ozuna NaN 6 1198268
7177 7178 Daniel Marcy NaN 1 710534
8215 8216 Amy Kaylee NaN 2 412133

Checking whether each of these artists has more than one hit song. If an artist appears
only once, we can replace the missing song name with the artist name.
for i in null_rows['Artist Name']:
    count = (df["Artist Name"] == i).sum()
    if count > 1:
        print(f"{i} has occurred {count} times.")
    else:
        print(f"{i} has occurred only once.")

Jenny Duncan has occurred only once.

Dj Ozuna has occurred only once.
Daniel Marcy has occurred only once.
Amy Kaylee has occurred only once.
df["Song Name"] = df["Song Name"].fillna(df["Artist Name"])
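The check-and-fill step above can also be written without an explicit loop. The sketch below is a minimal, self-contained illustration on a small hypothetical frame (the `toy` name and its rows are assumptions made for this example, not the project's data): it counts how often each artist with a missing song name appears, then fills each gap with the artist name from the same row.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the real dataset.
toy = pd.DataFrame({
    "Artist Name": ["Post Malone", "Jenny Duncan", "Post Malone", "Dj Ozuna"],
    "Song Name": ["rockstar", np.nan, "Sunflower", np.nan],
})

# Rows whose song name is missing.
missing = toy[toy["Song Name"].isnull()]

# Number of rows each of those artists has in the whole frame.
appearances = toy["Artist Name"].value_counts().reindex(missing["Artist Name"])
print(appearances)

# Fill each missing song name with the artist name from the same row.
toy["Song Name"] = toy["Song Name"].fillna(toy["Artist Name"])
print(toy)
```

`fillna` aligns on the row index when it is given a Series, which is why each gap receives the artist name from its own row.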

Double check if there are still null rows

null_rows = df[df.isnull().any(axis=1)]
null_rows

Empty DataFrame
Columns: [Position, Artist Name, Song Name, Days, Total Streams]
Index: []

Exploratory Data Analysis

In this phase, we rank the data by streams per song and per artist. We also plot the
relationship between the Position of a song, the number of days since it was released,
and its number of Streams.
Displaying the Top 25 Artists with the most Hit Songs
plt.figure(figsize=(20,20))
artist_counts = df.groupby('Artist Name').size().sort_values(ascending=False)
top_artists = artist_counts.nlargest(25)
sns.barplot(y=top_artists.index, x=top_artists.values)
plt.title("Top 25 Artists with most Hit songs", size=20)
plt.ylabel("Artist", size=20)
plt.xlabel("Number of songs", size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.show()
Drake, Future, and Taylor Swift are the top 3 artists with the most hit songs.
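Counting rows per artist with `groupby(...).size()` gives the same counts as calling `value_counts` on the column. A minimal sketch using the five rows printed by `df.head()` earlier (the `head_rows` name is an assumption for this example):

```python
import pandas as pd

# The five rows shown by df.head() in the Data Gathering step.
head_rows = pd.DataFrame({
    "Artist Name": ["Post Malone", "Juice WRLD", "Lil Uzi Vert",
                    "J. Cole", "Post Malone"],
    "Total Streams": [883369738, 864832399, 781153024, 734857487, 718865961],
})

# Songs per artist, computed two equivalent ways.
by_groupby = head_rows.groupby("Artist Name").size().sort_values(ascending=False)
by_value_counts = head_rows["Artist Name"].value_counts()

print(by_groupby.head(3))
print(by_value_counts.head(3))
```

`value_counts` already sorts in descending order, so the explicit `sort_values` call is only needed for the groupby form.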
Visualizing the distribution of streams using a histogram
plt.figure(figsize=(20,10))
sns.histplot(data=df, x='Total Streams', bins=15)
plt.title("Streams Distribution", size=20)
plt.xlabel("Total Streams", size=20)
plt.ylabel("Frequency", size=20)
plt.show()
The histogram shows little detail: almost all songs fall into the first bin, most likely
because most of the songs are streamed less than 50 million times, so the other bars are
too short to see.
We will redraw the plot and cap the y-axis at a frequency of 100 so that the sparse upper
bins become visible.
plt.figure(figsize=(20,10))
sns.histplot(data=df, x='Total Streams', bins=20)
plt.title("Streams Distribution", size=20)
plt.xlabel("Total Streams in hundred millions", size=20)
plt.ylim(0,100)
plt.ylabel("Frequency", size=20)
plt.show()
With the y-axis capped at a frequency of 100, the bins beyond the first become visible.
We can see that the number of songs drops quickly as total streams increase.
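Capping the y-axis is one way to reveal the sparse bins. Another common option for heavily right-skewed counts is to bin the logarithm of the values. The sketch below illustrates the idea with NumPy only, on synthetic data (the `streams` array is randomly generated for this example and is not taken from the dataset).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, heavily right-skewed "stream counts" (an assumption for illustration).
streams = rng.lognormal(mean=14.0, sigma=2.0, size=10_000)

# Equal-width bins on the raw scale: most values land in the first bin.
raw_counts, _ = np.histogram(streams, bins=20)

# Equal-width bins on the log10 scale spread the values far more evenly.
log_counts, log_edges = np.histogram(np.log10(streams), bins=20)

print("raw :", raw_counts)
print("log :", log_counts)
```

In seaborn, a similar effect can be obtained by passing `log_scale=True` to `sns.histplot`.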
Display the Top 25 songs that are streamed the most
plt.figure(figsize=(20,40))
top_songs = df.sort_values(by='Total Streams',
ascending=False).head(25)
top_songs = top_songs[['Song Name', 'Total Streams']]
sns.barplot(y='Song Name', x='Total Streams', data=top_songs)
plt.title("Top 25 songs that are streamed the most", size=20)
plt.ylabel("Song Name", size=20)
plt.xlabel('Total Streams', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.show()
Sunflower, Lucid Dreams, and XO Tour Llif3 are the top 3 songs with the highest streams
Display the Artist with highest total streams from their songs
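# regex=True is required in recent pandas versions for the pattern to be treated as a regex
df['Artist Name'] = df['Artist Name'].str.replace(r'[^a-zA-Z0-9 \n.]', '',
                                                  regex=True)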
plt.figure(figsize=(20,40))
artist_streams = (df.groupby('Artist Name')['Total Streams']
                  .sum().sort_values(ascending=False))
top_artists = artist_streams.nlargest(50)
sns.barplot(y=top_artists.index, x=top_artists.values)
plt.title("Artist with highest total streams", size=20)
plt.ylabel("Artist", size=20)
plt.xlabel("Total Streams in Billions", size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.show()
Drake, Post Malone, and Juice WRLD are the top 3 artists with the highest total streams.
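To see what the character filter applied to the artist names does, the sketch below runs the same pattern with Python's standard `re` module on a few hypothetical name strings (the sample names are assumptions for illustration). Every character other than ASCII letters, digits, spaces, newlines, and periods is removed.

```python
import re

# Same character class as the pattern passed to str.replace.
pattern = re.compile(r"[^a-zA-Z0-9 \n.]")

# Hypothetical sample names chosen to show different cases.
samples = ["J. Cole", "JAY-Z", "Beyoncé", "A$AP Rocky"]
cleaned = [pattern.sub("", name) for name in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Accented letters are removed along with punctuation, so names containing them are shortened; this is worth keeping in mind when reading the artist totals.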
Heatmap Correlation:
plt.figure(figsize=(20,10))
# numeric_only=True skips the text columns (Artist Name, Song Name)
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.xticks(size=20, rotation=90)
plt.yticks(size=20)
plt.show()

sns.pairplot(df, x_vars=['Days'], y_vars=['Total Streams'], height=10)

plt.xlabel('Days', size=20)
plt.ylabel('Total Streams', size=20)
plt.show()
Days and Total Streams have a strong positive correlation, as shown in the plot: the higher
the number of days, the higher the streams.
sns.pairplot(df, x_vars=['Position'], y_vars=['Total Streams'],
height=10)
plt.xlabel('Position', size=20)
plt.ylabel('Total Streams', size=20)
plt.show()
As we can see from the graph, Position and Total Streams have a negative correlation: the
lower the position number (that is, the better the rank), the higher the total streams.
Observations from the EDA graphs:
1. Days has a high positive correlation with Total Streams. According to this data, the
more days a song has, the more streams it gets.
2. Position has a negative correlation with Total Streams. Since Total Streams
and Days have a high positive correlation, we can say that with more days, a song
may reach a better position.
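In the rows shown, Position appears to be the rank of a song by Total Streams, so a negative relationship between the two is expected by construction. The sketch below checks this on the five rows printed by `df.head()` (the `head_rows` name is an assumption for this example), computing a rank correlation by ranking both columns and taking the Pearson correlation of the ranks.

```python
import pandas as pd

# The five rows shown by df.head() in the Data Gathering step.
head_rows = pd.DataFrame({
    "Position": [1, 2, 3, 4, 5],
    "Days": [1506, 1673, 1853, 2547, 1223],
    "Total Streams": [883369738, 864832399, 781153024, 734857487, 718865961],
})

# Rank every column, then correlate the ranks (a rank correlation).
ranks = head_rows.rank()
rank_corr_position = ranks["Position"].corr(ranks["Total Streams"])

# Ordinary Pearson correlation on the raw values.
pearson_days = head_rows["Days"].corr(head_rows["Total Streams"])

print("rank correlation, Position vs Total Streams:", rank_corr_position)
print("Pearson correlation, Days vs Total Streams:", pearson_days)
```

Five rows are far too few to say anything about Days; the same two lines can be run on the full `df` to compare against the heatmap.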

Modelling
This is where we model our data. We will use three different models for this project. These
are Linear Regression, Multi-Layer Perceptron, and Random Forest models.
First, we import the three models and the train/test split utility.
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

Next, we create one instance of each model with its default settings.
df_LR = LinearRegression()
df_MLPR = MLPRegressor()
df_RFP = RandomForestRegressor()

df.head()

Position Artist Name Song Name

Total Streams
0 883369738
1 864832399
2 781153024
3 734857487
4 718865961

x = df[['Total Streams']]
y = df[['Days']]

Total Streams
0 883369738
1 864832399
2 781153024
3 734857487
4 718865961
... ...
11079 51321
11080 49512
11081 46547
11082 44962
11083 44323
[11084 rows x 1 columns]

Days
0 1506
1 1673
2 1853
3 2547
4 1223
... ...
11079 1
11080 1
11081 1
11082 1
11083 1

[11084 rows x 1 columns]

We use a train/test split to assess the models: 80% of the rows go to the training set and
20% to the test set. Evaluating on the held-out test set indicates how well the relationship
seen in the graphs above generalizes.
Xtrain, Xtest, Ytrain, Ytest = train_test_split(x,y, test_size=0.2,
train_size=None, random_state=None, shuffle=True, stratify=None)
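With `random_state=None`, every run produces a different split, so scores change from run to run. Passing a fixed integer makes the split repeatable. A minimal sketch on stand-in data (the arrays and the seed value 42 are assumptions for this example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples with one feature each.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# The same seed yields the same shuffled split on every call.
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_test, y_train, y_test = split_a
print(len(X_train), len(X_test))
print(np.array_equal(split_a[1], split_b[1]))
```

The same argument is accepted by `MLPRegressor` and `RandomForestRegressor`, whose training is also randomized.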

Xtrain

Total Streams
6416 1084804
3045 8815796
2314 15412318
280 163082925
7877 520840
... ...
8686 317994
4446 3117596
8395 360406
5557 1691747
6866 825130

[8867 rows x 1 columns]

Ytrain

Days
6416 2
3045 34
2314 34
280 414
7877 7
... ...
8686 1
4446 7
8395 4
5557 5
6866 2

[8867 rows x 1 columns]

Xtest

Total Streams
7466 619903
1 864832399
1712 25512899
3259 7466960
1568 29258354
... ...
8792 306573
6679 933608
1838 23088066
4133 3909187
9499 259997

[2217 rows x 1 columns]

Ytest

Days
7466 2
1 1673
1712 56
3259 37
1568 106
... ...
8792 1
6679 4
1838 59
4133 29
9499 1

[2217 rows x 1 columns]

df_LR.fit(Xtrain, Ytrain)
df_MLPR.fit(Xtrain, Ytrain)
df_RFP.fit(Xtrain, Ytrain)

RandomForestRegressor()
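Neural networks such as `MLPRegressor` are generally sensitive to the scale of their inputs, and Total Streams ranges from tens of thousands to hundreds of millions. One common remedy, shown here only as a sketch on synthetic data and not as the method used in this project, is to standardize the feature inside a pipeline. The names `scaled_mlp`, `X_demo`, and `y_demo` are assumptions for this example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic feature on roughly the same scale as Total Streams.
X_demo = rng.uniform(5e4, 9e8, size=(200, 1))
# Synthetic target: a noisy linear function of the feature.
y_demo = X_demo[:, 0] / 5e5 + rng.normal(0, 20, size=200)

# StandardScaler rescales the feature to zero mean and unit variance
# before it reaches the network.
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(max_iter=2000, random_state=0),
)
scaled_mlp.fit(X_demo, y_demo)

predictions = scaled_mlp.predict(X_demo[:5])
print(predictions)
```

Because the scaler is part of the pipeline, it is fitted on the training data only and reapplied automatically at prediction time.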

Evaluation
Now that the models are fitted, we compare them numerically and in bar charts. The split and
the models are not seeded (random_state=None), so the figures quoted in this paragraph, which
appear to come from a different run, do not match the outputs printed below. For the
R-squared score comparison, Linear Regression got 85.92%, Multi-Layer Perceptron got 75.36%,
and Random Forest got 97.57%. For the Mean Squared Error comparison, Linear Regression got
41.78%, Multi-Layer Perceptron got 44.57%, and Random Forest got 49.10%. For the Mean Absolute
Error comparison, Linear Regression got 20.22%, Multi-Layer Perceptron got 16.60%, and Random
Forest got 18.54%.
print(df_LR.score(Xtrain, Ytrain))
print(df_MLPR.score(Xtrain, Ytrain))
print(df_RFP.score(Xtrain, Ytrain))

0.8566285188125835
-2.039755857615953
0.9731578751057082
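The `score` method of a scikit-learn regressor returns the coefficient of determination, R² = 1 - SS_res / SS_tot. It equals 1 for a perfect fit and 0 for a model that always predicts the mean, and it becomes negative when the model does worse than predicting the mean, which is how a value such as -2.04 can arise. The scores above are computed on the training split; passing `Xtest, Ytest` to `score` would measure performance on held-out data instead. A minimal NumPy sketch of the formula (the `r_squared` helper is written for this example):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])    # exact predictions
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])      # reversed predictions

print(perfect, mean_only, worse)
```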

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

df_LRprediction = df_LR.predict(Xtest)
df_MLPRprediction = df_MLPR.predict(Xtest)
df_RFPprediction = df_RFP.predict(Xtest)

# squared=False returns the root mean squared error (RMSE) rather than the MSE;
# the result is then divided by 100. Arguments are passed as (y_true, y_pred).
LR_sq_error = mean_squared_error(Ytest, df_LRprediction, squared=False)/100
MLPR_sq_error = mean_squared_error(Ytest, df_MLPRprediction, squared=False)/100
RFP_sq_error = mean_squared_error(Ytest, df_RFPprediction, squared=False)/100

print(LR_sq_error)
print(MLPR_sq_error)
print(RFP_sq_error)

0.47116017305214347
2.6123827992669333
0.5203310294005739

LR_ab_error = mean_absolute_error(Ytest, df_LRprediction)/100
MLPR_ab_error = mean_absolute_error(Ytest, df_MLPRprediction)/100
RFP_ab_error = mean_absolute_error(Ytest, df_RFPprediction)/100

print(LR_ab_error)
print(MLPR_ab_error)
print(RFP_ab_error)

0.20850917487160747
0.7469482505523841
0.19447871723550342
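The two error metrics used above differ in how they weight large mistakes. Mean absolute error averages the absolute differences, while the root mean squared error (what `mean_squared_error` returns with `squared=False`) squares them first, so a single large miss raises it more. A small worked sketch with made-up values (the arrays are assumptions for illustration):

```python
import numpy as np

# Made-up targets and predictions: three exact hits and one miss of 4.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # (0 + 0 + 0 + 4) / 4 = 1
mse = np.mean(errors ** 2)      # (0 + 0 + 0 + 16) / 4 = 4
rmse = np.sqrt(mse)             # sqrt(4) = 2

print(mae, mse, rmse)
```

Both metrics are expressed in the units of the target (here, Days), so they are not percentages unless they are divided by a reference value.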

Model_score = ('Linear Regression', 'Multi-layer Perceptron', 'Random Forest')
Ypos_score = np.arange(len(Model_score))
Values = [df_LR.score(Xtrain, Ytrain),df_MLPR.score(Xtrain,
Ytrain),df_RFP.score(Xtrain, Ytrain)]

plt.bar(Ypos_score, Values, align='center', alpha=0.5)

plt.xticks(Ypos_score, Model_score)
plt.ylabel('Values')
plt.title('R-squared score comparison (higher is better)')
plt.ylim([0,1])
plt.show()

Model_sq_error = ('Linear Regression', 'Multi-layer Perceptron',

'Random Forest')
Ypos_sq_error = np.arange(len(Model_sq_error))
Values = [LR_sq_error,MLPR_sq_error,RFP_sq_error]

plt.bar(Ypos_sq_error, Values, align='center', alpha=0.5)

plt.xticks(Ypos_sq_error, Model_sq_error)
plt.ylabel('Values')
plt.title('Mean squared error comparison (lower is better)')
plt.ylim([0,1])
plt.show()
Model_ab_error = ('Linear Regression', 'Multi-layer Perceptron',
'Random Forest')
Ypos_ab_error = np.arange(len(Model_ab_error))
Values = [LR_ab_error,MLPR_ab_error,RFP_ab_error]

plt.bar(Ypos_ab_error, Values, align='center', alpha=0.5)

plt.xticks(Ypos_ab_error, Model_ab_error)
plt.ylabel('Values')
plt.title('Mean absolute error comparison (lower is better)')
plt.ylim([0,1])
plt.show()
Conclusion
But what do all of these numbers mean for our evaluation? For the R-squared score
comparison, the model with the highest value is the most acceptable one to use; in this
case, it was the Random Forest model. For the Mean Squared Error comparison and the Mean
Absolute Error comparison, the model with the lowest value is the most acceptable one to
use, which was Linear Regression for the squared error and Multi-Layer Perceptron for the
absolute error. We can conclude that, when it comes to analyzing Spotify's songs and their
corresponding numbers of Streams and Days, a data scientist can use any of the three models
presented in this project.
