Exploratory Data Analysis for Top 50 Spotify Songs in Python
Case Study: Top 50 Spotify Songs — 2019.
Introduction to Exploratory Data Analysis (EDA)
Exploratory Data Analysis refers to the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations. (Source: https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15)
Dataset:
Download the dataset from Kaggle (URL: https://www.kaggle.com/leonardopena/top50spotify2019)
Steps…
First, install and import the required Python packages. Importing them may take a few minutes.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="darkgrid", color_codes=True)
import statistics as stat
import plotly.express as px
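Before running the imports above, it can help to confirm that every package is actually installed. The following is a minimal sketch and not part of the original walkthrough: the helper name `missing_packages` is hypothetical, and the package list is simply the set of third-party libraries this article imports.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party packages used in this article (wordcloud is imported further down).
required = ["numpy", "pandas", "seaborn", "matplotlib", "plotly", "wordcloud"]
print("Missing packages:", missing_packages(required))
```

Anything reported as missing can usually be installed with `pip install <name>` before continuing.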
Import the Dataset.
spotify = pd.read_csv("top50.csv", encoding="ISO-8859-1")
spotify.head()
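The encoding argument tells pandas how to turn the bytes in the file into text. To illustrate why that can matter, here is a small standard-library sketch. The title string and the helper `can_decode` are hypothetical stand-ins, and whether top50.csv really fails under UTF-8 is an assumption this snippet does not verify.

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid text in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A made-up title with an accented character, stored as ISO-8859-1 bytes.
raw_title = "Señorita".encode("ISO-8859-1")

print(can_decode(raw_title, "utf-8"))       # the single byte for "ñ" is not valid UTF-8
print(can_decode(raw_title, "ISO-8859-1"))  # the matching encoding reads it back
print(raw_title.decode("ISO-8859-1"))
```

If reading the CSV without the encoding argument raises a UnicodeDecodeError, a mismatch like this one is a likely cause.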
The shape attribute gives the size of the data, i.e. the number of rows and columns
spotify.shape
Check the data types of the columns
spotify.info()
Select the integer variables
spotify_int = spotify.iloc[:, 4:14]
spotify_int.head()
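The positional slice iloc[:, 4:14] depends on the column order of the file. As an alternative sketch, the numeric columns can be picked by data type instead. The tiny DataFrame below is a hypothetical stand-in with made-up values, not the real dataset.

```python
import pandas as pd

# Hypothetical two-row stand-in for the Spotify table (values are made up).
toy = pd.DataFrame({
    "Track.Name": ["Song A", "Song B"],
    "Genre": ["pop", "dance pop"],
    "Energy": [55, 81],
    "Popularity": [79, 92],
})

# Keep only the numeric columns, wherever they sit in the table.
toy_int = toy.select_dtypes(include="number")
print(list(toy_int.columns))
```

Applied to the full table, this would also keep any numeric index column that the CSV happens to carry, so the result may need one extra column dropped.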
Display descriptive statistics for the dataset
spotify_int.describe()
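To see what describe() reports for each column, the same kind of summary can be computed by hand with the statistics module imported earlier as stat. This is only a sketch on a hypothetical list of values; quartiles are left out for brevity.

```python
import statistics as stat

# Hypothetical values standing in for one numeric column.
values = [10, 20, 30, 40, 50]

summary = {
    "count": len(values),
    "mean": stat.mean(values),
    "std": stat.stdev(values),    # sample standard deviation, as describe() uses
    "min": min(values),
    "median": stat.median(values),  # the 50% row in describe()
    "max": max(values),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```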
Plot pairwise relationships between all integer variables.
sns.pairplot(spotify_int);
Show a bar graph of Popularity by Track Name
spotify.plot(y='Popularity',x= 'Track.Name',kind='bar',figsize=(26,6),legend =True,title="Popularity Vs Track Name",
fontsize=18,stacked=True,color=['y', 'r', 'b','y', 'r', 'b', 'y'])
plt.ylabel('Popularity', fontsize=18)
plt.xlabel('Track Name', fontsize=18)
plt.show()
Count of songs per artist
plt.figure(figsize=(10,10))
sns.countplot(y='Artist.Name', data=spotify, order=spotify["Artist.Name"].value_counts().index)
plt.show()
Count by Genre
spotify['Genre'].value_counts().plot.bar()
plt.title('Count by Genre')
plt.ylabel('quantity')
plt.show()
print(spotify.groupby('Genre').size())
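The bar chart and the printed table above carry the same counts; value_counts() and groupby().size() differ only in how the result is ordered. A minimal sketch on a hypothetical list of genres:

```python
import pandas as pd

# Hypothetical genre column (made-up rows, not the real dataset).
toy = pd.DataFrame({"Genre": ["pop", "dance pop", "pop", "latin", "pop", "latin"]})

counts = toy["Genre"].value_counts()   # ordered by count, largest first
sizes = toy.groupby("Genre").size()    # ordered alphabetically by genre

print(counts)
print(sizes)
```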
Create a word cloud based on music genre
from wordcloud import WordCloud, STOPWORDS
# Create the wordcloud object
# Join the genre labels into one string so that only genre words are counted
wordcloud = WordCloud(width=700, height=600, margin=3).generate(" ".join(spotify["Genre"]))
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
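A word cloud sizes each word roughly by how often it appears in the input text. The counting behind that idea can be sketched with the standard library alone. The genre list here is hypothetical, and the real WordCloud class applies extra processing (such as stop-word removal) that this sketch ignores.

```python
from collections import Counter

# Hypothetical genre labels (made up for illustration).
genres = ["pop", "dance pop", "pop", "latin", "canadian hip hop"]

# Split the labels into single words and count each one.
words = " ".join(genres).split()
freq = Counter(words)

print(freq.most_common(3))
```

Here "pop" would be drawn largest, since it appears in three of the five labels.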
Visualize the relationship between genre and popularity using a swarm plot
plt.figure(figsize=(10,5))
swarmplot=sns.swarmplot(x="Genre",y="Popularity",data=spotify, s=13)
swarmplot.set_xticklabels(swarmplot.get_xticklabels(),rotation=90)
swarmplot.set_title("Relationship between Genre & Popularity")
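Visualize the relationship between Beats Per Minute and artist, colored by Popularity. This plot uses the filtered artist DataFrame created by the filtering script at the end of this article, so run that script first.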
sns.catplot(x="Artist.Name", y="Beats.Per.Minute",hue="Popularity", s=15,data=artist, kind="swarm")
Box plot of the relationship between Loudness..dB.. and Energy
sns.catplot(x = "Loudness..dB..", y = "Energy", kind = "box", data = spotify)
Spearman correlation statistics for all integer variables
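# 'display.precision' is the full option name; the short form 'precision' is ambiguous in recent pandas
pd.set_option('display.precision', 3)
# Use the integer columns only, since text columns cannot be correlated
corr = spotify_int.corr(method='spearman')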
print(corr)
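Spearman correlation is the Pearson correlation computed on the ranks of the values, which is why it captures any monotonic relationship rather than only linear ones. A small sketch on hypothetical numbers (the column names are simplified and the values are made up):

```python
import pandas as pd

# Hypothetical values, not taken from the dataset.
toy = pd.DataFrame({
    "Energy": [30, 45, 60, 80, 95],
    "Loudness": [-11, -8, -6, -5, -2],     # rises whenever Energy rises
    "Acousticness": [70, 75, 20, 10, 5],   # mostly falls as Energy rises
})

spearman = toy.corr(method="spearman")

# The same matrix, obtained by ranking first and then using Pearson.
pearson_on_ranks = toy.rank().corr(method="pearson")

print(spearman.round(3))
print(pearson_on_ranks.round(3))
```

Energy and Loudness are perfectly monotonic here, so their coefficient is 1 even though the relationship is not a straight line.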
Marginal plot between Acousticness and Beats Per Minute.
sns.jointplot(x="Beats.Per.Minute", y="Acousticness..", data=spotify, kind="kde");
Script for filtering several artists. The resulting artist DataFrame is the one used in the Beats Per Minute plot above.
artist = spotify[spotify["Artist.Name"].isin(["Ed Sheeran", "J Balvin", "Ariana Grande", "Marshmello", "The Chainsmokers", "Shawn Mendes"])]
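The filter works in two steps: isin() builds a True/False mask with one entry per row, and indexing the DataFrame with that mask keeps only the True rows. A minimal sketch on a hypothetical table with made-up popularity values:

```python
import pandas as pd

# Hypothetical rows; "Hypothetical Artist" is a placeholder name.
toy = pd.DataFrame({
    "Artist.Name": ["Ed Sheeran", "Hypothetical Artist", "J Balvin", "Ed Sheeran"],
    "Popularity": [87, 80, 90, 84],
})

wanted = ["Ed Sheeran", "J Balvin"]

mask = toy["Artist.Name"].isin(wanted)  # one boolean per row
subset = toy[mask]                      # keep rows where the mask is True

print(mask.tolist())
print(subset)
```

The kept rows retain their original index labels; calling reset_index(drop=True) on the result renumbers them if needed.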
That is all for this EDA discussion. These are just a few visualizations of the Top 50 Spotify Songs data; there is much more that can be explored in greater depth.
THX :)