Coding Session: Investigating guest stars in The Office series
The Office is my favorite series, and I obviously can't resist anything related to it, let alone a data analysis project about it. I did a little research on the guest stars' presence in the series and their effect on the viewership of its episodes. This project is part of DataCamp's Data Science skill track. Let's look at the project statement first.
Investigating Netflix Movies and Guest Stars in The Office
The Office! What started as a British mockumentary series about office culture in 2001 has since spawned ten other variants across the world, including an Israeli version (2010-13), a Hindi version (2019-), and even a French Canadian variant (2006-2007). Of all these iterations (including the original), the American series has been the longest-running, spanning 201 episodes over nine seasons.
In this notebook, we will look at a dataset of The Office episodes and try to understand how the popularity and quality of the series varied over time. To do so, we will use the following dataset: datasets/office_episodes.csv, which was downloaded from Kaggle.
This dataset contains information on a variety of characteristics of each episode. In detail, these are:
Data visualization is often a great way to explore your data and uncover insights. In this notebook, you will initiate this process by creating an informative plot of the episode data provided to you. In doing so, you're going to work on several different variables, including the episode number, the viewership, the fan rating, and guest appearances. Here are the requirements needed to pass this project:
- Create a matplotlib scatter plot of the data that contains the following attributes:
  - Each episode's episode number plotted along the x-axis
  - Each episode's viewership (in millions) plotted along the y-axis
  - A color scheme reflecting the scaled ratings (not the regular ratings) of each episode, such that:
    - Ratings < 0.25 are colored "red"
    - Ratings >= 0.25 and < 0.50 are colored "orange"
    - Ratings >= 0.50 and < 0.75 are colored "lightgreen"
    - Ratings >= 0.75 are colored "darkgreen"
  - A sizing system, such that episodes with guest appearances have a marker size of 250 and episodes without are sized 25
  - A title, reading "Popularity, Quality, and Guest Appearances on the Office"
  - An x-axis label reading "Episode Number"
  - A y-axis label reading "Viewership (Millions)"
- Provide the name of one of the guest stars (hint: there were multiple!) who was in the most watched Office episode. Save it as a string in the variable top_star (e.g. top_star = "Will Ferrell").
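The statement asks for "scaled ratings" without defining them. The solution treats them as min-max scaled values, where the lowest rating maps to 0 and the highest to 1. Here is a minimal sketch of that rescaling as a reusable function; the name min_max_scale and the sample ratings are hypothetical, and the zero-range guard is an addition of mine that the solution does not have.

```python
import pandas as pd


def min_max_scale(values: pd.Series) -> pd.Series:
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = values.min(), values.max()
    span = hi - lo
    if span == 0:
        # All values are equal, so there is no spread to scale: map everything to 0.0
        return pd.Series(0.0, index=values.index)
    return (values - lo) / span


# Hypothetical ratings, not taken from the real dataset
ratings = pd.Series([6.0, 7.0, 8.0, 10.0])
print(min_max_scale(ratings).tolist())  # [0.0, 0.25, 0.5, 1.0]
```

Because the result always lies between 0 and 1, the four color thresholds (0.25, 0.50, 0.75) split every possible value into exactly one bin.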
Solution
import pandas as pd
import matplotlib.pyplot as plt
# Import the file and define the scaled rating
theoffice_raw = pd.read_csv("~/the_office_series.csv")
# Empty cells (e.g. episodes without guest stars) become 0
theoffice_db = theoffice_raw.fillna(0)
max_score = theoffice_db['Ratings'].max()
min_score = theoffice_db['Ratings'].min()
# Min-max scaling: the lowest rating maps to 0, the highest to 1
theoffice_db['ScaledRatings'] = (
    (theoffice_db['Ratings'] - min_score) / (max_score - min_score))
print(theoffice_db.head())
# Define the marker colors from the scaled ratings
colors = []
for lab, row in theoffice_db.iterrows():
    if row['ScaledRatings'] < 0.25:
        colors.append("red")
    elif row['ScaledRatings'] < 0.5:
        colors.append("orange")
    elif row['ScaledRatings'] < 0.75:
        colors.append("lightgreen")
    else:
        colors.append("darkgreen")
# Define the marker size: 250 with guest stars, 25 without
size = []
for lab, row in theoffice_db.iterrows():
    if row['GuestStars'] == 0:
        size.append(25)
    else:
        size.append(250)
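# The most watched episode lists several comma-separated guest stars;
# the task asks for one name, so keep the first
top_star = str(theoffice_db.loc[
    theoffice_db['Viewership'].idxmax(),
    'GuestStars']).split(',')[0].strip()
print(top_star)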
# Create the plot (one figure at the requested size)
fig = plt.figure(figsize=(12, 8))
plt.scatter(
    x=theoffice_db.iloc[:, 0],  # first column of the CSV holds the episode number
    y=theoffice_db['Viewership'],
    s=size,
    c=colors,
    alpha=0.9,
    edgecolors="white",
    linewidths=1,
)
plt.xlabel("Episode Number")
plt.ylabel("Viewership (Millions)")
plt.title("Popularity, Quality, and Guest Appearances on the Office")
plt.show()
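The two iterrows() loops work, but pandas can assign the same colors and sizes without looping. Here is a minimal sketch, assuming a frame with the ScaledRatings and GuestStars columns built as in the solution (0 marks "no guest" after fillna). The five rows and the guest names are made up for illustration. pd.cut with right=False produces left-closed bins that match the thresholds in the statement, and np.where picks the marker size.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mimicking the solution's columns
episodes = pd.DataFrame({
    "ScaledRatings": [0.1, 0.25, 0.6, 0.75, 1.0],
    "GuestStars": [0, "Guest A", 0, 0, "Guest B, Guest C"],
})

# Left-closed bins [-inf, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, inf)
color_bins = [-np.inf, 0.25, 0.5, 0.75, np.inf]
color_labels = ["red", "orange", "lightgreen", "darkgreen"]
colors = pd.cut(episodes["ScaledRatings"], bins=color_bins,
                labels=color_labels, right=False).astype(str).tolist()

# 25 when GuestStars is 0 (no guest), 250 otherwise
sizes = np.where(episodes["GuestStars"] == 0, 25, 250).tolist()

print(colors)  # ['red', 'orange', 'lightgreen', 'darkgreen', 'darkgreen']
print(sizes)   # [25, 250, 25, 25, 250]
```

The right=False argument matters. With pd.cut's default right-closed bins, a scaled rating of exactly 0.25 would fall into the "red" bin instead of "orange", which contradicts the project's ">= 0.25" rule.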
Summary
This was a basic project that I did as part of a data science course on DataCamp, and I wanted to share its solution with those who are just starting to learn the basics of Python.