Lecture 5
University of Arizona
INFO 511 - Spring 2025
Discuss the following for the visualization in the #lecture-discussions Slack Channel.
plt.figure(figsize=(8, 6))
sns.violinplot(x="species", y="body_mass_g", data=penguins)
sns.stripplot(x="species", y="body_mass_g", data=penguins, jitter=True, hue='species')
plt.title('Violin Plot with Jittered Points, Color by Species, and No Legend')
plt.figure(figsize=(8, 6))
sns.violinplot(x="species", y="body_mass_g", data=penguins, palette='colorblind')
sns.stripplot(x="species", y="body_mass_g", data=penguins, jitter=True, hue='species', palette='colorblind')
plt.title('Violin Plot with Jittered Points, Color by Species, No Legend, and Colorblind Palette')
What do these three plots show?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from palmerpenguins import load_penguins
penguins = load_penguins()
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |
Analyzing a single variable:
Numerical: histogram, box plot, density plot, etc.
Categorical: bar plot, pie chart, etc.
Analyzing the relationship between two variables:
Numerical + numerical: scatterplot
Numerical + categorical: side-by-side box plots, violin plots, etc.
Categorical + categorical: stacked bar plots
Using an aesthetic (e.g., fill, color, shape, etc.) or facets to represent the second variable in any plot
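As a rough sketch of that last point, the snippet below represents an extra variable first as a color aesthetic (`hue`) and then as facets (`col`). The four rows are the complete rows from the penguins preview shown earlier; choosing `sex` as the extra variable and the two bill measurements as the axes is an illustrative assumption, not something the slides prescribe.

```python
import pandas as pd
import seaborn as sns

# The four complete rows from the penguins preview shown earlier
sample = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, 19.3],
    "sex": ["male", "female", "female", "female"],
})

# Aesthetic: color the points by the extra variable
ax = sns.scatterplot(data=sample, x="bill_length_mm", y="bill_depth_mm", hue="sex")

# Facets: one panel per level of the same variable
grid = sns.relplot(data=sample, x="bill_length_mm", y="bill_depth_mm", col="sex")
```

With the full dataset, the same two calls work unchanged by passing `data=penguins`.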
The Bechdel test, also known as the Bechdel-Wallace test, is a measure of the representation of women in film and other fiction. The test asks whether a work features at least two female characters who have a conversation about something other than a man. Some versions of the test also require that those two female characters have names.
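The criteria can be read as a small decision rule. The sketch below is one possible encoding, not an official implementation: the `Conversation` class, the `passes_bechdel` function, and the character names are all hypothetical, and identifying characters by name means it effectively follows the stricter named-character variant.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    """One exchange in a film (hypothetical structure for illustration)."""
    speakers: list      # names of the characters taking part
    about_a_man: bool   # True if the conversation is about a man


def passes_bechdel(female_characters, conversations):
    """Return True if two distinct female characters talk to each other
    about something other than a man."""
    women = set(female_characters)
    if len(women) < 2:
        return False  # fewer than two female characters
    for conv in conversations:
        women_talking = women.intersection(conv.speakers)
        if len(women_talking) >= 2 and not conv.about_a_man:
            return True
    return False


# Made-up example: the second conversation satisfies all criteria
film = [
    Conversation(["Ana", "Bea"], about_a_man=True),
    Conversation(["Ana", "Bea"], about_a_man=False),
]
result = passes_bechdel(["Ana", "Bea"], film)
print(result)
```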
Load the Bechdel test data with pd.read_csv() and view the column names of the bechdel data with .columns
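A minimal sketch of loading the data and listing its column names. The slides do not give the file path, so this example reads an in-memory CSV built from the three rows printed later in this lecture; in practice the argument to `pd.read_csv()` would be the path or URL of the course's Bechdel file, which has more rows and columns than this stand-in.

```python
import io

import pandas as pd

# Stand-in for the real file: three rows copied from output shown in this lecture
csv_text = """title,roi,budget_2013,gross_2013,year,clean_test
Paranormal Activity,671.336857,505595,339424558.0,2007,dubious
The Blair Witch Project,648.065333,839077,543776715.0,1999,ok
El Mariachi,583.285665,11622,6778946.0,1992,nowomen
"""

bechdel = pd.read_csv(io.StringIO(csv_text))

# .columns is an attribute (no parentheses); list() makes it easy to read
column_names = list(bechdel.columns)
print(column_names)
```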
What about this plot makes it difficult to evaluate how ROI varies by Bechdel test result?
What are the movies with highest ROI?
high_roi_movies = bechdel[bechdel['roi'] > 400][['title', 'roi', 'budget_2013', 'gross_2013', 'year', 'clean_test']]
| title | roi | budget_2013 | gross_2013 | year | clean_test |
703 | Paranormal Activity | 671.336857 | 505595 | 339424558.0 | 2007 | dubious |
1319 | The Blair Witch Project | 648.065333 | 839077 | 543776715.0 | 1999 | ok |
1575 | El Mariachi | 583.285665 | 11622 | 6778946.0 | 1992 | nowomen |
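The slides do not define `roi`, but the three rows above are consistent with ROI being gross divided by budget, both in 2013 dollars. The check below tests that assumed definition using only the printed values; the `roi_check` column name is introduced here for illustration.

```python
import pandas as pd

# Values copied from the three rows printed above
top = pd.DataFrame({
    "title": ["Paranormal Activity", "The Blair Witch Project", "El Mariachi"],
    "roi": [671.336857, 648.065333, 583.285665],
    "budget_2013": [505595, 839077, 11622],
    "gross_2013": [339424558.0, 543776715.0, 6778946.0],
})

# Assumed definition: ROI = gross / budget
top["roi_check"] = top["gross_2013"] / top["budget_2013"]

# Largest gap between the reported and recomputed ROI
max_abs_diff = (top["roi_check"] - top["roi"]).abs().max()
print(top[["title", "roi", "roi_check"]])
```

Under this reading, an ROI of about 671 means the film grossed roughly 671 times its budget, which is why a handful of such films can stretch the ROI axis far beyond where most movies sit.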
Zoom in: What about this plot makes it difficult to evaluate how ROI varies by Bechdel test result?
to next week’s data wrangling pipelines…
What does this plot say about return-on-investment on movies that pass the Bechdel test?
plt.figure(figsize=(8, 4))
sns.boxplot(x='roi', y='clean_test', hue='binary', data=bechdel)
plt.axvline(x=4.21, color='red', linestyle='--')
plt.xlim(0, 15)
plt.title('ROI by Bechdel Test Result with Median Line')
plt.ylabel('Bechdel Test Result')