Intro to Numpy + Pandas

Lecture 3

Dr. Greg Chism

University of Arizona
INFO 511 - Fall 2024

Intro to NumPy

What is NumPy?

  • NumPy = Numerical Python

  • Foundational package for scientific computing

  • High-performance multidimensional arrays

  • Tools for working with arrays

Start with

pip install numpy

Why NumPy for data science?

  • Essential for data processing, manipulation, and analysis.

  • Underpins advanced data science algorithms implemented in Python.

  • Fast and memory-efficient with powerful data structures.

NumPy Arrays

Creating arrays

import numpy as np

# Creating a simple NumPy array
arr = np.array([1, 2, 3, 4])

# Multidimensional array
multi_arr = np.array([[1, 2, 3], [4, 5, 6]])

# Range of values
range_arr = np.arange(10)

# Array of zeros
zeros_arr = np.zeros((3, 3))

# Array of ones
ones_arr = np.ones((2, 2))

# Identity matrix
identity_matrix = np.eye(3)
arr: [1 2 3 4] 

multi_arr: [[1 2 3]
 [4 5 6]] 

range_arr: [0 1 2 3 4 5 6 7 8 9] 

zeros_arr: [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]] 

ones_arr: [[1. 1.]
 [1. 1.]] 

identity_matrix: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Array attributes

# Array dimensions
print("Dimensions:", multi_arr.ndim)

# Shape of array
print("Shape:", multi_arr.shape)

# Size of array
print("Size:", multi_arr.size)

# Data type of array elements
print("Data Type:", multi_arr.dtype)
Dimensions: 2
Shape: (2, 3)
Size: 6
Data Type: int64

Array operations

Arithmetic operations

# Element-wise addition
addition = arr + 2

# Element-wise subtraction
subtraction = arr - 2

# Element-wise multiplication
multiplication = arr * 2

# Element-wise division
division = arr / 2
addition: [3 4 5 6] 

subtraction: [-1  0  1  2] 

multiplication: [2 4 6 8] 

division: [0.5 1.  1.5 2. ] 

Aside

Why do my outputs look than different than Python?

print(addition)
print(subtraction)
print(multiplication)
print(division)
[3 4 5 6]
[-1  0  1  2]
[2 4 6 8]
[0.5 1.  1.5 2. ]
print("addition:", addition, "\n")

print("subtraction:", subtraction, "\n")

print("multiplication:", multiplication, "\n")

print("division:", division, "\n")
addition: [3 4 5 6] 

subtraction: [-1  0  1  2] 

multiplication: [2 4 6 8] 

division: [0.5 1.  1.5 2. ] 

Statistical operations

# Sum of elements
total = arr.sum()

# Mean of elements
mean_value = arr.mean()

# Standard deviation
std_dev = arr.std()

# Correlation coefficient
corr = np.corrcoef(multi_arr)
total: 10 

mean_value: 2.5 

std_dev: 1.118033988749895 

corr: [[1. 1.]
 [1. 1.]] 

Advanced operations

Reshaping and transposing

# Reshaping an array
reshaped = np.reshape(range_arr, (2, 5))

# Transpose of an array
transposed = multi_arr.T
range_arr: [0 1 2 3 4 5 6 7 8 9] 

reshaped: [[0 1 2 3 4]
 [5 6 7 8 9]] 

multi_arr: [[1 2 3]
 [4 5 6]] 

transposed: [[1 4]
 [2 5]
 [3 6]] 

Indexing and slicing

# Accessing a specific element
element = multi_arr[0, 1]

# Slicing a row
row = multi_arr[1, :]

# Slicing a column
column = multi_arr[:, 2]
multi_arr: [[1 2 3]
 [4 5 6]] 

element: [[1 2 3]
 [4 5 6]] 

row: [4 5 6] 

column: [3 6] 

Broadcasting

# Broadcasting allows arithmetic operations on arrays of different sizes
broadcasted_addition = multi_arr + np.array([1, 0, 1])
multi_arr: [[1 2 3]
 [4 5 6]] 

broadcasted_addition: [[2 2 4]
 [5 5 7]] 

Linear algebra in NumPy

Matrix operations

Dot product: take two equal-length sequences and return a single number

2 • (1, 2, 3) = 2x1 = 2; 2x2 = 4; 2x3 = 6

Matrix multiplication:

(1, 2, 3) • (7, 9, 11) = (1×7 + 2×9 + 3×11) = 58

# Dot product
dot_product = np.dot(arr, arr)

# Matrix multiplication
matrix_mul = np.dot(multi_arr, identity_matrix)
dot_product: 30 

matrix_mul: [[1. 2. 3.]
 [4. 5. 6.]] 

Eigenvalues and Eigenvectors

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(identity_matrix)
eigenvalues: [1. 1. 1.] 

eigenvectors: [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]] 

NumPy for data science

Application in Algorithms

  • NumPy arrays are used in various data science algorithms like clustering, classification, and neural networks.

Performance

  • NumPy operations are implemented in C, which makes them much faster than standard Python.

Conclusion

  • NumPy is integral to data science and analysis in Python.

  • It provides efficient and fast operations for array and matrix manipulation.

  • Understanding NumPy is crucial for implementing and customizing data science algorithms.

Introduction to

What is Pandas?

  • High-Performance Library: Pandas is a Python library for fast data manipulation.

  • Core Structures: It introduces DataFrame and Series for data handling.

  • Data Processing: Ideal for cleaning and analyzing datasets.

  • Versatile I/O: Offers extensive file format compatibility for data I/O.

Why Pandas for data mining?

  • Streamlines Data Prep: Optimizes data manipulation for mining readiness.

  • Built-in Analysis: Includes essential tools for quick data exploration.

  • Handles Large Data: Efficiently processes and analyzes big datasets.

Pandas data structures

Series

One-dimensional array-like object containing a sequence of values with an associated array of labels, its index.

import pandas as pd

# Creating a Series
ser = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Accessing elements
print("a:", ser['a'])
a: -5

DataFrame

A rectangular table of data with an ordered collection of columns

# Creating a DataFrame
data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9]
}
frame = pd.DataFrame(data)

# Selecting columns
print(frame['state'])
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
Name: state, dtype: object

Index objects

Immutable, can’t be modified by a user

# Index objects
obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

print(index)
Index(['a', 'b', 'c'], dtype='object')

Case study: average rent costs

💰📈

Reading & Inspecting Data

First, why .CSVs?

  • More reproducible - can see changes on GitHub

  • Simple file structure

  • Standardized

  • Non-proprietary (e.g., Excel)

Reading in .CSV files

# Loading data from CSV
df = pd.read_csv('data/rent_avg.csv')

df
RegionID SizeRank RegionName RegionType StateName 1/31/15 2/28/15 3/31/15 4/30/15 5/31/15 ... 2/28/23 3/31/23 4/30/23 5/31/23 6/30/23 7/31/23 8/31/23 9/30/23 10/31/23 11/30/23
0 102001 0 United States country NaN 1266.059583 1272.748070 1281.390109 1291.808026 1301.544232 ... 2072.346516 2084.944938 2100.570959 2113.158286 2123.032953 2132.040398 2141.677753 2147.835795 2148.939168 2147.563754
1 394913 1 New York, NY msa NY 2233.133615 2255.035180 2272.077073 2291.645864 2297.479956 ... 3336.961840 3402.179034 3470.949450 3492.930681 3502.786897 3503.650740 3509.480232 3493.537322 3460.942251 3445.696877
2 753899 2 Los Angeles, CA msa CA 2571.296547 2586.050819 2604.348963 2616.104497 2637.303435 ... 4073.810818 4100.234089 4134.999768 4148.832674 4182.522537 4202.376930 4230.003153 4224.038689 4213.465460 4209.636932
3 394463 3 Chicago, IL msa IL 1504.096116 1510.879827 1522.416987 1534.343702 1547.516576 ... 2112.639599 2123.617285 2145.692631 2163.843271 2183.541315 2196.506806 2213.427631 2225.237153 2225.798038 2221.960828
4 394514 4 Dallas, TX msa TX 1363.557414 1371.136919 1381.114797 1394.643185 1408.025840 ... 2222.442779 2229.474165 2239.462332 2253.004851 2267.408242 2280.056798 2285.063932 2284.569053 2274.785931 2277.403767
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
467 753871 811 Breckenridge, CO msa CO NaN NaN NaN NaN NaN ... 3848.649613 4327.112762 4414.482838 4438.619592 4535.664564 4599.237795 4558.170264 4667.043338 4671.081209 4472.222222
468 394751 821 Kirksville, MO msa MO NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 1000.733430 981.701456 951.458333
469 753923 849 The Dalles, OR msa OR NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 1757.739178 1788.943167 1885.904065 1838.888889
470 394584 863 Fallon, NV msa NV NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 1295.000000
471 394996 915 Portales, NM msa NM NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 1122.195307 1101.628147 1057.654564 1071.572024 1046.671512 1099.305556

472 rows × 112 columns

Inspecting data

Any issues?

df.head()
RegionID SizeRank RegionName RegionType StateName 1/31/15 2/28/15 3/31/15 4/30/15 5/31/15 ... 2/28/23 3/31/23 4/30/23 5/31/23 6/30/23 7/31/23 8/31/23 9/30/23 10/31/23 11/30/23
0 102001 0 United States country NaN 1266.059583 1272.748070 1281.390109 1291.808026 1301.544232 ... 2072.346516 2084.944938 2100.570959 2113.158286 2123.032953 2132.040398 2141.677753 2147.835795 2148.939168 2147.563754
1 394913 1 New York, NY msa NY 2233.133615 2255.035180 2272.077073 2291.645864 2297.479956 ... 3336.961840 3402.179034 3470.949450 3492.930681 3502.786897 3503.650740 3509.480232 3493.537322 3460.942251 3445.696877
2 753899 2 Los Angeles, CA msa CA 2571.296547 2586.050819 2604.348963 2616.104497 2637.303435 ... 4073.810818 4100.234089 4134.999768 4148.832674 4182.522537 4202.376930 4230.003153 4224.038689 4213.465460 4209.636932
3 394463 3 Chicago, IL msa IL 1504.096116 1510.879827 1522.416987 1534.343702 1547.516576 ... 2112.639599 2123.617285 2145.692631 2163.843271 2183.541315 2196.506806 2213.427631 2225.237153 2225.798038 2221.960828
4 394514 4 Dallas, TX msa TX 1363.557414 1371.136919 1381.114797 1394.643185 1408.025840 ... 2222.442779 2229.474165 2239.462332 2253.004851 2267.408242 2280.056798 2285.063932 2284.569053 2274.785931 2277.403767

5 rows × 112 columns

df.tail()
RegionID SizeRank RegionName RegionType StateName 1/31/15 2/28/15 3/31/15 4/30/15 5/31/15 ... 2/28/23 3/31/23 4/30/23 5/31/23 6/30/23 7/31/23 8/31/23 9/30/23 10/31/23 11/30/23
467 753871 811 Breckenridge, CO msa CO NaN NaN NaN NaN NaN ... 3848.649613 4327.112762 4414.482838 4438.619592 4535.664564 4599.237795 4558.170264 4667.043338 4671.081209 4472.222222
468 394751 821 Kirksville, MO msa MO NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 1000.733430 981.701456 951.458333
469 753923 849 The Dalles, OR msa OR NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 1757.739178 1788.943167 1885.904065 1838.888889
470 394584 863 Fallon, NV msa NV NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 1295.000000
471 394996 915 Portales, NM msa NM NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 1122.195307 1101.628147 1057.654564 1071.572024 1046.671512 1099.305556

5 rows × 112 columns

df.dtypes
RegionID        int64
SizeRank        int64
RegionName     object
RegionType     object
StateName      object
               ...   
7/31/23       float64
8/31/23       float64
9/30/23       float64
10/31/23      float64
11/30/23      float64
Length: 112, dtype: object
df.describe()
RegionID SizeRank 1/31/15 2/28/15 3/31/15 4/30/15 5/31/15 6/30/15 7/31/15 8/31/15 ... 2/28/23 3/31/23 4/30/23 5/31/23 6/30/23 7/31/23 8/31/23 9/30/23 10/31/23 11/30/23
count 472.000000 472.000000 170.000000 173.000000 177.000000 179.000000 179.000000 178.000000 179.000000 180.000000 ... 346.000000 361.000000 372.000000 384.000000 406.000000 411.000000 415.000000 425.000000 445.000000 472.000000
mean 415298.207627 273.686441 1239.825620 1240.669841 1251.146830 1266.885696 1276.013223 1284.256611 1290.350280 1291.137595 ... 1903.384062 1898.742328 1908.427805 1906.267933 1886.896644 1892.888767 1897.265795 1891.438391 1877.721153 1851.628367
std 89315.652491 192.924182 413.889910 413.782914 417.808376 430.279161 436.704575 443.645583 446.426265 448.112021 ... 1004.878275 984.054607 993.950964 981.101384 943.047137 925.105623 936.407031 974.805502 973.269750 935.988332
min 102001.000000 0.000000 618.854999 621.850858 634.040448 633.276802 623.165150 624.515366 625.812267 645.063429 ... 675.528618 753.200396 723.441350 745.512901 748.650235 727.708028 724.647895 726.496330 743.350970 752.666667
25% 394560.500000 118.750000 982.245085 986.598979 993.390743 998.496778 999.149213 999.166349 1002.455958 1006.227359 ... 1411.767065 1409.322755 1426.260589 1418.156725 1410.428037 1418.889164 1417.539001 1414.537646 1406.329003 1397.858135
50% 394805.500000 241.500000 1119.640946 1128.291306 1140.587934 1149.357569 1154.063529 1160.902744 1167.827881 1173.401059 ... 1725.072590 1707.075023 1707.509644 1701.364516 1700.414955 1706.087275 1716.116918 1708.640942 1696.614785 1675.149130
75% 395063.500000 393.250000 1338.069919 1342.565444 1365.299281 1379.799386 1396.227619 1395.909341 1407.885537 1406.385945 ... 2153.244218 2139.535628 2167.521817 2157.862768 2130.928236 2153.958642 2174.036097 2151.869189 2148.939168 2118.656793
max 845167.000000 915.000000 3079.176287 3096.936684 3120.952116 3176.462957 3249.296472 3318.678095 3347.010915 3354.373564 ... 15718.660650 15404.494040 15781.904020 15583.894210 14849.608200 14344.093470 14754.112810 16015.296260 16415.672810 15918.888890

8 rows × 109 columns

We can also extract specific summary stats

min_value = df['11/30/23'].min()
max_value = df['11/30/23'].max()
mean_value = df['11/30/23'].mean()
med_value = df['11/30/23'].median()
std_value = df['11/30/23'].std()
count_value = df['11/30/23'].count()
min: 752.6666667
max: 15918.88889
mean: 1851.6283666211866
median: 1675.1491305
st. dev: 935.9883320322535
N: 472
pd.unique(df['StateName'])
array([nan, 'NY', 'CA', 'IL', 'TX', 'VA', 'PA', 'FL', 'GA', 'MA', 'AZ',
       'MI', 'WA', 'MN', 'CO', 'MD', 'MO', 'NC', 'OR', 'OH', 'NV', 'IN',
       'TN', 'RI', 'WI', 'OK', 'KY', 'LA', 'UT', 'CT', 'AL', 'HI', 'NE',
       'SC', 'NM', 'ID', 'AR', 'IA', 'KS', 'MS', 'ME', 'NH', 'DE', 'AK',
       'NJ', 'SD', 'WV', 'ND', 'VT', 'MT', 'WY'], dtype=object)

Melting

Jumping ahead slightly…

df2 = df.melt(id_vars = df.columns[0:5], var_name = "date", value_name = "avg_price")
df2.head()
RegionID SizeRank RegionName RegionType StateName date avg_price
0 102001 0 United States country NaN 1/31/15 1266.059583
1 394913 1 New York, NY msa NY 1/31/15 2233.133615
2 753899 2 Los Angeles, CA msa CA 1/31/15 2571.296547
3 394463 3 Chicago, IL msa IL 1/31/15 1504.096116
4 394514 4 Dallas, TX msa TX 1/31/15 1363.557414

Convert to datetime

df2['date'] = pd.to_datetime(df2['date'])
df2.head()
RegionID SizeRank RegionName RegionType StateName date avg_price
0 102001 0 United States country NaN 2015-01-31 1266.059583
1 394913 1 New York, NY msa NY 2015-01-31 2233.133615
2 753899 2 Los Angeles, CA msa CA 2015-01-31 2571.296547
3 394463 3 Chicago, IL msa IL 2015-01-31 1504.096116
4 394514 4 Dallas, TX msa TX 2015-01-31 1363.557414

Groups

Group data

grouped_df = df2.groupby('StateName')
print(type(grouped_df))
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

Grouped statistics

# Summary statistics for all numeric columns by sex
grouped_df.describe()
RegionID SizeRank ... date avg_price
count mean min 25% 50% 75% max std count mean ... max std count mean min 25% 50% 75% max std
StateName
AK 214.0 394453.500000 394327.0 394327.00 394453.5 394580.00 394580.0 126.796601 214.0 273.000000 ... 2023-11-30 00:00:00 NaN 118.0 1845.229913 1634.802345 1709.827198 1751.882085 1959.080017 2281.395586 180.825268
AL 1605.0 448643.533333 394333.0 394519.00 394598.0 395145.00 845163.0 138679.865228 1605.0 296.066667 ... 2023-11-30 00:00:00 NaN 837.0 1196.697249 830.963294 971.353373 1190.838894 1347.038333 2032.164480 254.603330
AR 749.0 394790.428571 394590.0 394609.00 394728.0 395042.00 395077.0 182.464426 749.0 285.142857 ... 2023-11-30 00:00:00 NaN 354.0 1095.547769 726.685275 940.461179 1068.180400 1253.028677 1697.220290 228.390724
AZ 856.0 451262.875000 394595.0 394923.25 395096.5 395187.00 845160.0 148966.294950 856.0 214.375000 ... 2023-11-30 00:00:00 NaN 502.0 1557.286231 975.778114 1239.202922 1468.813304 1825.604726 2729.174782 413.841467
CA 3424.0 428576.656250 394357.0 394841.75 395047.5 395142.00 753920.0 104654.375661 3424.0 191.437500 ... 2023-11-30 00:00:00 NaN 2262.0 2472.280505 1056.267157 1832.396777 2412.471519 3015.948801 5280.032047 801.072889
CO 1284.0 454493.666667 394405.0 394518.50 394620.5 394908.75 753881.0 133940.119391 1284.0 348.833333 ... 2023-11-30 00:00:00 NaN 716.0 2346.684403 1058.116385 1596.669792 1904.408666 2271.895581 16415.672810 2257.370863
CT 535.0 394815.600000 394415.0 394669.00 394908.0 394924.00 395162.0 254.108902 535.0 119.600000 ... 2023-11-30 00:00:00 NaN 433.0 2127.777496 1231.613286 1599.540968 1837.233176 2604.887072 4416.410828 748.713311
DE 214.0 394795.000000 394539.0 394539.00 394795.0 395051.00 395051.0 256.600235 214.0 189.000000 ... 2023-11-30 00:00:00 NaN 127.0 1571.385754 1258.289127 1340.570246 1469.069826 1812.232209 2085.191980 258.317915
FL 2568.0 424798.708333 394440.0 394759.00 394950.0 395079.25 753906.0 99244.412111 2568.0 174.916667 ... 2023-11-30 00:00:00 NaN 2122.0 1668.835798 885.529271 1273.651277 1522.096523 1936.250181 4340.326552 550.513435
GA 2033.0 413673.210526 394306.0 394423.00 394813.0 395126.00 753893.0 80210.928282 2033.0 326.421053 ... 2023-11-30 00:00:00 NaN 1129.0 1256.099519 624.240075 971.108002 1184.687231 1465.110193 2238.706120 375.740443
HI 321.0 514445.000000 394680.0 394680.00 394731.0 753924.00 753924.0 169601.609159 321.0 183.666667 ... 2023-11-30 00:00:00 NaN 132.0 2705.681385 1941.739389 2376.900621 2535.854344 2903.391210 4534.504002 591.932689
IA 749.0 394623.571429 394325.0 394447.00 394542.0 394707.00 395210.0 263.893527 749.0 320.714286 ... 2023-11-30 00:00:00 NaN 413.0 1343.504147 924.635859 1096.610185 1226.942363 1525.647707 2304.133291 346.240757
ID 749.0 446097.000000 394399.0 394480.00 394988.0 395173.00 753916.0 125750.835880 749.0 363.142857 ... 2023-11-30 00:00:00 NaN 289.0 1576.334430 995.965187 1281.801032 1520.317263 1750.688649 2458.333333 374.149199
IL 1177.0 468306.000000 394454.0 394516.00 394734.0 395114.00 845167.0 157410.509968 1177.0 239.636364 ... 2023-11-30 00:00:00 NaN 629.0 1220.872101 645.511180 1003.992490 1133.110230 1406.751015 2225.798038 331.832737
IN 1391.0 422359.384615 394393.0 394565.00 394705.0 395021.00 753895.0 95740.797170 1391.0 299.538462 ... 2023-11-30 00:00:00 NaN 730.0 1105.728925 643.993545 831.024770 1045.852512 1317.120042 2085.765612 324.831936
KS 749.0 394959.285714 394701.0 394778.00 394981.0 395161.00 395224.0 184.506905 749.0 390.571429 ... 2023-11-30 00:00:00 NaN 423.0 1100.784587 680.241603 869.782224 1063.870894 1245.758531 1927.089567 299.129530
KY 749.0 394736.285714 394406.0 394563.00 394792.0 394949.00 395023.0 203.545232 749.0 267.857143 ... 2023-11-30 00:00:00 NaN 272.0 1201.673804 915.665260 1061.281701 1172.542508 1322.833020 1628.296815 170.093385
LA 1177.0 394735.090909 394314.0 394608.00 394761.0 394910.00 395096.0 236.021094 1177.0 260.545455 ... 2023-11-30 00:00:00 NaN 558.0 1263.411616 897.251299 1101.569579 1279.399917 1393.830133 1839.146198 218.119976
MA 428.0 394779.500000 394361.0 394393.25 394759.5 395145.75 395238.0 400.131638 428.0 88.750000 ... 2023-11-30 00:00:00 NaN 274.0 2246.404623 1471.260619 1762.759314 2264.264095 2598.178490 3533.185405 522.650962
MD 428.0 484350.750000 394358.0 394474.25 394586.5 484463.00 753872.0 155790.307335 428.0 240.000000 ... 2023-11-30 00:00:00 NaN 247.0 1552.644194 880.592353 1210.839158 1589.107929 1754.226876 2184.235363 315.788335
ME 214.0 394678.000000 394359.0 394359.00 394678.0 394997.00 394997.0 319.747949 214.0 196.500000 ... 2023-11-30 00:00:00 NaN 78.0 2134.470922 1547.912828 1834.919379 2041.777850 2482.006422 2775.031672 378.392726
MI 1712.0 417161.437500 394302.0 394580.00 394751.0 394901.25 753890.0 86968.672381 1712.0 247.875000 ... 2023-11-30 00:00:00 NaN 704.0 1444.416460 618.854999 997.830541 1227.899830 1599.409474 3644.308459 684.450686
MN 642.0 394836.000000 394543.0 394637.00 394844.0 395030.00 395118.0 201.659805 642.0 231.166667 ... 2023-11-30 00:00:00 NaN 310.0 1588.607379 1197.459955 1375.809178 1548.545456 1733.734201 2231.628451 247.955226
MO 1498.0 394825.571429 394411.0 394604.00 394743.0 395116.00 395204.0 271.321557 1498.0 396.000000 ... 2023-11-30 00:00:00 NaN 509.0 1122.500158 762.926459 1000.200109 1092.990495 1252.354572 1547.110695 188.219653
MS 642.0 394880.833333 394658.0 394671.00 394831.0 395124.00 395170.0 212.375595 642.0 308.000000 ... 2023-11-30 00:00:00 NaN 225.0 1192.876828 937.793923 1068.174125 1144.629497 1317.754833 2279.597698 195.298729
MT 642.0 394577.833333 394386.0 394407.00 394537.0 394733.00 394867.0 182.675809 642.0 421.666667 ... 2023-11-30 00:00:00 NaN 177.0 1793.224137 1094.019889 1216.021805 1627.453364 2329.398684 3136.530533 591.359944
NC 2354.0 411102.045455 394338.0 394562.00 394697.5 395063.00 753912.0 74823.677625 2354.0 274.500000 ... 2023-11-30 00:00:00 NaN 1286.0 1366.269717 846.562543 1106.111669 1331.843142 1556.471201 2288.942284 324.147384
ND 321.0 394892.666667 394585.0 394585.00 394866.0 395227.00 395227.0 263.183083 321.0 455.666667 ... 2023-11-30 00:00:00 NaN 22.0 1651.132776 1395.000000 1519.067443 1712.143286 1755.955326 1808.444444 132.405607
NE 428.0 394817.750000 394617.0 394751.25 394858.0 394924.50 394938.0 128.299053 428.0 425.000000 ... 2023-11-30 00:00:00 NaN 216.0 1320.009482 940.764327 1186.933246 1262.477897 1451.544329 1754.718239 189.568181
NH 214.0 619992.000000 394820.0 394820.00 619992.0 845164.00 845164.0 225699.953831 214.0 171.000000 ... 2023-11-30 00:00:00 NaN 22.0 2582.894543 2325.537810 2480.164956 2562.927550 2709.069839 2791.183412 139.749875
NJ 214.0 394756.000000 394348.0 394348.00 394756.0 395164.00 395164.0 408.956625 214.0 161.000000 ... 2023-11-30 00:00:00 NaN 214.0 1693.799261 1108.055046 1402.009299 1642.693619 1931.930719 2543.122507 353.021462
NM 963.0 394666.777778 394305.0 394443.00 394588.0 394996.00 395066.0 292.082726 963.0 440.444444 ... 2023-11-30 00:00:00 NaN 314.0 1472.158264 987.763290 1157.028714 1301.372585 1634.597233 2925.870370 453.840507
NV 642.0 394673.666667 394444.0 394584.00 394610.0 394775.00 395019.0 182.289096 642.0 452.833333 ... 2023-11-30 00:00:00 NaN 264.0 1784.668225 1212.685378 1484.060443 1755.211957 2081.924961 2817.571625 370.648528
NY 1070.0 439881.200000 394308.0 394693.00 394972.0 395179.00 845159.0 135156.094569 1070.0 163.300000 ... 2023-11-30 00:00:00 NaN 641.0 1790.942113 997.328276 1343.320969 1602.835366 2157.428456 3509.480232 596.847035
OH 1284.0 432299.833333 394304.0 394472.75 394760.5 395175.25 845158.0 124530.356368 1284.0 141.750000 ... 2023-11-30 00:00:00 NaN 923.0 1046.700170 680.642040 839.625107 987.004712 1188.448704 1947.044776 265.814765
OK 1177.0 394820.000000 394300.0 394548.00 394935.0 395133.00 395169.0 309.974483 1177.0 465.454545 ... 2023-11-30 00:00:00 NaN 410.0 1062.128302 758.667864 973.232481 1038.672842 1176.785601 1458.135575 182.618956
OR 1498.0 471653.571429 394307.0 394505.00 394800.0 395048.00 753923.0 147450.415722 1498.0 379.928571 ... 2023-11-30 00:00:00 NaN 600.0 1781.272373 1052.830183 1532.939293 1757.701867 1985.272415 2611.379315 360.962236
PA 2033.0 394875.421053 394318.0 394666.00 394974.0 395106.00 395244.0 262.799913 2033.0 224.368421 ... 2023-11-30 00:00:00 NaN 875.0 1347.330330 789.815931 1149.120217 1306.885412 1518.000385 2094.027778 277.309547
RI 107.0 395005.000000 395005.0 395005.00 395005.0 395005.00 395005.0 0.000000 107.0 39.000000 ... 2023-11-30 00:00:00 NaN 107.0 2118.407898 1598.141501 1805.755429 2012.306431 2439.407877 2912.998572 389.954589
SC 1070.0 394772.600000 394457.0 394597.00 394667.0 395085.00 395139.0 248.911116 1070.0 225.800000 ... 2023-11-30 00:00:00 NaN 743.0 1396.031749 853.147291 1118.728682 1338.169915 1577.549159 2620.224277 357.345479
SD 321.0 394845.000000 394419.0 394419.00 395013.0 395103.00 395103.0 303.933833 321.0 424.333333 ... 2023-11-30 00:00:00 NaN 106.0 1384.547508 1049.505973 1162.543829 1311.456545 1602.649611 1872.468058 230.479838
TN 1177.0 394705.181818 394460.0 394474.00 394726.0 394849.00 395168.0 211.301360 1177.0 192.636364 ... 2023-11-30 00:00:00 NaN 667.0 1371.017577 935.476460 1104.347806 1363.832948 1532.375111 2270.663942 298.293853
TX 3317.0 409264.806452 394299.0 394502.00 394746.0 395054.00 845162.0 79596.074231 3317.0 259.967742 ... 2023-11-30 00:00:00 NaN 2088.0 1438.496827 765.245138 1152.171443 1396.188630 1668.594537 2474.220726 357.363631
UT 856.0 394902.000000 394446.0 394768.50 394968.5 395069.50 395187.0 232.809618 856.0 320.750000 ... 2023-11-30 00:00:00 NaN 458.0 1711.807021 1109.188006 1408.681922 1591.513180 1930.749340 4311.466765 506.007518
VA 1070.0 394914.300000 394392.0 394668.00 395025.0 395194.00 395232.0 298.071878 1070.0 188.400000 ... 2023-11-30 00:00:00 NaN 851.0 1536.584520 782.641195 1268.882352 1451.721042 1780.951306 2880.797205 441.327399
VT 107.0 394429.000000 394429.0 394429.00 394429.0 394429.00 394429.0 0.000000 107.0 206.000000 ... 2023-11-30 00:00:00 NaN 11.0 2900.941104 2639.431318 2837.553193 2958.561667 2986.743498 3027.025463 125.568309
WA 1605.0 394891.866667 394378.0 394741.00 394925.0 395113.00 395240.0 264.588738 1605.0 303.733333 ... 2023-11-30 00:00:00 NaN 844.0 1852.587674 1025.907447 1514.166912 1812.799384 2157.861092 3104.024269 428.180994
WI 1177.0 394791.727273 394334.0 394646.00 394816.0 394944.00 395215.0 224.222302 1177.0 247.727273 ... 2023-11-30 00:00:00 NaN 323.0 1514.551976 907.936037 1130.945896 1419.831933 1827.497187 2671.202919 434.493367
WV 321.0 394601.000000 394455.0 394455.00 394469.0 394879.00 394879.0 196.965797 321.0 306.666667 ... 2023-11-30 00:00:00 NaN 123.0 1340.545489 905.843432 1095.441407 1383.271660 1485.772036 1751.850634 245.916508
WY 321.0 394512.666667 394445.0 394445.00 394462.0 394631.00 394631.0 84.092719 321.0 458.666667 ... 2023-11-30 00:00:00 NaN 92.0 1386.686876 1182.758469 1252.376297 1359.626203 1525.037930 1678.124614 157.596041

50 rows × 32 columns

# Provide the mean for each numeric column by sex
grouped_df.mean(numeric_only = True)
RegionID SizeRank avg_price
StateName
AK 394453.500000 273.000000 1845.229913
AL 448643.533333 296.066667 1196.697249
AR 394790.428571 285.142857 1095.547769
AZ 451262.875000 214.375000 1557.286231
CA 428576.656250 191.437500 2472.280505
CO 454493.666667 348.833333 2346.684403
CT 394815.600000 119.600000 2127.777496
DE 394795.000000 189.000000 1571.385754
FL 424798.708333 174.916667 1668.835798
GA 413673.210526 326.421053 1256.099519
HI 514445.000000 183.666667 2705.681385
IA 394623.571429 320.714286 1343.504147
ID 446097.000000 363.142857 1576.334430
IL 468306.000000 239.636364 1220.872101
IN 422359.384615 299.538462 1105.728925
KS 394959.285714 390.571429 1100.784587
KY 394736.285714 267.857143 1201.673804
LA 394735.090909 260.545455 1263.411616
MA 394779.500000 88.750000 2246.404623
MD 484350.750000 240.000000 1552.644194
ME 394678.000000 196.500000 2134.470922
MI 417161.437500 247.875000 1444.416460
MN 394836.000000 231.166667 1588.607379
MO 394825.571429 396.000000 1122.500158
MS 394880.833333 308.000000 1192.876828
MT 394577.833333 421.666667 1793.224137
NC 411102.045455 274.500000 1366.269717
ND 394892.666667 455.666667 1651.132776
NE 394817.750000 425.000000 1320.009482
NH 619992.000000 171.000000 2582.894543
NJ 394756.000000 161.000000 1693.799261
NM 394666.777778 440.444444 1472.158264
NV 394673.666667 452.833333 1784.668225
NY 439881.200000 163.300000 1790.942113
OH 432299.833333 141.750000 1046.700170
OK 394820.000000 465.454545 1062.128302
OR 471653.571429 379.928571 1781.272373
PA 394875.421053 224.368421 1347.330330
RI 395005.000000 39.000000 2118.407898
SC 394772.600000 225.800000 1396.031749
SD 394845.000000 424.333333 1384.547508
TN 394705.181818 192.636364 1371.017577
TX 409264.806452 259.967742 1438.496827
UT 394902.000000 320.750000 1711.807021
VA 394914.300000 188.400000 1536.584520
VT 394429.000000 206.000000 2900.941104
WA 394891.866667 303.733333 1852.587674
WI 394791.727273 247.727273 1514.551976
WV 394601.000000 306.666667 1340.545489
WY 394512.666667 458.666667 1386.686876

More wrangling soon!