# Understand representative samples | Be US Census Bureau’s Chief Analyst (Lesson 2 of 5) | 6-8

### Student Objective

Students will be able to:
1. identify the conditions under which a data sample can be used to make credible inferences about a larger population

### Instructions

Materials Required:

Key Terms that will be Used:

• US Census Bureau
• American Community Survey (ACS)
• Household Income
• Food Stamp Recipients
• Codebook
• Variables
• Household Unit
• Numerical and Categorical Data
• Population
• Observations (new)

### Step 1:  Students Brainstorm the best way to “sample” a ton of meat

Using a digital board, have students provide at least 2 different ways to “sample” a one-ton crate of burger meat.

• Say:
• Imagine you work for the US Department of Agriculture, the government agency in charge of making sure that the meat everyone in the country eats is clean and safe!
• But, you can’t test EVERY single crate and piece of meat, so what do you think you should do?
• Write out at least 2 different ways to check a one-ton crate of meat without testing the entire crate, and explain why each method would tell you whether the meat is safe.
• Think-Pair-Share:
• Have 2-3 students share what their partner thought would be a good idea and why.
• Push their thinking:
• What if you had 100 crates?
• What about the meat at the bottom of the crate?
• Do you sample every crate in the same spot?
• Stamp:
• So, we can test a piece of the meat instead of the whole crate.
• But it is still unclear how we should decide where to take that piece of meat from.
• Introduce Today:
• So today, we’re going to learn how researchers and analysts solve this problem.

### Step 2: Connect Meat Sampling with Sampling Data

Say and do:

• Introduce and define the idea of sampling from a population
• Definitions:
• Define sampling -> selecting a fraction or percentage of a group (the sample)
• Define population -> the entire group, including EVERY member of the group
• Define observation -> a single unit of data (one member of the group); in our data, one observation is “a row”
• A sample is made up of some of the possible observations
• A population is made up of all of the possible observations
• “Analysts and researchers usually can’t test an entire population, so they test a sample that represents the population. This is called a ‘representative sample.’ So how do we make a representative sample?”

### Step 3: Introduction to New Material – Introduce the Enduring Question and Goal of Activity

• Under what conditions does a sample represent an entire population?

### Step 4: Set the stage for students to work within Google Sheets with Filtering and Pivot Tables

[Teacher To-Dos Beforehand:]

• Make a copy of “NewYorkDataOnly_TeacherOnly” for self reference
• Make a copy of “FiveStatesOfData_TeacherOnly”
• Make a copy of “NewYorkDataOnly_Students”
• Each student will make a copy of your copy and label it, “StudentFullName_NewYorkOnly_Students” and share it with you
• Make a copy of “FiveStatesOfData_Students”
• Each student will make a copy of your copy and label it, “StudentFullName_FiveStatesOfData_Students” and share it with you
• Make a copy of “FakeData_Students”
• Each student will make a copy of your copy and label it, “StudentFullName_FakeData_Students” (does not need to be shared)

Set students up for success through Mini-Lessons:

• How to filter data in Google Sheets using “FakeData_Students”
• Click cell A1 in the Google Sheet that contains the data
• Move cursor to Toolbar –> Click Funnel Icon [look to the right for funnel icon]
• Click on upside down green triangle for a variable and select only the entries you want
• EX: I want only New York in 2010
• YEAR –> 2010
• StateName –> New York
• Model for students, then allow them to do it with practice:
• “Filter for Wisconsin, 2010”
• How to calculate statistics (averages) in Google Sheets using “FakeData_Students”
• Model for students how to filter and have them do it with you:
• Have students filter data to get only: New York in 2019
• Have them copy/paste that data into a new tab, “NewYork2019”
• [Within “NewYork2019” Tab] Select all the data
• Toolbar –> Insert –> Pivot Table –> New Sheet –> Create
• Select Rows and Columns
• [In Pivot Table Editor] Rows –> Add –> “StateName” (New York)
• De-Select “Show total”
• [In Pivot Table Editor] Values –> Add –> HHINCOME
• Click “Summarize by”
• Select “Average”
• Have students practice
• Find the average household income for Nebraska in 2010
• Find the average number of people per household in Montana in 2019
• Discuss what the “AVERAGE” means for the variables “FOODSTMP” and “OWNERSHP”
• Answer: these variables are coded as 1s (yes) and 0s (no), so their average is the fraction of households answering yes, which is the percentage (show them the math)
• Practice:
• Percentage of households that received Food Stamps in Montana in 2019
• Percentage of households that owned their home in Nebraska in 2010
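For teachers who want to double-check answers outside Google Sheets, the filter-then-average steps above can be sketched in Python. This is a minimal sketch, not part of the lesson's Sheets workflow: it assumes the sheet has been exported as rows with the columns `YEAR`, `StateName`, `HHINCOME`, and `FOODSTMP`, with `FOODSTMP` coded 1 (received food stamps) or 0 (did not). The rows are made up for illustration, and `filter_rows` and `average` are hypothetical helper names.

```python
from statistics import mean

# Made-up toy rows for illustration only (not ACS data).
rows = [
    {"YEAR": 2019, "StateName": "New York", "HHINCOME": 52000, "FOODSTMP": 1},
    {"YEAR": 2019, "StateName": "New York", "HHINCOME": 98000, "FOODSTMP": 0},
    {"YEAR": 2019, "StateName": "New York", "HHINCOME": 75000, "FOODSTMP": 0},
    {"YEAR": 2019, "StateName": "New York", "HHINCOME": 31000, "FOODSTMP": 1},
    {"YEAR": 2010, "StateName": "New York", "HHINCOME": 64000, "FOODSTMP": 0},
    {"YEAR": 2019, "StateName": "Montana", "HHINCOME": 58000, "FOODSTMP": 0},
]


def filter_rows(rows, state, year):
    """Keep only the rows for one state and one year (like the funnel filter)."""
    return [r for r in rows if r["StateName"] == state and r["YEAR"] == year]


def average(rows, variable):
    """Average one column (like a pivot table value summarized by AVERAGE)."""
    return mean(r[variable] for r in rows)


ny_2019 = filter_rows(rows, "New York", 2019)
print(average(ny_2019, "HHINCOME"))  # (52000 + 98000 + 75000 + 31000) / 4 = 64000

# For a 0/1 variable the average is the fraction of 1s:
# (1 + 0 + 0 + 1) / 4 = 0.5, i.e. 50% of these households received food stamps.
print(f"{average(ny_2019, 'FOODSTMP'):.0%}")  # 50%
```

The same arithmetic explains the discussion answer: adding up the 1s counts the “yes” households, and dividing by the number of households turns that count into a fraction.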

### Step 5:  Have students calculate sample statistics

• Share with students the 2010 New York data in a separate Google Sheet
• Have students make their own individual copy of the data and have them share it with you (the teacher)

NOTE: In this step, students should be calculating these statistics using Google Sheets Pivot Tables

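• Create Google Slides with one slide allocated to each student; each slide has a table, organized by sample type, that the student fills in during this activity:

| Variable | Population | First 20 | 100 Block | 10% Random |
|---|---|---|---|---|
| HHINCOME | | | | |
| OWNERSHP | | | | |
| NumInHouse | | | | |
| FOODSTMP | | | | |
| ROOMS | | | | |
| PersRm | | | | |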

### Step 6: Have students calculate the population statistics

Students calculate the average of each variable for the population, using all rows of the 2010 New York dataset

• Steps for students: (independent practice)
• select all data
• insert pivot
• calculate numbers
• Fill in table with calculated values
• Actual Population Data Calculations:
• Average Number of People in Household –> 2.49
• Percentage of Households Own Home –> 62.09%
• Average Household Income –> \$80,186.07
• Percentage of Households Receiving Food Stamps –> 12.46%
• Average Number of Rooms –> 5.76
• Average Persons per Room –> 0.4
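Step 6's pivot table averages each variable over every row. A rough Python equivalent, for checking, is sketched below. It assumes rows stored as dictionaries keyed by the variable names in the Step 5 table, assumes `PersRm` is each household's number of people divided by its number of rooms, and uses three made-up households, so it will not reproduce the answer key above. `summarize` is a hypothetical helper name.

```python
from statistics import mean

# Variable names from the Step 5 table; the values are made up, not ACS data.
VARIABLES = ["HHINCOME", "OWNERSHP", "NumInHouse", "FOODSTMP", "ROOMS", "PersRm"]

population = [
    {"HHINCOME": 40000, "OWNERSHP": 0, "NumInHouse": 3, "FOODSTMP": 1, "ROOMS": 4, "PersRm": 3 / 4},
    {"HHINCOME": 90000, "OWNERSHP": 1, "NumInHouse": 2, "FOODSTMP": 0, "ROOMS": 6, "PersRm": 2 / 6},
    {"HHINCOME": 110000, "OWNERSHP": 1, "NumInHouse": 4, "FOODSTMP": 0, "ROOMS": 8, "PersRm": 4 / 8},
]


def summarize(rows, variables=VARIABLES):
    """Average every variable over all rows, like one pivot table value per variable."""
    return {v: mean(r[v] for r in rows) for v in variables}


stats = summarize(population)
for variable, value in stats.items():
    print(f"{variable}: {value:.2f}")

# PersRm is averaged household by household, so it need not equal
# average NumInHouse divided by average ROOMS (0.53 vs 0.50 here).
print(round(stats["NumInHouse"] / stats["ROOMS"], 2))
```

This is one reason the average persons per room in the answer key is not simply the average household size divided by the average number of rooms.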

### Step 7: Have students calculate statistics using a sample of the first 20 rows (or observations) of the data

Students calculate the average of each variable, using only the first 20 observations (or rows, or households) of the 2010 New York data

• Steps for students: (independent practice)
• Select first 20 observations
• copy/paste into new tab
• select data in new tab
• insert pivot
• calculate numbers
• Fill in table with calculated values
• Actual Data Results
• Average Number of People in Household –> 2.9
• Percentage of Households Own Home –> 50%
• Average Household Income –> \$156,805.00
• Percentage of Households Receiving Food Stamps –> 20%
• Average Number of Rooms –> 6.55
• Average Persons per Room –> 0.47

### Step 8: Have students calculate statistics using a sample of any 100 contiguous rows (or observations) of the data

Students calculate the average of each variable, using any 100 contiguous rows of the 2010 New York data

• Steps for students: (independent practice)
• Select 100 observations as one big block
• copy/paste into a new tab
• select data in the new tab
• insert pivot
• calculate numbers
• Fill in table with calculated values
• Actual Data Results
• Average Number of People in Household –> [will vary]
• Percentage of Households Own Home –> [will vary]
• Average Household Income–> [will vary]
• Percentage of Households Receiving Food Stamps –> [will vary]
• Average Number of Rooms –> [will vary]
• Average Persons per Room –> [will vary]
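Steps 7 and 8 both take one unbroken run of rows, which in Python is a list slice. The sketch below, with made-up incomes and a hypothetical `contiguous_block` helper, shows how that can mislead: if the file happens to be ordered in some way (here, by income), neighboring rows resemble each other more than they resemble the whole population. Whether the New York file is ordered in a way that matters is not stated in this lesson; the sorted toy data is an assumption chosen to make the effect visible.

```python
from statistics import mean

# Made-up incomes for 1,000 households, stored from lowest to highest.
# (A real file might be ordered by place, by household ID, or not at all.)
incomes = [20000 + 150 * i for i in range(1000)]


def contiguous_block(rows, start, size):
    """Return `size` neighboring rows beginning at index `start`."""
    return rows[start:start + size]


population_avg = mean(incomes)
first_20_avg = mean(contiguous_block(incomes, 0, 20))      # Step 7: first 20 rows
block_100_avg = mean(contiguous_block(incomes, 700, 100))  # Step 8: indexes 700 to 799

print(population_avg, first_20_avg, block_100_avg)  # 94925 21425 132425
```

Both blocks miss the population average badly, in opposite directions, because each only sees one neighborhood of the ordered list.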

### Step 9: Have students calculate statistics using a 10% Random Sample of the data

Share and inform students:

• Define 10% Random Sample:
• 10% –> 10% of the population (one tenth)
• Random –> every observation has the same probability of being selected (think of lottery balls: each ball has an equal chance of being chosen)
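To show what “random” means in practice, a 10% random sample can be drawn with Python's standard-library `random.sample`, which picks rows without replacement so that every row has the same chance of being included. This is a minimal sketch, not necessarily how the pre-prepared 10%RandomSample data was built: `random_sample` is a hypothetical helper, the numbers 1 to 500 stand in for household rows, and the fixed seed only makes the draw repeatable.

```python
import random


def random_sample(rows, fraction=0.10, seed=None):
    """Pick round(fraction * len(rows)) rows at random, without replacement."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    k = round(fraction * len(rows))
    return rng.sample(rows, k)


rows = list(range(1, 501))  # stand-ins for 500 household rows
sample = random_sample(rows, seed=2010)
print(len(sample))  # 50
```

Like drawing lottery balls, no row can be picked twice, and no row is more likely to be picked than any other.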

Students calculate the average of each variable, using the 10%RandomSample data (pre-prepared for students in “NewYorkDataOnly_Students”)

• Actual Values
• Average Number of People in Household –> 2.42
• Percentage of Households Own Home –> 61.08%
• Average Household Income –> \$81,432
• Percentage of Households Receiving Food Stamps –> 12.03%
• Average Number of Rooms –> 5.71
• Average Persons per Room –> 2.42

### Step 10: Think-Pair-Share — Students Compare Each Sample’s Statistics with the Population Statistics

• Think – 4 minutes
• Using your filled out table of statistics by sample, which sample best represents the population?
• How do you know?
• Why not the other samples?
• Pair – 2 minutes
• Share with your partner what sample you selected as representative and why.
• Share – 2 minutes
• Select 2-3 students to share out
• Exemplar Response:
• “The sample that best represents the population is the 10% Random Sample because its statistics are almost the same as the population’s. The other samples might have one or two variables that are close, but the rest are very different.”
• PUSH THINKING
• What are the 2 conditions required for a sample to be representative of the population?
• Hint: look at the title of the best sample we chose
• The sample must be:
• random
• large enough (in this lesson, 10% of the population)
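Step 10's comparison can be made concrete by measuring how far each sample's average lands from the population average. The sketch below uses made-up incomes for a 1,000-household toy population, sorted from lowest to highest (not the New York data), and a hypothetical `gap` measure: the absolute difference between a sample average and the population average. Students compare six variables; this sketch looks at income only.

```python
import random
from statistics import mean

incomes = [20000 + 150 * i for i in range(1000)]  # made-up, ordered data
population_avg = mean(incomes)

samples = {
    "First 20": incomes[:20],
    "100 block": incomes[700:800],
    "10% Random": random.Random(7).sample(incomes, 100),  # 100 of 1,000 rows
}


def gap(sample):
    """Absolute difference between a sample's average and the population average."""
    return abs(mean(sample) - population_avg)


for name, sample in samples.items():
    print(f"{name}: average {mean(sample):,.0f}, off by {gap(sample):,.0f}")
```

The two blocks miss by 73,500 and 37,500. A different seed gives a different random draw, but with 100 randomly chosen rows the gap typically stays far smaller than either block's gap.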

### Step 11: Stamp and End Lesson, Preview Next Lesson:

• Stamp
• Today we asked the question: “Under what conditions does a sample represent an entire population?”
• “A sample is representative when it is random and large enough; in our data, a random sample of 10% of the population was enough.”
• Preview Next Lesson:
• “Next lesson, you will draw your own 10% random sample so you can run your analysis for your selected state as the US Census Bureau Chief Analyst!”

• Each student should have 3 slides assigned to them

Slide 1:  Variable Names and Descriptions

What are the variables available in the American Community Survey of 5 states?

| Variable Name | Description (in your own words) | Data Type (numerical or categorical) |
|---|---|---|
| | | |

Slide 2:  Students select the variables they want to use for analysis

Select 3 or 4 variables for your research and analysis.

| Variable | Reason Selected |
|---|---|
| | |
| | |
| | |
| | |

What were the variables you chose NOT to use for analysis? Why did you not choose the other available variables?

Slide 3: Variables Chosen and Values by Time Period

What state will you write a report on? ________________

| Variables | Time Period 1 (Beginning Year: _________) | Time Period 2 (End Year: _________) |
|---|---|---|
| 1. | | |
| 2. | | |
| 3. | | |
| 4. | | |