# Conduct a representative sample | Be US Census Bureau’s Chief Analyst (Lesson 3 of 5) | 6-8

### Student Objective

Students will be able to:

1. Conduct a representative, random sample and compare it to the population, using a simulation in Google Sheets.

### Instructions

**Materials Needed:**

- Dataset to sample from for practice, “FakeData_Students”
- Dataset to sample from for practice, “NewYorkDataOnly_Students”
- Google Sheets (one Google Drive account per student)
- Random lottery

**Step 1: Own It Activity**

- Think-Pair-Share: Ask students, “How would you conduct a representative sample of 100 one-ton crates of meat?”

**Step 2: Introduce Enduring Question**

- Last lesson, we asked ourselves, “What conditions are required for a sample to be representative of its population?”
- Today, we ask, “How do we conduct a 10% random sample from a population using Google Sheets?”

**Step 3: Model How to Randomly Select Observations in Dataset**

- Model for students, as they do it with you:
  - Open “FakeData_Students”
  - In the tab “Sheet1,” click the cell under the words “Random Students” in column K
  - Check to make sure you can see the following Google Sheets formula in the cell:
    - =ArrayFormula(Array_Constrain(vlookup(Query({ROW(A1:A101),randbetween(row(A1:A101)^0,9^9)},"Select Col1 order by Col2 Asc"),{row(A1:A101),A1:A101},2,FALSE),100*0.1,1))
  - Double-click into the cell, highlight the entire formula, and copy it by pressing “CTRL+C”
  - Click out of the cell
  - Click back on the cell
  - Press “CTRL+V” to paste
  - The numbers should now have changed. These are the SERIAL numbers of the households in our 10% random sample
    - We have just run a random lottery, where we grabbed 10% of the lottery balls, but in this case the balls are households

- Explain to students:
  - Each time you paste the formula back into that cell (“CTRL+V”), you re-run the randomization (or the lottery)
  - “You’ve now conducted a 10% random sample, but you only have the SERIAL identifier of each household. We have to make sure we get the household data that comes with the households themselves.”

- Have students:
  - Look at the tab “10%RandomSample” to see that the sampled SERIAL numbers are connected to their household data
  - Test and try it out:
    - Run a new sample
    - Check the “10%RandomSample” tab to see if the SERIAL numbers match their data
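
For teachers who want to see the logic of the lottery outside of Google Sheets, the following is a minimal Python sketch of what the formula appears to do: give every row a random number, sort the rows by that number, and keep the first 10%. The function name `lottery_sample` and the `households` data are hypothetical and are not part of the lesson workbooks.

```python
import random

def lottery_sample(serials, fraction=0.1, rng=None):
    """Return a simple random sample of `serials` by sorting on random keys."""
    if rng is None:
        rng = random.Random()
    sample_size = round(len(serials) * fraction)  # e.g. 100 * 0.1 = 10
    # Pair every row position with a random integer, like RANDBETWEEN(1, 9^9).
    keyed_rows = [(rng.randint(1, 9**9), row) for row in range(len(serials))]
    # Sort the rows by their random key, like "order by Col2 asc".
    keyed_rows.sort()
    # Look up the SERIAL for each shuffled row and keep only the first rows,
    # like VLOOKUP followed by ARRAY_CONSTRAIN.
    return [serials[row] for _, row in keyed_rows[:sample_size]]

# Hypothetical stand-in for a workbook: SERIAL -> household record.
households = {serial: {"people_in_household": 1 + serial % 5} for serial in range(1, 101)}

sampled_serials = lottery_sample(list(households), fraction=0.1, rng=random.Random(2010))
# Attach the household data to each sampled SERIAL (the role the sample tab plays).
sampled_households = {serial: households[serial] for serial in sampled_serials}
print(sampled_serials)
```

Passing a seeded `random.Random` makes the draw repeatable for a demonstration; leaving `rng` out gives a fresh lottery on every run, much like re-pasting the formula.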

**Step 4: Have Students Conduct a Random Sample**

- Have students open “NewYorkDataOnly_Students”
- Have students conduct a random sample of the “NewYorkDataOnly_Students” data on their own

NOTE:

- The sample will take 5-7 minutes to finish calculating.
- Use the extra time to teach students how the function for conducting a random sample works:
  - =ArrayFormula(Array_Constrain(vlookup(Query({**ROW(A1:A101)**,randbetween(row(A1:A101)^0,9^9)},"Select Col1 order by Col2 Asc"),{row(A1:A101),A1:A101},2,FALSE),**100*0.1,1**))
  - Sections of the function (shown in bold above):
    - ROW(A1:A101): A1:A101 is the range of cells that holds the SERIAL numbers
    - 100*0.1: a calculation of the number of observations to sample. 100 = all observations in the data; 0.1 = 10%; 10% of 100 observations gives a total of 10

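
The sample-size part of the formula is plain arithmetic, and a short sketch can help when adapting it to a dataset of a different size. The helper name `sample_size` is hypothetical, and rounding to the nearest whole observation is an assumption; the spreadsheet may treat fractional counts differently.

```python
def sample_size(total_observations, fraction):
    """Number of observations to draw, rounded to the nearest whole observation."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return round(total_observations * fraction)

print(sample_size(100, 0.1))   # the practice dataset in this lesson: prints 10
print(sample_size(2500, 0.1))  # a hypothetical larger dataset: prints 250
```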

**Step 5: Have Students Calculate Statistics from Their Random Sample and Compare to Population Statistics**

- Using Google Sheets Pivot Tables, have students calculate the average value of each variable for:
  - their random sample
  - their total NY 2010 population
- Think-Pair-Share:
  - Have students compare their population statistics with their sample statistics
    - Are they the same?
    - Are they different? How different?
    - Can you use this sample to represent the population?

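
To preview the kind of comparison students will make, here is a minimal Python sketch that compares a sample average with a population average. It uses made-up household sizes rather than the New York data, so the numbers it prints are only illustrative.

```python
import random
from statistics import mean

rng = random.Random(2010)  # fixed seed so the sketch is repeatable

# Hypothetical population: household sizes for 1,000 made-up households.
population = [rng.randint(1, 7) for _ in range(1000)]

# Draw a 10% simple random sample without replacement.
sample = rng.sample(population, k=len(population) // 10)

population_mean = mean(population)
sample_mean = mean(sample)
difference = sample_mean - population_mean

print(f"Population average: {population_mean:.2f}")
print(f"Sample average:     {sample_mean:.2f}")
print(f"Difference:         {difference:+.2f}")
```

Changing the seed and re-running shows that the sample average moves a little from draw to draw while the population average stays fixed, which is the point of the Think-Pair-Share questions.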

**Step 6: Stamp and End Lesson, Introduce Next Topic**

- In this lesson, we asked ourselves, “How do we conduct a representative 10% random sample?”
- Tomorrow, we will learn how to make statistical inferences from our sample statistics when we compare them to other population statistics from a different sample of the same information.