Conduct a representative sample | Be US Census Bureau’s Chief Analyst (Lesson 3 of 5) | 6-8
Student Objective
Students will be able to:
1. Conduct a representative random sample and compare it to the population, using a simulation in Google Sheets.
Instructions
Materials Needed:
- Dataset to sample from for practice, “FakeData_Students”
- Dataset to sample from for practice, “NewYorkDataOnly_Students”
- Google Sheets (one Google Drive account per student)
- Random lottery
Step 1: Own It Activity
- Think-Pair-Share: Ask students, “How would you conduct a representative sample of 100 one-ton crates of meat?”
Step 2: Introduce Enduring Question
- Last lesson, we asked ourselves, “What conditions are required for a sample to be representative of its population?”
- Today, we ask, “How do we conduct a 10% random sample from a population using Google Sheets?”
Step 3: Model How to Randomly Select Observations in Dataset
- Model for students, as they do it with you:
- Open “FakeData_Students”
- In the tab “Sheet1,” click the cell under the words “Random Students” in column K
- Check to make sure you can see the following Google Sheets formula in the cell:
- =ArrayFormula(Array_Constrain(vlookup(Query({ROW(A1:A101),randbetween(row(A1:A101)^0,9^9)},"Select Col1 order by Col2 Asc"),{row(A1:A101),A1:A101},2,FALSE),100*0.1,1))
- Double click into the cell, highlight the entire formula, and copy it by pressing “CTRL+C”
- Click out of the cell
- Click back on the cell
- Press “CTRL+V” to paste
- The numbers should now have changed. These are the SERIAL numbers of the households in our 10% random sample
- We have just run a random lottery in which we grabbed 10% of the lottery balls, but in this case the balls are households
- Explain to students:
- Each time you paste the formula back into that cell with “CTRL+V”, you re-run the randomization (the lottery)
- “You’ve now conducted a 10% random sample, but you only have the SERIAL identifier of each household. We have to make sure we get the household data that comes with the households themselves.”
- Have students:
- Look at the tab “10%RandomSample” and see that the sampled SERIAL numbers are connected to their household data
- Test and try out:
- Have students run a new sample
- Check the “10%RandomSample” tab to see whether the SERIAL numbers match their data
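The lottery and the lookup in Step 3 can also be sketched outside the spreadsheet. The Python below is a minimal illustration, not the workbook's own method: the `households` records, the `household_size` field, and the `draw_sample` helper are hypothetical names with made-up values. It draws 10% of the SERIAL numbers without replacement, then pulls the household record that belongs to each sampled SERIAL.

```python
import random

# Hypothetical household records keyed by SERIAL (made-up values for illustration).
households = {
    serial: {"SERIAL": serial, "household_size": 1 + serial % 5}
    for serial in range(1, 101)
}

def draw_sample(serials, fraction, seed=None):
    """Draw a simple random sample of SERIAL numbers without replacement."""
    rng = random.Random(seed)
    sample_size = round(len(serials) * fraction)
    return rng.sample(serials, sample_size)

# Run the lottery: 10% of 100 households.
sampled_serials = draw_sample(list(households), 0.10, seed=1)

# Attach each sampled SERIAL to its household data, as the "10%RandomSample" tab does.
sample_rows = [households[serial] for serial in sampled_serials]
```

Changing or omitting `seed` re-runs the lottery, in the same way that re-entering the formula produces a new set of SERIAL numbers.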
Step 4: Have Students Conduct a Random Sample
- Have students open “NewYorkDataOnly_Students”
- Have students conduct a random sample of the NewYorkDataOnly_Students data on their own
NOTE:
- The sample will take 5-7 minutes to finish calculating.
- Use the extra time to teach students how the formula for conducting a random sample works:
- =ArrayFormula(Array_Constrain(vlookup(Query({ROW(A1:A101),randbetween(row(A1:A101)^0,9^9)},"Select Col1 order by Col2 Asc"),{row(A1:A101),A1:A101},2,FALSE),100*0.1,1))
- Sections of the formula:
- ROW(A1:A101) –> A1:A101 is the range of cells that holds the SERIAL numbers
- 100*0.1 –> a calculation for the number of observations to sample. 100 = all observations in the data; 0.1 = 10%; 100*0.1 = 10% of 100 observations (a total of 10)
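To show students what the formula does step by step, the same idea can be written as a short Python sketch. This is a rough analogue under stated assumptions, not a translation of Google Sheets internals: the function name `shuffle_and_take` and the example SERIAL values are hypothetical. It gives every row a random key between 1 and 9^9 (as RANDBETWEEN does), sorts the rows by that key (as the QUERY "order by" does), and keeps only the first 10% (as Array_Constrain does).

```python
import random

def shuffle_and_take(values, fraction, rng=None):
    """Pair each row with a random key, sort by key, keep the first fraction."""
    rng = rng or random.Random()
    # Like RANDBETWEEN(1, 9^9): one random key per row index.
    keyed = [(rng.randint(1, 9**9), index) for index in range(len(values))]
    # Like "order by Col2 Asc": sort rows by their random key.
    keyed.sort()
    # Like Array_Constrain(..., 100*0.1, 1): keep only the first rows.
    sample_size = round(len(values) * fraction)
    # Like VLOOKUP: turn each kept row index back into its SERIAL value.
    return [values[index] for _, index in keyed[:sample_size]]

# Hypothetical SERIAL numbers standing in for column A.
serials = list(range(1000, 1100))
sample = shuffle_and_take(serials, 0.1, random.Random(0))
```

Because every row gets its own key and each row appears once in the sorted list, no SERIAL can be picked twice.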
Step 5: Have Students Calculate Statistics from Their Random Sample and Compare to Population Statistics
- Using Google Sheets Pivot Tables, have students calculate the average value of each variable for:
- their random sample
- their total NY 2010 population
- Think-Pair-Share:
- Have students compare their population statistics with their sample statistics
- Are they the same?
- Are they different? How different?
- Can you use this sample to represent the population?
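The comparison in Step 5 can be illustrated with a small Python sketch. The data here are invented for illustration (a hypothetical population of 1,000 household sizes), so the numbers will not match the New York dataset. The sketch computes the population average, the average of a 10% random sample, and the gap between the two.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: 1,000 made-up household sizes between 1 and 8.
population = [rng.randint(1, 8) for _ in range(1000)]

# 10% random sample drawn without replacement.
sample = rng.sample(population, k=len(population) // 10)

population_mean = mean(population)
sample_mean = mean(sample)

# How far the sample statistic is from the population statistic.
difference = sample_mean - population_mean
relative_gap = abs(difference) / population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"relative gap:    {relative_gap:.1%}")
```

Re-running with a different seed gives a different sample mean, which is a useful prompt for the question of how different is too different.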
Step 6: Stamp and End Lesson, Introduce Next Topic
- In this lesson, we asked ourselves, “How do we conduct a representative 10% random sample?”
- Tomorrow, we will learn how to make statistical inferences from our sample statistics by comparing them to statistics from a different sample of the same information.