SR - Statistical Investigations Lesson
Statistical Investigations
Adapted from Course materials (III.A Student Activity Sheet 1, 4, & 5) for AMDM developed under the leadership of the Charles A. Dana Center, in collaboration with the Texas Association of Supervisors of Mathematics and with funding from Greater Texas Foundation.
Statistical investigations are used every day for a variety of reasons. Statistical studies use a combination of mathematical and scientific techniques to answer questions about human behavior, the effectiveness of medical treatments, the safety and reliability of machinery and products, and so on.
Consider the following examples of two different types of statistical investigations.
Observational Study
An observational study is research in which data is collected about some characteristic(s) of the population. The data can be collected by observation or by a survey or interview or by other means. The following approaches are examples of an observational study:
Radio rating services sometimes collect data on listenership by asking participants to record the date, time, and station each time they listen to the radio. Other rating services distribute monitoring devices that automatically record this information whenever the participant has the radio turned on. Still others call participants and ask them about their listening habits. The data are then compiled so that advertisers know which stations are the most popular at specific times during the day.
Experimental Study
In an experimental study, the researcher separates the participants into one or more groups and applies some sort of treatment. After treatment, the variable of interest is measured and the results are compared. The group that the treatment group is compared against is called the control group. Researchers are often concerned that participants in a study show improvement simply because they are in the study and not because they are receiving an effective treatment. This is called the placebo effect.
A 17-year-old student designed a science fair project with 72 mice randomly assigned to three groups: hard rock music, Mozart, and no music at all (called a control group). The mice in the first two groups were exposed to music 10 hours a day. Three times a week, all of the groups were timed as they ran through a maze. An analysis of results showed that the 24 mice in the no-music group averaged about a 5-minute improvement in their maze completion time, while the Mozart mice improved 8.5 minutes. The hard rock mice actually got slower—an average of four times slower! Another interesting fact: The student had to start his experiment over because all the hard-rock mice killed each other. None of the classical mice did that. Wertz, M. 1998.
What are the treatment and the variable of interest in this case? The treatment is the type of music. The variable of interest is the maze running time.
Answering Questions Leads to Experimentation
Consider this situation: "This unopened bag of chips is half empty. I wonder if it really contains 28.3 grams as the package says?"
This type of informal question or observation is the beginning of many investigations. Informal questions can turn into more formal problem statements or research questions.
For example, you may decide to investigate whether there is a scandal in the potato chip industry by checking the following:
Suppose you conduct the investigation into Spud Potato Chips and find that the mean weight of the chips in your sample is 25 grams, rather than 28.3 grams (x̄ = 25 grams). Do you think that a difference of 3.3 grams between the actual and advertised weights is large enough that it needs to be reported? If so, how do you report this information and to whom?
- In some situations, researchers are even more formal and state hypotheses. In a case like this, the null hypothesis (Ho) generally states that there is no difference between the true value and the claimed value.
- The alternative hypothesis (Ha) states that something is different or incorrect, or that something has changed.
What are the null and alternative hypotheses for the potato chip example?
Ho: The true mean weight of bags of Spud Potato Chips is 28.3 grams or greater.
Ha: The true mean weight of bags of Spud Potato Chips is less than 28.3 grams.
Notice that the hypotheses say "The true mean weight." This implies that the statements refer to the population of all Spud Potato Chip bags, not just a single bag or even a small sample. When a statistical investigation is conducted, it generally employs a sample that is then used to make a generalization about the population. Notice that in this case (as in many cases), the population does not refer to people, but to bags of potato chips.
When researchers select and weigh a sample, they know the sample mean, but they plan to generalize to the population mean. When a numerical representation of a population is computed, it is generally called a population parameter. When a numerical representation of a sample is computed, it is called a sample statistic.
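The distinction between a population parameter and a sample statistic can be illustrated with a short simulation. This is only a sketch: the population below is generated by the computer rather than measured, and the population size (10,000), sample size (30), spread (1.5 grams), and variable names are all assumptions chosen for illustration.

```python
import random
import statistics

# Fixed seed so the example gives the same numbers each time it runs.
rng = random.Random(42)

# Hypothetical population: simulated weights (grams) of 10,000 bags.
# These values are generated for illustration; they are not real data.
population = [rng.gauss(28.3, 1.5) for _ in range(10_000)]

# Population parameter: computed from every member of the population.
mu = statistics.mean(population)

# Sample statistic: computed from a random sample of 30 bags.
sample = rng.sample(population, 30)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f} g")
print(f"sample mean (statistic):     {x_bar:.2f} g")
```

In a real investigation the population mean is usually unknown, which is why the sample mean is used to estimate it.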
To be concise, researchers often use symbols in place of words. Greek letters are usually used when referring to populations (the entire group being studied, from which a sample or samples will be drawn). English letters are used for samples (the particular items or individuals included in a particular study). For example, when discussing the mean:
- μ = the population mean (Greek letter mu, pronounced "mew")
- x̄ = the sample mean (pronounced "x-bar")
So the hypotheses for a study can be stated in words or symbols. When using symbols, you must identify what your symbols represent.
Ho: μ ≥ 28.3 grams, where μ is the true mean weight of a bag of Spud Potato Chips
Ha: μ < 28.3 grams
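One common way to judge whether a sample mean is far enough below the claimed value is a one-sample t statistic, which compares the difference to the variability in the sample. The lesson itself does not prescribe this method; the sketch below is one possible approach. The ten bag weights are invented for illustration (chosen so that their mean is 25 grams), and the variable names are assumptions.

```python
import math
import statistics

CLAIMED_MEAN = 28.3  # grams, the value stated in Ho

# Hypothetical sample of bag weights in grams (made up for illustration).
weights = [25.1, 24.6, 25.4, 24.9, 25.3, 24.7, 25.2, 24.8, 25.0, 25.0]

n = len(weights)
x_bar = statistics.mean(weights)      # sample mean
s = statistics.stdev(weights)         # sample standard deviation
standard_error = s / math.sqrt(n)     # estimated variability of the sample mean

# t statistic: how many standard errors the sample mean lies from the claim.
t = (x_bar - CLAIMED_MEAN) / standard_error

print(f"sample mean = {x_bar:.2f} g, difference = {x_bar - CLAIMED_MEAN:.2f} g")
print(f"t statistic = {t:.1f}")
```

A t statistic far below zero is evidence in favor of Ha (μ < 28.3 grams). How far is "far enough" depends on the sample size and the significance level chosen for the study.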
Statistical Studies
Statistical studies are designed with carefully selected measures that ensure (within error margins) that, if the sample is well selected and the study is well designed and conducted, the mean and other measures of the sample are likely to be similar to the corresponding measures of the population being studied. Sometimes, if the population is small (such as high school seniors in a small town), it may be possible that the sample studied is the entire population. However, often a sample is a smaller subset of a population (such as a research question that might target the entire population of high school seniors in a state or in the nation).
With a blind study, participants do not know whether they are receiving the treatment or the placebo. This is crucial for trying to control the placebo effect. Using a placebo is worthless if the participants know they are getting it.
There is often a concern that if researchers know who is getting the placebo, it may affect their ability to fairly assess a treatment. For example, researchers may subconsciously check for improvement more carefully. A physical therapist may push the patient who is receiving the treatment to work harder, while not encouraging the control group patient as much. In a double-blind study, only a third person not involved in the assessment knows which participants are receiving the treatment and which are in the control group. After all data have been collected, this person then identifies each patient file as treatment or control.
Recall the Spud Potato Chips scenario from earlier. You hypothesized that the true mean weight of bags of Spud's might be less than the 28.3 grams advertised on the bags. Discuss and make some notes on how you might collect a sample of bags to test your hypothesis. Remember that the sample should be representative of the population.
What do you mean by "the population of Spud Potato Chips" that you are interested in testing? The population of interest is all bags of potato chips, not just those being sold at your school.
If we only test the bags at the school, it is likely that all of those bags came from the same shipment. Perhaps that shipment was undersized for some reason (for example, faulty equipment that has since been fixed). We should at least select bags from various places around town. The best possible scenario is testing bags from around the country.
Random Sampling
Random sampling means that an unbiased strategy was used to select the participants in the statistical investigation. Random assignment of treatments means that a random strategy was used to determine who gets an active treatment and who gets a placebo, or the order in which treatments are given. In both situations, we can use a random number table or generator to make the selection. Several kinds of random number generators are available, including the one on your graphing calculator.
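Both ideas can be sketched with a software random number generator in place of a table or calculator. The example below uses Python's built-in random module; the seed value and variable names are assumptions, and the group sizes follow the mice experiment described earlier (72 participants in three groups of 24).

```python
import random

# Fixed seed so the example is repeatable; omit it for a fresh draw.
rng = random.Random(2024)

# Random sampling: choose 10 of 100 numbered items without replacement.
population_ids = list(range(1, 101))
sample_ids = sorted(rng.sample(population_ids, 10))
print("sampled items:", sample_ids)

# Random assignment: shuffle 72 participants, then split into 3 groups of 24.
participants = list(range(1, 73))
rng.shuffle(participants)
groups = {
    "hard rock": participants[0:24],
    "Mozart": participants[24:48],
    "no music (control)": participants[48:72],
}
for name, members in groups.items():
    print(name, "->", len(members), "participants")
```

Sampling without replacement guarantees that no item is chosen twice, and shuffling before splitting gives every participant the same chance of landing in each group.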
Self-Assessment: Billboard Hot 100
Go to www.billboard.com and select the Hot 100. This list gives the top 100 songs for the week. You will compute statistics on the number of weeks that a random selection of the songs has been on the charts. (If you wish, choose one of the specialized charts such as R&B/Hip-Hop or Country.)
- Use your calculator or a random number table to select 10 of the 100 songs.
- Write down the name of each song, its rank on the list, and the number of weeks it has been on the chart. For example, if your random number generator gives you 3, write "No. 3, Mary Had a Little Lamb, 14 weeks."
- Calculate the descriptive statistics for the data set and interpret all statistics.
- Use a different tool to select 10 of the 100 songs.
- Write down the name of each song, its rank on the list, and the number of weeks it has been on the chart. For example, if your random number generator gives you 3, write "No. 3, Mary Had a Little Lamb, 14 weeks."
- Calculate the descriptive statistics for the data set and interpret all statistics.
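Once the weeks on the chart have been recorded, the descriptive statistics can be computed by hand, on a calculator, or with a short script. The sketch below shows one way to do it. The ten values are placeholders invented for illustration, not actual chart data; replace them with the numbers you recorded.

```python
import statistics

# Hypothetical weeks-on-chart values for 10 randomly selected songs.
# Replace these placeholders with the values you wrote down.
weeks_on_chart = [14, 3, 27, 8, 1, 19, 6, 11, 42, 9]

summary = {
    "n": len(weeks_on_chart),
    "mean": statistics.mean(weeks_on_chart),
    "median": statistics.median(weeks_on_chart),
    "min": min(weeks_on_chart),
    "max": max(weeks_on_chart),
    "range": max(weeks_on_chart) - min(weeks_on_chart),
    "sample std dev": statistics.stdev(weeks_on_chart),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

For these placeholder values the mean (14 weeks) is larger than the median (10 weeks), which is the kind of pattern produced when one or two songs have stayed on the chart much longer than the rest.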
What did you discover? What is the average number of weeks that a random selection of the songs has been on the charts?
IMAGES CREATED BY GAVS