PSDA - Using Statistics to Determine Shape, Center, and Spread Lesson
Using Statistics to Determine Shape, Center, and Spread
Statistics looks at data from different points of view. In the last module, two-way tables were used to organize data. Another way to organize data is to display it visually. A visual look at the data provides a different perspective: the shape of the data.
The following vocabulary is important for discussing statistics as a graphical picture.
Bell-Shaped - A symmetric distribution of data gently rising to a maximum at the mean (the middle) and then falling back again in the same pattern (mirror-image).
Mean - The average of all of the data values. Add up all the data values and divide by the number of values.
Median - This is the center of the data. Just as the median in the middle of a multi-lane highway divides the road in half, the median of the data divides the data in half.
- Organize the values in low to high order, count the number of values, divide by two, and count in from each end toward the middle.
- If the number of values is odd, a single middle value will be located. That value is the median.
- If the number of values is even, two middle values will be located. Add the two values and divide by 2. In this case the median may not be one of the data items.
Mode - The data item that occurs most often (it must occur more than once). There may be more than one mode if two or more data items are tied for occurring most often.
Outlier - A data element that is clearly distant from the other data items: either far to the left or right, or far above or below the data being described.
Range - The distance from the leftmost plotted data element to the rightmost: the largest data value minus the smallest data value. The range describes the spread of the data.
Skewed - A representation of data with clearly much more data on one side of the graphical picture than the other. One side of the data thins out and trails off. This side is called the tail. The tail side is the direction that the data is skewed.
- Skewed right means that more data is on the left side of the graph representing the data. The tail is on the right side.
- Skewed left means that more data is on the right side of the graph representing the data. The tail is on the left side.
Spread - The distribution of the data, described by the range: how far it is from one end of the visual graph of the data to the other.
- If the data covers a wide length, then the spread is bigger or larger.
- If the data covers only a small length, then the spread is smaller.
Symmetry - The shape of the data matches on both sides of the median. When the graph is divided in half at the median, the two sides are mirror images of each other.
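The vocabulary above can be tried out on a small list of numbers. The following is a minimal Python sketch, not part of the original lesson. The sample values and the helper names data_range and modes are made up for illustration; mean and median come from Python's standard statistics module.

```python
from collections import Counter
from statistics import mean, median


def data_range(values):
    """Range: the largest data value minus the smallest data value."""
    return max(values) - min(values)


def modes(values):
    """Return the value(s) that occur most often, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest < 2:  # a mode must occur more than once
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical sample data (not from the lesson), chosen to keep the arithmetic simple.
sample = [3, 7, 7, 9, 10, 12]

sample_mean = mean(sample)          # (3 + 7 + 7 + 9 + 10 + 12) / 6
sample_median = median(sample)      # even count: average of the two middle values, 7 and 9
sample_modes = modes(sample)        # 7 appears twice, every other value once
sample_range = data_range(sample)   # 12 - 3

print("mean:", sample_mean)
print("median:", sample_median)
print("mode(s):", sample_modes)
print("range:", sample_range)
```

Because the sample has an even number of values, the median (8) is not one of the data items, which matches the note in the median definition.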
An outlier can skew information to one side or the other. Obviously the range changes, but look what happens to the mean with an outlier in the data.
The new mean calculation with the outlier is:
(30 + 56 + 89 + 73 + 64 + 77 + 85 + 52 + 88 + 85 + 90)/11 = 789/11 ≈ 71.73
In the question above, the mean shows that the outlier (30) pulls the analysis of the data to the left. Sometimes an outlier or two are identified as such, and the analysis is then done with and without the outlier(s) for a clearer picture of their effect on the data trend. Even the median is pulled to the left by this outlier. The range also becomes much larger: it went from 38 to 60.
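The effect of the outlier can be checked with a short Python sketch. The eleven values come from the calculation above. The list without the outlier is an assumption: it is the same list with the 30 removed, which is consistent with the stated ranges of 38 and 60. The helper name summarize is made up for illustration.

```python
from statistics import mean, median

# Values from the lesson's calculation; 30 is the outlier.
with_outlier = [30, 56, 89, 73, 64, 77, 85, 52, 88, 85, 90]

# Assumption: the original data set is the same list with the 30 removed.
without_outlier = [v for v in with_outlier if v != 30]


def summarize(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)


mean_without, median_without, range_without = summarize(without_outlier)
mean_with, median_with, range_with = summarize(with_outlier)

print("without outlier:", round(mean_without, 2), median_without, range_without)
print("with outlier:   ", round(mean_with, 2), median_with, range_with)
```

Under that assumption, the mean drops from 75.9 to about 71.73, the median drops from 81 to 77, and the range grows from 38 to 60. The mean moves more than the median, which is why the median is often described as less sensitive to outliers.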
Introduction to Bell Curve
What is meant by bell-shaped? This is simply a symmetric graph that would look like a bell if a handle were placed in the middle.
When data is shown in a bar graph, or another display that resembles this curve, we call it bell-shaped or a normal distribution curve. We call the curve normal because it is symmetric around the mean and looks roughly like an upside-down parabola. If we throw a football down the field, the arc it makes rises to a maximum and then the ball gradually comes back down, tracing a similar shape.
Note that the middle, where both the mean and the median sit, is 50. This is (100 - 0)/2, the point halfway between the smallest number (0) and the largest number (100). If we organized all 100 of the numbers, there would be 50 numbers above 50 and 50 numbers below 50, so 50 is the median of this graph.
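The claim that the mean and median coincide at the middle of a symmetric graph can be sketched in Python. The frequency table below is hypothetical (it is not the data behind the lesson's graph); it is only chosen so that the counts rise to a peak at 50 and fall in a mirror-image pattern.

```python
from statistics import mean, median

# Hypothetical bell-shaped frequency table (not the lesson's graph):
# each value on the horizontal axis is paired with how many times it occurs.
values = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
counts = [1, 2, 4, 7, 10, 12, 10, 7, 4, 2, 1]

# The counts read the same forward and backward, so the shape is symmetric.
is_symmetric = counts == counts[::-1]

# Expand the table into the full list of data items.
data = [v for v, c in zip(values, counts) for _ in range(c)]

center_mean = mean(data)
center_median = median(data)
midpoint = (min(data) + max(data)) / 2  # halfway between smallest and largest

print("symmetric:", is_symmetric)
print("mean:", center_mean, "median:", center_median, "midpoint:", midpoint)
```

For this made-up table all three measures of the middle equal 50. That agreement depends on the symmetry; it does not hold for skewed data.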
Skew means that symmetry has been destroyed. The Unknown Vertical Axis Graph below is definitely not a bell curve. This graph's data, whatever it is, is skewed to the right. More of the data is grouped on the left side of the graph than on the right. It does not matter what the vertical axis is; the data has a rightward trend because the tail of the graph is on the right, pulling the data rightward for an analysis of the mean. The mean is affected by the small number of extreme values in the tail on the right. So remember, the tail pulls (skews) the representation of the data. If the tail is on the right, the data is skewed right. If the tail is on the left, the data is skewed left.
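One way to see the tail pulling the mean is to compare the mean with the median. The Python sketch below uses that comparison as a rule of thumb only; it is not the lesson's definition of skew (which is based on where the tail is), and it can be misleading for some data sets. The function name skew_direction and the right_tailed list are made up for illustration; the scores list is taken from the lesson's outlier calculation.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule of thumb only: the tail tends to pull the mean past the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "none"


# Hypothetical data with most values on the left and a tail on the right.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

# Scores from the lesson's outlier example; the outlier 30 sits far to the left.
scores = [30, 56, 89, 73, 64, 77, 85, 52, 88, 85, 90]

print("right_tailed:", skew_direction(right_tailed))
print("scores:", skew_direction(scores))
```

For the hypothetical right_tailed list the mean (3.9) is above the median (3), so the sketch reports "right". For the lesson's scores the mean (about 71.73) is below the median (77), so it reports "left", which agrees with the earlier observation that the outlier pulls the data to the left.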
Unknown Vertical Axis Graph (What if?)
What could have made the data skew right? We would need to know more information about the graph.
Let's surmise a story for the graph. Graphs always tell a story. This graph's story is unknown because the vertical axis is not present. However, we can use our imagination and create a story for the graph.
- Maybe this graph illustrates the number of people getting on and filling an amusement park ride in minutes, followed by the amount of time it took to get the people off the ride when the ride closed due to a storm? The initial time until the graph starts is the preparation time.
- Maybe the graph illustrates the number of people entering an arena late for a basketball game in progress. The number of people peaks and then recedes approximately 7 minutes into the game.
- Maybe the graph illustrates the rise of water during a flood, with the water table peaking after 7.5 hours and then the water receding.
We do not know the true meaning of the graph, but we do know that all graphs tell a story, because graphs are made up of data. As the three sample ideas show, we can create a story for the graph using a realistic description of the data.
Introduction to Bar Graphs
The Tickets Sold by Cost graph, a bar graph, provides more information because it is labeled. We can quickly see that this graph compares tickets with cost, and a label indicates that the blue bars show how many tickets.
Let's check our knowledge with the bar graph.
IMAGES CREATED BY GAVS OR OPENSOURCE