RVCND - Introduction to the Continuous Normal Distribution Lesson
Sometimes the overall pattern of a very large number of observations is SO REGULAR that we can describe it by a smooth curve. This curve is a mathematical model for the distribution, that is, an "idealized" description of it. It gives a quick picture of the overall pattern but ignores minor irregularities as well as outliers. It is easier to work with a smooth curve than with a histogram because the histogram depends on the choice of classes, while the curve does not depend on any choices made by us.
A NORMAL curve is one that mimics a symmetric histogram where the mean and median are EQUAL to each other. Graphically, the MEDIAN of a density curve is the equal-area point, the point with half the area under the curve to its left and the other half to its right. The MEAN is the point at which the curve would BALANCE if made of solid material.
New notation is necessary because a density curve is an idealized description of the data (not the actual data). We will distinguish between the mean and standard deviation of the curve and the mean (x̄) and standard deviation (s) computed from the actual observations. The notation for the mean of this idealized distribution is µ (Greek mu, small "m") and the notation for its standard deviation is σ (Greek sigma, small "s").
Normal distributions often emerge after many, many repeated trials. Normal distributions are symmetric, single-peaked, and bell-shaped. Their density curves are called normal curves. All normal distributions have the same overall shape. The exact density curve for a particular normal distribution is described by giving its mean and standard deviation and is designated as N(µ, σ). The mean is located at the center of the symmetric curve and is the same as the median. Changing µ without changing σ moves the normal curve along the horizontal axis WITHOUT changing its spread. The standard deviation controls the spread. The curve with the larger standard deviation is more spread out. A normal density curve has special "inflection points," which are the points at which the curvature changes direction (from bending downward to bending upward). These points are located on both sides of the mean at a distance of ± 1σ, that is, at µ − σ and µ + σ.
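The claims about symmetry and the inflection points can be checked numerically. The sketch below is only an illustration: the function names and the values µ = 100 and σ = 15 are assumptions chosen for the example, not part of the lesson. It evaluates the standard formula for the normal density and estimates the direction of curvature just inside and just outside one standard deviation from the mean.

```python
import math

def normal_density(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def curvature_sign(x, mu, sigma, h=0.01):
    """Sign of a finite-difference estimate of the second derivative at x.

    -1 means the curve bends downward at x, +1 means it bends upward.
    """
    second = (normal_density(x + h, mu, sigma)
              - 2 * normal_density(x, mu, sigma)
              + normal_density(x - h, mu, sigma)) / (h * h)
    return 1 if second > 0 else -1

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0

# Symmetry: equal heights at equal distances on either side of the mean.
left_height = normal_density(mu - 5, mu, sigma)
right_height = normal_density(mu + 5, mu, sigma)

# Curvature just inside and just outside one standard deviation.
inside = curvature_sign(mu + 0.9 * sigma, mu, sigma)
outside = curvature_sign(mu + 1.1 * sigma, mu, sigma)

print(left_height, right_height)
print(inside, outside)
```

If the curve behaves as described, the two heights match, and the curvature sign is negative inside µ + σ and positive outside it, which is what an inflection point at µ + σ implies.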
Normal distributions are important because they are good descriptions for many real-life data sets. Distributions of large numbers of SAT scores and psychological test results typically follow a normal curve. Also, normal curves can approximate many types of chance outcomes, like the number of heads in many tosses of a coin or the total from many rolls of a die. These curves are the basis for statistical inference procedures which allow us to INFER conclusions from our data analysis using the laws of probability and more.
EMPIRICAL RULE...Defining the Normal Curve
In a Normal distribution with mean µ and standard deviation σ, approximately:
68% of the observations fall within one standard deviation of the mean
95% of the observations fall within two standard deviations of the mean
99.7% of the observations fall within 3 standard deviations of the mean
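These percentages are rounded areas under the normal curve. A minimal sketch of how they could be checked, assuming Python's standard-library statistics.NormalDist is available (the helper name area_within is made up for this example):

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed areas are roughly 0.6827, 0.9545, and 0.9973, and they do not change if a different µ or σ is passed in, which is why the rule applies to every normal distribution.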
Remember, describing any normal distribution can be done with a shortcut notation, N(µ, σ).
All normal distributions share common properties and are the same if we measure in units of σ about the mean µ as center. Changing to these σ units is called STANDARDIZING. To standardize a value, subtract the mean from the value and divide by the standard deviation. If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is called a z-score and is calculated by
z = (x − µ) / σ
NOTE: As with all formulas, the z-formula can be manipulated to find any one of the four variables when given the values of the other three.
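As an illustration of standardizing and of rearranging the z-formula, z = (x − µ) / σ, the sketch below solves for each of the four quantities in turn. The function names and the numbers (x = 130, µ = 100, σ = 15) are hypothetical, chosen only for the example.

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def value_from_z(z, mu, sigma):
    """Solve the z-formula for the observation x."""
    return mu + z * sigma

def mean_from_z(x, z, sigma):
    """Solve the z-formula for the mean mu."""
    return x - z * sigma

def sd_from_z(x, z, mu):
    """Solve the z-formula for the standard deviation sigma (needs z != 0)."""
    return (x - mu) / z

# Hypothetical numbers, for illustration only.
z = z_score(130.0, 100.0, 15.0)
print(z)                                # 2.0
print(value_from_z(z, 100.0, 15.0))     # 130.0
print(mean_from_z(130.0, z, 15.0))      # 100.0
print(sd_from_z(130.0, z, 100.0))       # 15.0
```

A value below the mean, such as x = 85 with the same µ and σ, gives a negative z-score (here −1.0).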
A "z-score" tells how many standard deviations the original observation falls away from the mean, and in which direction. Observations larger than the mean are positive when standardized, while observations smaller are negative. Standardizing a variable that has any normal distribution produces a new variable that has the standard normal distribution. The standard normal distribution is the normal distribution with mean 0 and standard deviation 1.
An AREA under a DENSITY CURVE is a proportion of the observations in a distribution. Any question about what proportion of observations lie in some range of values can be answered by finding the area under the curve. We can find these areas from a table (the z-table) that gives areas under the curve for the standard normal distribution. Typically, these tables give the area to the LEFT of a specific z-value, but you should always check the legend when reading any table. The complement rule for probability tells us that we can find the area to the RIGHT of a value by subtracting the table value from 1 since we know that the total area under the density curve is ALWAYS = 1. Sometimes we may be asked to find the observed value with a given proportion of the observations above or below it, meaning we must use the table backwards.
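The table lookups described above can also be done in software. The following is a sketch, assuming Python's standard-library statistics.NormalDist stands in for the z-table; the helper names are invented for this example. The cdf method plays the role of the "area to the LEFT" table entry, and inv_cdf plays the role of reading the table backwards.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_left(z):
    """Area under the standard normal curve to the LEFT of z."""
    return standard_normal.cdf(z)

def area_right(z):
    """Area to the RIGHT of z, using the complement rule."""
    return 1.0 - standard_normal.cdf(z)

def area_between(z_low, z_high):
    """Area between two z-values."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

def z_for_area_left(p):
    """Use the table 'backwards': the z-value with area p to its left."""
    return standard_normal.inv_cdf(p)

print(round(area_left(1.96), 4))
print(round(area_right(1.96), 4))
print(round(area_between(-1.0, 1.0), 4))
print(round(z_for_area_left(0.975), 2))
```

For any z, area_left(z) and area_right(z) add up to 1, matching the fact that the total area under the density curve is 1.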
When faced with a normal distribution problem it is a good idea to follow this sequence of steps.
1) STATE the problem in terms of the observed variable x.
2) DRAW a picture of the distribution and shade the area of interest under the curve.
3) STANDARDIZE "x" to restate the problem in terms of a standard normal variable z using the formula.
4) FIND the required area under the standard normal curve using the z-table or calculator.
5) WRITE your conclusion in the context of the problem.
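Here is one way the five steps might look when carried out with software instead of a printed table. The problem itself is hypothetical and made up for this illustration: scores follow N(500, 100), and we want the proportion of scores above 650. The sketch assumes Python's standard-library statistics.NormalDist in place of the z-table.

```python
from statistics import NormalDist

# Step 1 (STATE): hypothetical problem. Scores x follow N(500, 100).
# What proportion of scores are above x = 650?
mu, sigma, x = 500.0, 100.0, 650.0

# Step 2 (DRAW): the area of interest is the region to the RIGHT of 650
# under the N(500, 100) curve (the sketch itself is not drawn here).

# Step 3 (STANDARDIZE): restate the problem in terms of z.
z = (x - mu) / sigma

# Step 4 (FIND): area to the right of z under the standard normal curve,
# using the complement rule on the area to the left.
proportion_above = 1.0 - NormalDist().cdf(z)

# Step 5 (WRITE): state the conclusion in the context of the problem.
conclusion = f"About {proportion_above:.1%} of scores are above {x:.0f}."
print(conclusion)
```

With these made-up numbers, z = 1.5 and the area to the right is about 0.0668, so the conclusion reports about 6.7% of scores above 650.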
Interestingly, combining two or more independent normal random variables follows the same rules presented for discrete random variables earlier, and the combined variable is itself normal.
E(aX) = (a)E(X)
Var(aX) = a² Var(X)
E(X + Y) = E(X) + E(Y)
Var(X + Y) = Var(X) + Var(Y), provided X and Y are independent
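These rules can be illustrated with a short sketch. It assumes Python's standard-library statistics.NormalDist, whose + operator between two NormalDist objects treats them as independent; the particular distributions X = N(10, 3) and Y = N(20, 4) are hypothetical values picked for the example.

```python
from statistics import NormalDist

# Two hypothetical, independent normal random variables.
X = NormalDist(mu=10.0, sigma=3.0)   # Var(X) = 9
Y = NormalDist(mu=20.0, sigma=4.0)   # Var(Y) = 16

# Scaling by a constant a = 2: E(aX) = a E(X), Var(aX) = a^2 Var(X).
scaled = X * 2

# Adding independent normals: means add and variances add.
total = X + Y

print(scaled.mean, scaled.variance)
print(total.mean, total.variance, total.stdev)
```

Note that the variances add, not the standard deviations: the sum has variance 9 + 16 = 25, so its standard deviation is 5, not 3 + 4 = 7.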
IMAGES CREATED BY GAVS