The $t$-test will tell you when you may conclude that \[ \mu = x_{0} = \bar{x} \] where $\mu$ is the population mean, $x_{0}$ is an a priori guess about $\mu$, and $\bar{x}$ is the sample mean.
Here the population could be the height of 10 year old children in Saskatchewan. The quantity $\mu$ is the actual average height of 10 year old kids in Saskatchewan. You could, in principle, measure all the 10 year olds in Saskatchewan but, in practice, you can't. Even if you spent the time to find them all and measure their heights with a tape measure, they would keep growing while you measured them. In practice it is generally impossible to measure an entire population for one reason or another. Practically, we can only measure a small sample of children from the population. That sample will have a mean that we denote with $\overline{x}$. The $t$-test is a hypothesis test in which we compare the sample mean $\overline{x}$ to a hypothetical mean $x_{0}$ and conclude with a probabilistic inference about $\mu$.
The two sample $t$-test will tell you when you can believe that
\[ \mu_1 = \mu_2 \]
on the basis that $\bar{x}_1 \cong \bar{x}_2$. (The symbol $\cong$ means ``approximately equal to''.)
Figure 1.2: Two sample $t$-test.

Here the two populations could be 10 year olds (population 1) and 11 year olds (population 2) in Saskatchewan. You might measure the two populations to get some idea about how much 10 year old kids in Saskatchewan grow in one year. The two sample $t$-test will give you information on the difference of the average heights in the population, $\mu_{1} - \mu_{2}$, on the basis of the difference of the means of small samples that you take from each population, $\overline{x}_{1} - \overline{x}_{2}$.
Say we want to know how fast a population grows in one year (e.g. pop = 10 year old kids). You can do the two-sample test with two separate populations, but if you want to know how the environment affected the growth of the children (maybe you are concerned that they don't get enough to eat) then the two-sample test is only an approximation. The genetic composition, the natural ability to grow, may be different in the two separate populations. To get at the effect of the environment, without the measurements being confounded by individual differences, we would take a sample of 10 year old kids from the population now and measure their heights. Then we wait a year and measure the heights of the same sample of now 11 year old kids. Then we combine the two samples of data into one data sample of differences. The paired $t$-test will tell you if the average of the differences (in heights) is zero or not.
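The paired procedure just described (take differences, then test whether the mean difference is zero) can be sketched in code. The heights below are made-up numbers for illustration, and the statistic $t = \bar{d}/(s_{d}/\sqrt{n})$ is the standard one-sample $t$ formula applied to the differences, anticipating the formal treatment of the $t$-test later in the text.

```python
import math
import statistics

# Hypothetical data (not from the text): heights in cm for the same
# five children measured at age 10 and again at age 11.
age_10 = [138.0, 141.5, 136.2, 144.0, 139.3]
age_11 = [144.1, 147.0, 141.8, 149.5, 145.2]

# Step 1: combine the two samples into one sample of differences.
diffs = [after - before for after, before in zip(age_11, age_10)]

# Step 2: test whether the mean difference is zero with the one-sample
# t statistic, t = d_bar / (s_d / sqrt(n)).
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)          # sample standard deviation (n - 1 denominator)
t = d_bar / (s_d / math.sqrt(n))

print(f"mean difference = {d_bar:.2f} cm, t = {t:.2f} on {n - 1} degrees of freedom")
```

A large $t$ (compared to the critical value for $n-1$ degrees of freedom) would lead us to reject the hypothesis that the mean difference is zero.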
Figure 1.3: Paired $t$-test.

\[ W = \frac{R+ 1}{G} \]
where $G$ is the number of groups (or classes) you want.
(d) Begin the frequency table's first two columns :
Class | Class Boundaries
$L$ to $(L+W-1)$ | $(L - 0.5)$ to $(L - 0.5 + W)$
$(L+W)$ to $(L+2W-1)$ | $(L-0.5+W)$ to $(L-0.5+2W)$
$\vdots$ | $\vdots$
$(H+1-W)$ to $H$ | $(H + 0.5 - W)$ to $(H + 0.5)$
Class | Class Boundaries | Tally | Frequency | Cumulative Freq. | Relative Freq. |
 | | | $a$ | $a$ | $a/n$
 | | | $b$ | $a+b$ | $b/n$
 | | | $c$ | $a+b+c$ | $c/n$
 | | | $\vdots$ | $\vdots$ | $\vdots$
 | | | $n$ | |
A | B | B | AB | O |
O | O | B | AB | B |
B | B | O | A | O |
A | O | O | O | AB |
AB | A | O | B | A |
Class | Tally | Frequency | Cumulative Freq. | Relative Freq. |
A | ||||| | 5 | 5 | 5/25 = 0.20 |
B | ||||| || | 7 | 12 | 7/25 = 0.28 |
O | ||||| |||| | 9 | 21 | 9/25 = 0.36 |
AB | |||| | 4 | 25 | 4/25 = 0.16 |
112 | 100 | 127 | 120 | 134 | 118 | 105 | 110 | 109 | 112 |
110 | 118 | 117 | 116 | 118 | 122 | 114 | 114 | 105 | 109 |
107 | 112 | 114 | 115 | 118 | 117 | 118 | 122 | 106 | 110 |
116 | 108 | 110 | 121 | 113 | 120 | 119 | 111 | 104 | 111 |
120 | 113 | 120 | 117 | 105 | 110 | 118 | 112 | 114 | 114 |
Class | Class Boundaries | Tally | Frequency | Cumulative Freq. | Relative Freq. |
100 -- 104 | 99.5 to 104.5 | || | 2 | 2 | 0.04 |
105 -- 109 | 104.5 to 109.5 | ||||| ||| | 8 | 10 | 0.16 |
110 -- 114 | 109.5 to 114.5 | etc. | 18 | 28 | 0.36 |
115 -- 119 | 114.5 to 119.5 | | 13 | 41 | 0.26
120 -- 124 | 119.5 to 124.5 | | 7 | 48 | 0.14
125 -- 129 | 124.5 to 129.5 | | 1 | 49 | 0.02
130 -- 134 | 129.5 to 134.5 | | 1 | 50 | 0.02
 | | | $\sum f = 50$ | | Sum = 1
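The table-building recipe above (class width $W$, classes, boundaries, tallies, cumulative and relative frequencies) can be sketched in a few lines of code. This is an illustrative sketch, not the text's procedure verbatim: it assumes $R = H - L$, uses only the first ten values of the worked dataset, and groups them into $G = 4$ classes for brevity.

```python
import math

# Class-width recipe from the text: W = (R + 1) / G rounded up,
# where R = H - L and G is the desired number of groups.
data = [112, 100, 127, 120, 134, 118, 105, 110, 109, 112]
G = 4
H, L = max(data), min(data)
W = math.ceil((H - L + 1) / G)

# Build the frequency table: classes, class boundaries, frequencies,
# cumulative frequencies and relative frequencies.
n = len(data)
cumulative = 0
for i in range(G):
    lo = L + i * W
    hi = lo + W - 1
    freq = sum(1 for x in data if lo <= x <= hi)
    cumulative += freq
    print(f"{lo} -- {hi} | {lo - 0.5} to {hi + 0.5} | {freq} | {cumulative} | {freq / n:.2f}")
```

The final cumulative frequency equals $n$, and the relative frequencies sum to 1, just as in the worked table.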
heights $\leq$ 5 ft = ``short'' (group value = 1)
heights $>$ 5 ft = ``tall'' (group value = 2)
Groups are also known as classes. We will spend time defining classes in Chapter 2. Identifying what type of variable your data is will be the best way for you to decide which statistical test you need, after you have learned and understood a number of different tests.

Class | Frequency | Cumulative Freq. | Relative Freq
A | 5 | 5 | 0.20 |
B | 7 | 12 | 0.28 |
O | 9 | 21 | 0.36 |
AB | 4 | 25 | 0.16 |
The probability of having type A blood is 0.20 (or 20$\%$).
The probability of having type B blood is 0.28 (or 28$\%$).
The probability of having type O blood is 0.36 (or 36$\%$).
The probability of having type AB blood is 0.16 (or 16$\%$).
2. Frequency Polygons. Frequency polygons are just another form of histogram. We have been talking about ``area under the curve'' to represent probability. The curve of a frequency polygon is a little bit smoother than the curve of a traditional histogram. Frequency polygons can, of course, be made for either straight frequency or relative frequency data. A frequency polygon for the relative frequency blood type data is shown in Figure 2.3.

Figure 2.3 : Relative frequency polygon for the blood type data. Plot a dot at the center of each class at the $y$-value of the relative frequency then connect the dots as shown.

3. Cumulative Frequency Graph. Plotting the cumulative frequencies from the frequency table results in a cumulative frequency graph as shown in Figure 2.4. Cumulative relative frequencies can also be computed (add up relative frequencies as you move down the column) and plotted. The cumulative frequency graph shows the ``area under the curve'' (of the traditional histogram) from the beginning of the first class up to the given point. Cumulative frequencies or cumulative relative frequencies will therefore show up later as areas under probability distribution curves up to a given point (it represents the probability of having a value equal to or less than the given value if that quantity is pulled at random from the population).

Figure 2.4 : Cumulative frequency graph for the blood sample data. Plot a dot at the end of the relevant class at a $y$-value equal to the cumulative frequency. Then connect the dots as shown.

4. Pie Chart. A pie chart is a round histogram; everyone has seen a pie chart and it is intuitive. The angles in the pie chart are computed using:

Angle = Relative Frequency $\times 360^{\circ}$.
For the blood type data, the explicit angle calculations are :
Class | Angle |
A | 0.20 $\times 360^{\circ}$ = $72^{\circ}$ |
B | 0.28 $\times 360^{\circ}$ = $100.8^{\circ}$ |
O | 0.36 $\times 360^{\circ}$ = $129.6^{\circ}$ |
AB | 0.16 $\times 360^{\circ}$ = $57.6^{\circ}$ |
Check Sum = $360^{\circ}$ |
Class | Frequency |
A | 5 |
B | 7 |
O | 9 |
AB | 4 |
|50,51,51,52,53,53,|55,55,56,57,57,58,59,|62,63,|65,65,66,66,67,68,69,69|72,73,|75,75,77,78,79|
where the bars illustrate the division of the data into low and high decades, step 2. The first number of each data point is the leading digit (stem), the last, the trailing digit (leaf). So with this, step 3 leads to :

Stem | Leaf
5 | 0 1 1 2 3 3 |
5 | 5 5 6 7 7 8 9 |
6 | 2 3 |
6 | 5 5 6 6 7 8 9 9 |
7 | 2 3 |
7 | 5 5 7 8 9 |
\[\bar{x} = \mbox{sample mean}\]
\[\mu = \mbox{population mean}\]
The formula for a sample mean is :\[ \bar{x} = \frac{ \sum_{i=1}^{n} x_{i} }{n} \]
where $n$ is the number of data points in the sample, the sample size. For a population, the formula is\[ \mu = \frac{ \sum_{i=1}^{N} x_{i} }{N} \]
where $N$ is the size of the population. Example 3.1 : Find the mean of the following data set :

84 | 12 | 27 | 15 | 40 | 18 | 33 | 33 | 14 | 4
$x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | $x_{7}$ | $x_{8}$ | $x_{9}$ | $x_{10}$ |
$x$ | label |
84 | $x_{1}$ |
12 | $x_{2}$ |
27 | $x_{3}$ |
15 | $x_{4}$ |
40 | $x_{5}$ |
18 | $x_{6}$ |
33 | $x_{7}$ |
33 | $x_{8}$ |
14 | $x_{9}$ |
4 | $x_{10}$ |
Total = 280 |
\[\bar{x} = \frac{\sum_{i=1}^{10} x_{i}}{n} = \frac{280}{10} = 28\]

☐
Mean for grouped data : If you have a frequency table for a dataset but not the actual data, you can still compute the (approximate) mean of the dataset. This situation, somewhat artificial for datasets, will be fundamental when we consider probability distributions. The formula for the mean of grouped data is \begin{equation} \tag{3.1} \bar{x} = \frac{ \sum_{i=1}^{G} f_i x_{m_{i}}}{n} \end{equation} where $f_{i}$ is the frequency of group $i$, $x_{m_{i}}$ is the class center of group $i$ and $n$ is the number of data points in the original dataset. Recall that $n = \sum f_{i}$ so we can write this formula as\[\bar{x} = \frac{ \sum_{i=1}^{G} f_i x_{m_{i}}}{\sum_{i=1}^{G} f_{i}}\]
which is a form that more closely matches the generic weighted mean formula; the formula for the mean of grouped data is a special case of a more general weighted mean that we will look at next. The class center is literally the center of the class -- the next example shows how to find it. Example 3.2 : Find the mean of the dataset summarized in the following frequency table.

Class | Class Boundaries | Frequency, $f_{i}$ | Midpoint, $x_{m_{i}}$ | $f_{i}x_{m_{i}}$
1 | 5.5 - 10.5 | 1 | 8 | 8 |
2 | 10.5 - 15.5 | 2 | 13 | 26 |
3 | 15.5 - 20.5 | 3 | 18 | 54 |
4 | 20.5 - 25.5 | 5 | 23 | 115 |
5 | 25.5 - 30.5 | 4 | 28 | 112 |
6 | 30.5 - 35.5 | 3 | 33 | 99 |
7 | 35.5 - 40.5 | 2 | 38 | 76 |
sums | | $n = \sum f_{i} = 20$ | | $\sum f_{i}x_{m_{i}} = 490$
\[\bar{x} = \frac{\sum_{i} f_{i} x_{m_{i}}}{n}\]
We need the sum in the numerator and the value for $n$ in the denominator. Get the numbers from the sums of the columns as shown in the frequency table :
\[\bar{x} = \frac{\sum_{i} f_i x_{m_{i}}}{n} = \frac{490}{20} = 24.5\]
☐
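Equation (3.1) applied to Example 3.2 can be checked with a few lines of code; the frequencies and class centers below are taken straight from the table.

```python
# Grouped-mean sketch using the frequency table from Example 3.2:
# frequencies f_i and class centers x_m_i.
f   = [1, 2, 3, 5, 4, 3, 2]
x_m = [8, 13, 18, 23, 28, 33, 38]

n = sum(f)                                    # n = sum of frequencies = 20
x_bar = sum(fi * xi for fi, xi in zip(f, x_m)) / n
print(x_bar)  # 24.5, matching the worked example
```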
Note that the grouped data formula gives an approximation of the mean of the original dataset in the following way. The exact mean is given by\[\bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n} = \frac{\sum_{i=1}^{G} \left( \sum_{k=1}^{f_{i}} x_{k} \right) }{n},\]
where the inner sum runs over the $f_{i}$ data points in group $i$. So the approximation is that
\[\sum_{k=1}^{f_{i}} x_{k} \approx f_{i} x_{m_{i}}\]
which would be exact only if all $x_{k}$ in group $i$ were equal to the class center $x_{m_{i}}$. Generic Weighted Mean : The general formula for a weighted mean is \begin{equation} \tag{3.2} \bar{x} = \frac{\sum_{i=1}^{n} w_{i} x_{i}}{\sum_{i=1}^{n} w_{i}} \end{equation} where $w_{i}$ is the weight for data point $i$. Weights can be assigned to data points for a variety of reasons. The formula for grouped data, viewed as a weighted mean, treats the class centers as data points and the group frequencies as weights. The next example weights grades. Example 3.3 : In this example grades are weighted by credit units. The weights are as given in the table :

Course | Credit Units, $w_{i}$ | Grade, $x_{i}$ | $w_{i}x_{i}$
English | 3 | 80 | 240 |
Psych | 3 | 75 | 225 |
Biology | 4 | 60 | 240 |
PhysEd | 2 | 82 | 164 |
$\sum w_{i}$ = 12 | $\sum x_{i}$ = 297 | $\sum w_{i}x_{i}$ = 869 |
\[\bar{x} = \frac{\sum w_i x_i}{\sum w_i}\]
so we need two sums. The double bars in the table above separate given data from columns added for calculation purposes. We will be using this convention with the double bars in other procedures to come. Using the sums for the table we get\[\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{869}{12} = 72.4\]
Note that the unweighted mean for these data is\[\bar{x} = \frac{\sum x_i}{n} = \frac{297}{4} = 74.3\]
which is, of course, different from the weighted mean.☐
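The weighted mean of Example 3.3 is easy to reproduce in code; the course names are just dictionary keys here, mapping to (credit units, grade) pairs from the table.

```python
# Weighted-mean sketch reproducing Example 3.3: grades weighted by credit units.
courses = {"English": (3, 80), "Psych": (3, 75), "Biology": (4, 60), "PhysEd": (2, 82)}

w_sum  = sum(w for w, _ in courses.values())      # sum of weights = 12
wx_sum = sum(w * x for w, x in courses.values())  # sum of w_i * x_i = 869
x_bar  = wx_sum / w_sum
print(round(x_bar, 1))  # 72.4
```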
Given data in order: 180 186 191 201 209 219 220
The middle value is the median: $MD = 201$.
Given data in order: 656 684 702 764 856 1132 1133 1303
The median is the average of the two middle values (764 and 856): $MD = \frac{764 + 856}{2} = 810$.
In these examples, the tedious work of putting the data in order from smallest to largest was done for us. With a random bunch of numbers, the work of finding the median is mostly putting the data in order.

$\underline{8}$, 9, 9, 14, $\underline{8}$, $\underline{8}$, 10, 7, 6, 9, 7, $\underline{8}$, 10, 14, 11, $\underline{8}$, 14, 11
8 occurs 5 times, more than any other number. So the mode is 8.☐
Example 3.5 : The dataset

110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 72
has no mode. Do not say that the mode is zero. Zero is not in the dataset.☐
Example 3.6 : The dataset

15, $\underline{18, 18, 18,}$ 20, 22, $\underline{24, 24, 24,}$ 26, 26
has two modes: 18 and 24. This data set is bimodal. The concept of mode really makes more sense for frequency table/histogram data.☐
Example 3.7 : The mode of the following frequency table data is the class with the highest frequency.

Class | Class Boundaries | Freq
1 | 5.5 - 10.5 | 1 |
2 | 10.5 - 15.5 | 2 |
3 | 15.5 - 20.5 | 3 |
4 | 20.5 - 25.5 | 5 (Modal Class) |
5 | 25.5 - 30.5 | 4 |
6 | 30.5 - 35.5 | 3 |
7 | 35.5 - 40.5 | 2 |
☐
\[\mbox{MR} = \frac{H+L}{2}\]
where $H$ and $L$ are the high and low data values. Example 3.8 : Given the following data : 2, 3, 6, 8, 4, 1. We have\[\mbox{MR} = \frac{8+1}{2} = 4.5\]
☐
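The median, mode and midrange of the example datasets can be checked with Python's standard library. One caveat: `statistics.mode` always returns a single most frequent value, so it will not flag the no-mode and bimodal cases the way the text does.

```python
import statistics

# Datasets taken from the worked examples above.
data_34 = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]  # mode example
data_38 = [2, 3, 6, 8, 4, 1]                                             # midrange example

mode = statistics.mode(data_34)                # most frequent value
midrange = (max(data_38) + min(data_38)) / 2   # MR = (H + L) / 2
median = statistics.median(data_38)            # sorts the data internally

print(mode, midrange, median)
```

Here `mode` is 8 and `midrange` is 4.5, agreeing with the examples; `median` is 3.5 since the sorted data have two middle values, 3 and 4.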
$\sum A_{i} x_{i} = A_{t} x_{g}$
$\sum f_{i} x_{i} = (\sum f_{i} ) x_{g}$
$\sum f_{i} x_{i} = n x_{g}$
$x_{g} = \frac{\sum f_{i} x_{i}}{n}$
where we have used $A_{i} = f_{i}$ because the class widths are one, so\[x_{g} = \overline{x} = \frac{\sum f_{i} x_{i}}{n}.\]
Because our ``weight'' is area, $\overline{x}$ is technically called the ``1st moment of area''. (Variance, covered next, is the ``2nd moment of area about the mean''.)◻
Variance :
\begin{equation}\tag{3.3} \sigma^2 = \frac{\sum_{i=1}^{N}(x_{i} - \mu)^2}{N} \end{equation}
where $N$ is the size of the population, $\mu$ is the mean of the population and $x_{i}$ is an individual value from the population. Standard Deviation : \[ \sigma = \sqrt{\sigma^2} \] The standard deviation, $\sigma$, is a population parameter, we will learn about how to make inferences about population parameters using statistics from samples. 2. Sample Formulae : Variance :\begin{equation*}\tag{3.4} s^{2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{(n-1)} \end{equation*}
where $n$ is the sample size (number of data points), $n-1$ is the degrees of freedom for the given sample, $\overline{x}$ is the sample mean and $x_{i}$ is a data value. Standard Deviation : \[s = \sqrt{s^2}\] Equations (3.3) and (3.4) are the definitions of variance as the second moment about the mean; you need to determine the means ($\mu$ or $\overline{x}$) before you can compute variance with those formulae. They are algebraically equivalent to a ``short cut'' formula that allows you to compute the variance directly from sums and sums of squares of the data without computing the mean first. For the sample standard deviation (the useful one) the short cut formula is\begin{equation*}\tag{3.5} s^{2} = \frac{\sum_{i=1}^{n}x^{2}_{i} - (\frac{(\sum^n_{i=1}x_i)^2}{n})}{n-1} \end{equation*}
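As a sanity check on the short-cut formula (3.5), the following sketch compares it against the library's definitional sample variance for the small dataset used in Example 3.10 below.

```python
import statistics

# Check that the short-cut formula (3.5) agrees with the definitional
# sample variance (second moment about the mean, n - 1 denominator).
x = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]
n = len(x)

s2_shortcut = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)
s2_library = statistics.variance(x)   # computes the mean first, then (3.4)

print(s2_shortcut, s2_library)
```

Both give the same value (up to floating-point rounding), even though the short-cut version never computes $\bar{x}$.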
At this point you should figure out how to compute $\bar{x}$, $s$ and $\sigma$ on your calculator for a given set of data.

Fact (not proved here) : The sample standard deviation $s$ is the ``optimal unbiased estimate'' of the population standard deviation $\sigma$. $s$ is a ``statistic'', the best statistic it turns out, that is used to estimate the population parameter $\sigma$. It is the $n-1$ in the denominator that makes $s$ the optimal unbiased estimator of $\sigma$. We won't prove that here but we will try to build up a little intuition about why that should be so -- why dividing by $n-1$ should be better than dividing by $n$. ($n-1$ is known as the degrees of freedom of the estimator $s$.) First notice that you can't guess or estimate a value for $\sigma$ (i.e. compute $s$) with only one data point. There is no spread of values in a data set of one point! This is part of the reason why the degrees of freedom is $n-1$ and not $n$. A more direct reason is that you need to remove one piece of information (the mean) from your sample before you can guess $\sigma$ (compute $s$).

Coefficient of Variation The coefficient of variation, CVar, is a ``normalized'' measure of data spread. It will not be useful for any inferential statistics that we will be doing. It is a pure descriptive statistic. As such it can be useful as a dependent variable, but here we treat it as a descriptive statistic that combines the mean and standard deviation. The definition is : \[\mbox{CVar} = \frac{s}{\bar{x}} \times 100\% \mbox{\ \ \ \ (samples)} \] \[\mbox{CVar} = \frac{\sigma}{\mu} \times 100 \% \mbox{\ \ \ \ (population)}\]

Example 3.9 : In this example we take the data given in the following table as representing the whole population of size $N=6$. So we use the formula of Equation (3.3) which requires us to sum $(x_{i} - \mu)^2$.

$x_{i}$ | $\left(x_{i}-\mu\right)^{2}$
10 | $\left(10-35\right)^{2}$ |
60 | $\left(60-35\right)^{2}$ |
50 | $\left(50-35\right)^{2}$ |
30 | $\left(30-35\right)^{2}$ |
40 | $\left(40-35\right)^{2}$ |
20 | $\left(20-35\right)^{2}$ |
$\sum x_{i}=210$ | $\sum\left(x_{i}-\mu\right)^{2}=1750$ |
\[\mu = \frac{\sum x_i}{N} = \frac{210}{6} = 35.\]
Then with that mean we compute the quantities in the second (calculation) column above and sum them. And then we may compute the variance :\[\sigma^2 = \frac{\sum(x_i - \mu)^2}{N} = \frac{1750}{6} = 291.7\] and standard deviation\[\sigma = \sqrt{\sigma^2} = \sqrt{291.7} = 17.1.\]
Finally, because we can, we compute the coefficient of variation:
\[\mbox{CVar} = \frac{\sigma}{\mu} \times 100\% = \frac{17.1}{35} \times 100\% = 48.9\%.\]
◻
Example 3.10 : In this example, we have a sample. This is the usual circumstance under which we would compute variance and sample standard deviation. We can use either Equation (3.4) or (3.5). Using Equation (3.4) follows the same procedure that is given in Example 3.9 and we'll leave that as an exercise. Below we'll apply the short-cut formula and see how $s$ may be computed without knowing $\bar{x}$. The dataset is given in the table below in the column to the left of the double line. The columns to the right of the double line are, as usual, our calculation columns. The size of the sample is $n=6$.

$x_{i}$ | $\left(x_{i}-\overline{x}\right)^{2}$ | $x_{i}^{2}$
11.2 | | $11.2^{2}$ = 125.44
11.9 | | $11.9^{2}$ = 141.61
12.0 | exercise | $12.0^{2}$ = 144.00
12.8 | | $12.8^{2}$ = 163.84
13.4 | | $13.4^{2}$ = 179.56
14.3 | | $14.3^{2}$ = 204.49
$\sum x_{i}=75.6$ | $\sum x_{i}^{2}=958.94$ |
Applying the short-cut formula, Equation (3.5), with the column sums $\sum x_{i} = 75.6$ and $\sum x_{i}^{2} = 958.94$ :\[ s^{2} = \frac{958.94 - \frac{(75.6)^{2}}{6}}{5} = \frac{958.94 - 952.56}{5} = 1.276 \] so $s = \sqrt{1.276} = 1.13$.

◻
Grouped Sample Formula for Variance As with the mean, we can compute an approximation of the data variance from frequency table (histogram) data. And again this computation is precise for probability distributions with class widths of one. The grouped sample formula for variance is\begin{equation*}\tag{3.6} s^{2} = \frac{\sum_{i=1}^{G} (f_{i} \cdot x^{2}_{m_{i}}) - [\frac{(\sum_{i=1}^{G} f_{i} \cdot x_{m_{i}})^2}{n}]}{n-1} \end{equation*}
where $G$ is the number of groups or classes, $x_{m_{i}}$ is the class center of group $i$, $f_{i}$ is the frequency of group $i$ and \[n = \sum_{i=1}^{G} f_{i}\] is the sample size. Equation (3.6) is the short-cut version of the formula. We can also write \[s^{2} = \frac{\sum_{i=1}^{G} f_{i} (x_{m_{i}} - \bar{x})^{2}}{n-1}\] or if we are dealing with a population, and the class width is one so that the class center $X_{m_{i}} = X_{i}$, \[\sigma^{2} = \frac{\sum_{i=1}^{G} f_{i} (X_{m_{i}} - \mu)^{2}}{N}\] which will be useful when we talk about probability distributions. In fact, let's look ahead a bit and make the frequentist definition of the probability for $X_{i}$ as $P(X_{i}) = f_{i}/N$ (which is the relative frequency of class $i$) so that \begin{equation*}\tag{3.7} \sigma^{2} = \sum_{i=1}^{G} P(X_{i}) (X_{i} - \mu)^{2}. \end{equation*} If we make the same substitution $P(X_{i}) = f_{i}/N$ in the grouped mean formula, Equation (3.1) with population items $X$ and $N$ in place of the sample items $x$ and $n$, then it becomes \begin{equation*}\tag{3.8} \mu = \sum_{i=1}^{G} P(X_{i}) X_{i}. \end{equation*} More on probability distributions later; for now let's see how we use Equation (3.6) for frequency table data. Example 3.11 : Given the frequency table data to the left of the double dividing line in the table below, compute the variance and standard deviation of the data using the grouped data formula.

Class | Class Boundaries | Freq, $f_{i}$ | Class Centre $x_{m_{i}}$ | $f_{i}\cdot x_{m_{i}}$ | $x^{2}_{m_{i}}$ | $f_{i}\cdot x^{2}_{m_{i}}$
1 | 5.5 - 10.5 | 1 | 8 | $1\cdot 8 = 8$ | $8^{2}=64$ | $1\cdot 64 = 64$ |
2 | 10.5 - 15.5 | 2 | 13 | $2\cdot 13 = 26$ | $13^{2}=169$ | $2\cdot 169 = 338$ |
3 | 15.5 - 20.5 | 3 | 18 | $3\cdot 18 = 54$ | $18^{2}=324$ | $3\cdot 324 = 972$ |
4 | 20.5 - 25.5 | 5 | 23 | $5\cdot 23 = 115$ | $23^{2}=529$ | $5\cdot 529 = 2645$ |
5 | 25.5 - 30.5 | 4 | 28 | $4\cdot 28 = 112$ | $28^{2}=784$ | $4\cdot 784 = 3136$ |
6 | 30.5 - 35.5 | 3 | 33 | $3\cdot 33 = 99$ | $33^{2}=1089$ | $3\cdot 1089 = 3267$ |
7 | 35.5 - 40.5 | 2 | 38 | $2\cdot 38 = 76$ | $38^{2}=1444$ | $2\cdot 1444 = 2888$ |
 | | $\sum f=20$ | | $\sum fx_{m}=490$ | | $\sum fx^{2}_{m}=13310$
Using Equation (3.6) with the column sums :\[ s^{2} = \frac{13310 - \frac{(490)^{2}}{20}}{19} = \frac{13310 - 12005}{19} = \frac{1305}{19} = 68.7 \] and $s = \sqrt{68.7} = 8.3$.

◻
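The grouped-variance computation of Example 3.11 can be reproduced directly from the frequency and class-center columns of the table, using Equation (3.6).

```python
# Grouped-variance sketch, Equation (3.6), with the Example 3.11 table data.
f   = [1, 2, 3, 5, 4, 3, 2]
x_m = [8, 13, 18, 23, 28, 33, 38]

n = sum(f)
sum_fx  = sum(fi * xi for fi, xi in zip(f, x_m))       # sum of f_i * x_m_i = 490
sum_fx2 = sum(fi * xi * xi for fi, xi in zip(f, x_m))  # sum of f_i * x_m_i^2 = 13310

s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
print(round(s2, 1))  # 68.7
```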
Now is a good time to figure out how to compute $\bar{x}$ and $s$ (and $\sigma$) on your calculators.

Data $x_{i}$ | $x_{i}^{2}$ | $z$-score, $z_{i}$
18 | 324 | (18-9.9)/6.2 = 1.3 |
15 | 225 | (15-9.9)/6.2 = 0.8 |
12 | 144 | (12-9.9)/6.2 = 0.3 |
6 | 36 | (6-9.9)/6.2 = -0.6 |
8 | 64 | (8-9.9)/6.2 = -0.3 |
2 | 4 | (2-9.9)/6.2 = -1.3 |
3 | 9 | (3-9.9)/6.2 = -1.1 |
5 | 25 | (5-9.9)/6.2 = -0.8
20 | 400 | (20-9.9)/6.2 = 1.6
10 | 100 | (10-9.9)/6.2 = 0.0
$\sum x_{i}=99$ | $\sum x_{i}^{2}=1331$ |
◻
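A quick way to check $z$-score calculations: after standardizing, the scores themselves should have mean 0 and standard deviation 1. A sketch for the table's data:

```python
import statistics

# z-score sketch for the table's data: z_i = (x_i - x_bar) / s.
x = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
x_bar = statistics.mean(x)   # 9.9
s = statistics.stdev(x)      # about 6.2

z = [(v - x_bar) / s for v in x]

# The z-scores have mean 0 and standard deviation 1 (up to rounding).
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```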
\[P(E) = \frac{4}{52} = 0.077 \mbox{\ \ \ (7.7\% if we were to express the result in percentages)}\]
▢
To use $P(E)$ mathematically we set
\[0 \leq P(E) \leq 1 \]
where, probability-wise, 0 means $E$ definitely will not occur and 1 means $E$ definitely will occur.
This is a method we can use instead of using percent. To compute probabilities, we first need to know how to count. Fundamental Counting Rule Say you have $n$ events in order, and for event $i$ there are $k_{i}$ ways for it to happen. Then the number of ways for the $n$ events to play out is : \[ k_1 \cdot k_2 \cdot k_3 \hdots k_n = \prod_{i=1}^{n} k_i \] (The giant pi symbolizes a multiplication convention in the same way that a giant sigma symbolizes a summation convention as described in Section 1.3.) Example 4.2 How many combinations are there on a lock with 3 numbers? Lay out the events as : $k_{1}=10$, $k_{2}=10$, and $k_{3}=10$. Note that each number can be anything from 0 to 9 giving 10 possibilities ($k_{i} = 10$) for each event. So the number of possible lock combinations is\[ k_1 k_2 k_3 = 10 \cdot 10 \cdot 10 = 10^3 = 1000 \]
Note that you could have guessed this because the combinations range from 000 to 999 -- counting in base 10.▢
Example 4.3 Suppose that a hardware store can produce paints with the following qualities :Colour : red, blue, white, black, green, brown, yellow (7 colours)
Type : latex, oil (2 types)
Texture : flat, semigloss, high-gloss (3 textures)
Use : indoor, outdoor (2 uses)
How many ways are there to combine these qualities to produce a can of paint? Answer : From the above list $k_{1}=7, k_{2}=2, k_{3}=3, k_{4}=2$ and the number of possible paint kinds is: \[ 7 \cdot 2 \cdot 3 \cdot 2 = 84 \]▢
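The fundamental counting rule is a single product over the $k_{i}$, so the standard library's `math.prod` reproduces both worked examples directly:

```python
import math

# Fundamental counting rule: multiply the number of ways for each event.
paint_choices = [7, 2, 3, 2]     # colours, types, textures, uses (Example 4.3)
print(math.prod(paint_choices))  # 84

lock = [10, 10, 10]              # each lock number: digits 0-9 (Example 4.2)
print(math.prod(lock))           # 1000
```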
Applications of the Fundamental Counting Rule We are interested in applying the fundamental counting rule to two special, important cases : 1. Permutations. The number of ways of selecting $r$ objects from a collection of $n$ objects, while caring about the order, is :\[ {}_{n}P_{r} = \frac{n!}{(n-r)!} \]
This formula follows from the fundamental counting rule. With $n$ objects there are $k_{1} = n$ ways to select the first object. After selecting the first object there are $n-1$ ways to choose the second object so $k_{2} = n-1$, etc. up to $k_{r} = n - r + 1$ :\[ {}_nP_r = (n)(n-1)(n-2) \hdots (n-r+1)\]
\[ = \frac{(n)(n-1) \hdots (2)(1)}{(n-r)(n-r-1) \hdots (2)(1)} \]
Example 4.4 : How many ways are there to choose 5 numbered balls from a bucket of 25 to make a lottery number? Answer : $25 \cdot 24 \cdot 23 \cdot 22 \cdot 21 = 6,375,600$ possibilities.▢
2. Combinations. The number of ways of selecting $x$ objects from a collection of $n$ objects without caring about the order is :\[ {}_nC_x = \frac{n!}{(n-x)!x!} = \frac{{}_nP_x}{x!} = \left( \begin{array}{c} n \\ x \end{array}\right) \]
That last symbol $\left( \begin{array}{c} n \\ x \end{array}\right)$ is colloquially called ``$n$ choose $x$''. The second last expression demonstrates the application of the fundamental counting principle; it says\[ \left( \begin{array}{c} n \\ x \end{array}\right) = \frac{(n)(n-1) \hdots (n-x+1)}{x!} \]
where $x!$ is just the number of ways of arranging $x$ objects while caring about the order, $x! = {}_{x}P_{x}$. As a practical matter, never try to compute $n!$ directly; it will usually be unimaginably big. Use the formula that directly shows the fundamental counting rule as shown in the following example. Example 4.5 : How many ways are there to select 10 balls from a bucket of 100? Answer : \[ \left( \begin{array}{c} 100 \\ 10 \end{array} \right) = \frac{100 \cdot 99 \cdot 98 \cdot 97 \cdot 96 \cdot 95 \cdot 94 \cdot 93 \cdot 92 \cdot 91} {10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = \frac{6.2815651 \times 10^{19}}{3,628,800} = \underline{17.3 \times 10^{12}} \]▢
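Python's standard library has both counting functions built in (and they avoid the giant intermediate factorials); a sketch checking the worked examples and the relation ${}_{n}C_{x} = {}_{n}P_{x}/x!$:

```python
import math

# Permutations and combinations, checked against the worked examples.
print(math.perm(25, 5))    # 25*24*23*22*21 = 6375600 (Example 4.4)
print(math.comb(100, 10))  # "100 choose 10", about 17.3e12 (Example 4.5)

# n choose x equals nPx / x!, per the text's formula.
assert math.comb(100, 10) == math.perm(100, 10) // math.factorial(10)
```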
The symbol $\left( \begin{array}{c} n \\ x \end{array}\right)$ is also known as the binomial coefficient because it shows up in algebra when you expand expressions of the form $(x+y)^{n}$. For example[footnote]You don't need this algebra for this statistics course. It's just interesting.[/footnote]\[ (x+y)^2 = x^2 + 2xy + y^2 \]
\begin{eqnarray*} (x+y)^3 &=& \left( \begin{array}{c} 3\\0 \end{array} \right)x^3 + \left( \begin{array}{c} 3\\1 \end{array} \right)x^2y + \left( \begin{array}{c} 3\\2 \end{array} \right) xy^2 + \left( \begin{array}{c} 3\\3 \end{array} \right) y^3 \\ &=& x^3 + 3x^2y + 3xy^2 + y^3 \end{eqnarray*} The binomial coefficients can be quickly computed using Pascal's triangle : \[ \begin{array}{ccccccccccccccc} &&&&&&&&&&&&&& n = \\ &&&&&& 1 &&&&&&&& 0 \\ &&&&& 1 && 1 &&&&&&& 1 \\ &&&& 1 && 2 && 1 &&&&&& 2 \\ &&& 1 && 3 && 3 && 1 &&&&& 3 \\ && 1 && 4 && 6 && 4 && 1 &&&& 4 \\ & 1 && 5 && 10 && 10 && 5 && 1 &&& 5 \\ 1 && 6 && 15 && 20 && 15 && 6 && 1 && 6 \\ &&&&&& \mbox{etc.} \end{array} \] Referring to Pascal's triangle we can quickly write\[ (x+y)^6 = x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6 \]
for example.

\[P(1 \mid 1) = p = \left( \begin{array}{c} 1 \\ 1 \end{array} \right) p^{1} q^{0}.\]
Consider $n = 2$. What is $P(0 \mid 2)$? This is all failures : The probability of each failure is $q$ so the probability of getting FF is $q \cdot q = q^{2}$. So\[ P(0 \mid 2) = q^{2} = \left( \begin{array}{c} 2 \\ 0 \end{array} \right) p^{0} q^{2}. \]
(Note that $ \left( \begin{array}{c} 2 \\ 0 \end{array} \right) = 1$ by definition. There is exactly one way to draw no things from a collection of 2.) What is $P(1 \mid 2)$? There are two sequences with one success, SF and FS, each with probability $p \cdot q$ ($p \cdot q$ for the first one, $q \cdot p$ for the second one). So\[ P(1 \mid 2) = 2 \cdot p \cdot q = \left( \begin{array}{c} 2 \\ 1 \end{array} \right) p^{1} q^{1}. \]
For $x = 2$ we have\[ P(2 \mid 2) = \left( \begin{array}{c} 2 \\ 2 \end{array} \right) p^{2} q^{0}. \]
We can continue this way for $n = 3, 4, \ldots$ but this is clearly tedious. The way of ``mathematical induction'' is the formal way to proceed but let's try a more intuitive approach. For $x$ successes in $n$ trials, consider our $n$ boxes; any given sequence with $x$ successes will have $n-x$ failures and so that given sequence will have a probability of $p^{x}q^{n-x}$. But how many specific sequences with $x$ successes are there? Think of it this way. Of the $n$ boxes, how many ways are there to write $x$ S's in the $n$ boxes? There are $n$ possibilities ($n$ boxes are available) to write the first S, $n-1$ ways after that to write the second S, etc., down to $n-x+1$ ways for the last S. But we don't care about the order in which we wrote the S's into the boxes, so divide by $x!$. In other words there are $ \left( \begin{array}{c} n\\ x \end{array} \right)$ specific sequences with $x$ successes. Putting it all together :
\[ P(x \mid n) = \left( \begin{array}{c} n \\ x \end{array} \right) p^{x} q^{n-x}. \]
▢
Example 4.6 : In a bucket of 100 toys with 20 dinosaurs and 80 bugs, consider drawing a dinosaur a success. So $P(S)=p=0.2$ and $P(F)=q = 1-p=0.8$. Let us make an approximation and assume that $p$ does not change with each draw[footnote]By assuming that $p$ does not change, we will be led to the binomial distribution. If we more accurately assume that $P(S)$ changes with each draw we will be led to the hypergeometric distribution. For fun, let's consider the case where $P(S)$ changes with each draw. It's just another application of the fundamental counting rule. To begin, there are $ \left( \begin{array}{c} 100 \\ 10 \end{array}\right) = 17.3 \times 10^{12} $ ways of drawing 10 toys from the bucket without caring if it is a dinosaur or a bug. This is the size of the sample space; it is how many ways there are to make a sample of size 10 from the bucket of 100 choices; it is $n(S)$ in Equation (4.1). There are $17.3 \times 10^{12}$ samples of 10 in the bucket. If we want 3 dinosaurs in our sample, as in the example in the text, then of the 20 dinosaurs in the bucket, there are $ \left( \begin{array}{c} 20 \\ 3 \end{array} \right) = 1140 $ ways to get 3 dinosaurs and $ \left( \begin{array}{c} 80 \\ 7 \end{array} \right) = 3.18 \times 10^{9} $ ways to get 7 bugs from the 80 in the bucket. So there are $ \left( \begin{array}{c} 20 \\ 3 \end{array} \right) \cdot \left( \begin{array}{c} 80 \\ 7 \end{array} \right) = 3.62 \times 10^{12} $ ways to draw 3 dinosaurs and 7 bugs from the bucket. This number is $n(E)$ in Equation (4.1). And so \[ P(3 \mbox{ dinosaurs} \mid 10 \mbox{ toys}) = \frac{\left( \begin{array}{c} 20 \\ 3 \end{array} \right) \left( \begin{array}{c} 80 \\ 7 \end{array} \right)}{\left( \begin{array}{c} 100 \\ 10 \end{array} \right)} = \frac{3.62 \times 10^{12}}{17.3 \times 10^{12}} = 0.209 \] Note how close this is to the answer from the binomial distribution of 0.201.[/footnote] Say we want to know $P$(3 successes $\mid$ 10 trials).
In other words, what is the probability that if I take 10 toys out of the bucket that exactly 3 of them are dinosaurs? Using Equation (4.2) we find\[ P(3 \mid 10) = \left( \begin{array}{c} 10 \\ 3 \end{array} \right) 0.2^{3} 0.8^{7} = 0.201. \]
The actual process of doing this calculation is somewhat tedious and therefore error prone. So in a test, for example, you will want to use the Binomial Distribution Table included in this text in the Appendix. In the Binomial Distribution Table, you simply find the appropriate $n$ and then $x$ in the column on the left and then look under the appropriate $p$ column to find $P(x \mid n)$ for the given $p$.▢
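If a computer is at hand, Equation (4.2) is also easy to evaluate directly instead of looking up the table; a sketch reproducing Example 4.6 (the function name `binom_pmf` is mine, not the text's):

```python
import math

# Binomial probability P(x | n) = C(n, x) * p^x * q^(n-x), Equation (4.2).
def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 4.6: 3 dinosaurs in 10 draws with p = 0.2.
print(round(binom_pmf(3, 10, 0.2), 3))  # 0.201

# Sanity check: the probabilities over all x from 0 to n sum to 1.
assert abs(sum(binom_pmf(x, 10, 0.2) for x in range(11)) - 1) < 1e-12
```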
The complete binomial distribution specifies the probabilities of all $x$ successes from 0 to $n$, and can be plotted as a histogram. Note that there is a binomial distribution for each $n$ and $p$. Let's plot the binomial distribution for getting $x$ successes (dinosaurs) in forming a sample of $n=10$ toys with $p=0.2$. The Binomial Distribution Table contains the relative frequency table for the histogram that represents the binomial distribution shown in Figure 4.1.

Figure 4.1 : The binomial distribution for the example of forming samples of $n=10$ toys with $x$ representing the number of dinosaurs in the sample and $p = 0.2$ being the probability of selecting a dinosaur in forming the sample. Note that the probability of $x$ = 8, 9 or 10 is not zero, just less than 0.001.

The binomial distribution is an example of a discrete probability distribution. It is a histogram of relative frequencies obtained by counting possibilities in sample space.[footnote]Sample space is the set of all possible samples.[/footnote] The mean and variance of any discrete distribution are given by
\[ \mu = \sum_{x} x \cdot P(x) \]
\[ \sigma^{2} = \sum_{x}(x-\mu)^2 \cdot P(x) = \left [ \sum_{x} x^2 \cdot P(x) \right] - \mu^2 \]
These two formulae come from the grouped data expressions $\mu = \sum f(x) x/n$ and $\sigma^{2} = \sum f(x)(x - \mu)^2/n$, by substituting $P(x) = f(x)/n$. If we substitute Equation 4.2 for $P(x)$ in these general equations we get\[ \mu = n p \]
\[ \sigma^2 = npq \]
which are the mean and variance for a binomial distribution with parameters $n$ and $p$. The mean is the expected value.
Example 4.7 : For the bucket of toys example:\[ \mu = n \cdot p = 10 \cdot 0.20 = 2 \]
So given any random sample of 10 toys we expect that 2 of them will be dinosaurs.▢
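The shortcut formulas $\mu = np$ and $\sigma^2 = npq$ can be verified against the general discrete-distribution sums; a sketch in Python:

```python
from math import comb

n, p = 10, 0.2
q = 1 - p

# P(x) from Equation (4.2) for each x = 0, 1, ..., n
P = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

# General formulas for the mean and variance of a discrete distribution
mu = sum(k * P[k] for k in range(n + 1))
var = sum((k - mu) ** 2 * P[k] for k in range(n + 1))

print(round(mu, 6))   # 2.0, which equals n*p
print(round(var, 6))  # 1.6, which equals n*p*q
```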
 | Discrete | Continuous
Skewness | $\mu_3 = \frac{1}{\sigma^{3}} \sum(x-\mu)^{3}P(x)$ | $\mu_3 = \frac{1}{\sigma^{3}} \int(x-\mu)^{3}P(x)\:dx$
Kurtosis | $\mu_4 =\frac{1}{\sigma^{4}} \sum(x - \mu)^{4}P(x)$ | $\mu_4 = \frac{1}{\sigma^{4}} \int (x - \mu)^4 P(x)\:dx$
The moments of a probability distribution are important. In fact, if you specify all the moments of a distribution then you have completely specified the distribution. Let's say that another way. To specify a probability distribution you can either give its formula (as generally derived from counting) or you can give all its moments. The normal distribution with a mean of $\mu$ and a variance of $\sigma^{2}$ is specified by the formula
\begin{equation}\tag{5.1} P(x) = \frac{e^{-(x-\mu)^2/2\sigma^2}}{\sigma \sqrt{2\pi}} \end{equation} or by its moments. The normal distribution with a mean of $\mu$ and a variance of $\sigma^{2}$ is the only continuous probability distribution with moments (from first to second and on up) of: $\mu$, $\sigma^{2}$, 0, 3, 0, 15, $\ldots$. The normal distribution is special that way among probability distributions.▢
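Equation (5.1) can be checked against Python's standard-library normal distribution; a sketch (the parameter values here are illustrative, not from the text):

```python
from math import exp, sqrt, pi
from statistics import NormalDist

mu, sigma = 0.0, 1.0  # illustrative parameters: the standard normal

def P(x):
    """Equation (5.1) evaluated directly."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Compare with the standard library's implementation at several points
d = NormalDist(mu, sigma)
for x in (-2.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(P(x) - d.pdf(x)) < 1e-12

print(round(P(0.0), 4))  # 0.3989, the peak of the standard normal curve
```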
Example 5.2 : Find the probability that $z$ is between -1.75 and 0. Solution : $P(-1.75 < z < 0) = A(1.75) = 0.4599$, see Figure 5.7. [caption id="attachment_1512" align="aligncenter" width="503"] Figure 5.7 : The situation for Example 5.2.[/caption]▢
Case 2 : Tail areas. A tail area is the opposite of the area given in the Standard Normal Distribution Table on one half of the normal distribution, see Figure 5.8. The tail area after a given positive $z$ is $P = P(x > z) = 0.5 - A(z)$ or before a given negative value $-z$ is $P = P(x < -z) = 0.5 - A(z)$. [caption id="attachment_1514" align="aligncenter" width="760"] Figure 5.8 : Case 2 : Tail areas.[/caption] Example 5.3 : What is the probability that $z > 1.11$? Solution : $P(z > 1.11) = 0.5 - A(1.11) = 0.5 - 0.3665 = 0.1335$, see Figure 5.9. [caption id="attachment_1515" align="aligncenter" width="422"] Figure 5.9 : The situation for Example 5.3.[/caption]▢
Example 5.4 : What is the probability that $z < -1.93$? Solution : $P = P(z < -1.93) = 0.5 - A(1.93) = 0.5 - 0.4732 = 0.0268$, see Figure 5.10. [caption id="attachment_1629" align="aligncenter" width="300"] Figure 5.10 : The situation for Example 5.4.[/caption]▢
Case 3 : An interval on one side of the mean. Recall that $\mu=0$ for the $z$-distribution. So we are looking for the probabilities $P = P(z_{1} < x < z_{2})$ for an interval to the right of the mean or $P = P(-z_{2} < x <- z_{1})$ for an interval to the left of the mean. In either case $P = A(z_{2}) - A(z_{1})$, see Figure 5.11. [caption id="attachment_1925" align="aligncenter" width="570"] Figure 5.11: Case 3: An interval on one side of the mean.[/caption] Example 5.5 : What is the probability that $z$ is between 2.00 and 2.47? Solution : $P(2.00 < z < 2.47) = A(2.47) - A(2.00) = 0.4932 - 0.4772 = 0.0160$, see Figure 5.12. [caption id="attachment_1926" align="aligncenter" width="408"] Figure 5.12: The situation for Example 5.5.[/caption]▢
Example 5.6 : What is the probability that $z$ is between -2.48 and -0.83? Solution : $P(-2.48 < z < -0.83) = A(2.48) - A(0.83) = 0.4934 - 0.2967 = 0.1967$, see Figure 5.13. [caption id="attachment_1927" align="aligncenter" width="472"] Figure 5.13: The situation of Example 5.6.[/caption]▢
Case 4 : An interval containing the mean. The situation is as shown in Figure 5.14 with the interval being between a negative and a positive number. In that case $P(-z_{1} < x < z_{2}) = A(z_{1}) + A(z_{2})$. [caption id="attachment_1928" align="aligncenter" width="407"] Figure 5.14: Case 4: An interval containing the mean.[/caption] Example 5.7 : What is the probability that $z$ is between -1.37 and 1.68? Solution : $P(-1.37 < z < 1.68) = A(1.37) + A(1.68) = 0.4147 + 0.4535 = 0.8682$, see Figure 5.15. [caption id="attachment_1929" align="aligncenter" width="391"] Figure 5.15: The situation for Example 5.7.[/caption]▢
Cases 5 & 6 : Excluding tails. Case 5 is excluding the right tail, $P(x < z)$. Case 6 is excluding the left tail, $P(x > -z)$. See Figure 5.16. Case 5 is the situation which gives the percentile position of $z$ if you multiply the area by 100. More about percentiles in Chapter 6. In either case, $P = 0.5 + A(z)$. [caption id="attachment_1930" align="aligncenter" width="584"] Figure 5.16: Left: Case 5. Right: Case 6.[/caption] Case 7 : Two unequal tails. In this case we add the areas of the left and right tails, see Figure 5.17. The special case where the tails have equal areas (i.e. when $z_{1} = z_{2}$ in the notation we have been using) is the case we will encounter for two-tail hypothesis testing. $P = P(x< -z_{1}) + P(x > z_{2}) = (0.5 - A(z_{1})) + (0.5 - A(z_{2}))$. [caption id="attachment_1931" align="aligncenter" width="432"] Figure 5.17: Case 7: Two unequal tails.[/caption] Example 5.8 : Find the areas of the tails shown in Figure 5.18. Solution : $P(z < -3.01 \mbox{ or } z > 2.43)$ $= (0.5 - A(3.01)) + (0.5 - A(2.43)) $ $= (0.5 - 0.4987) + (0.5 - 0.4925) $ $= 0.0013 + 0.0075$ $= 0.0088$. [caption id="attachment_1932" align="aligncenter" width="408"] Figure 5.18: The situation for Example 5.8.[/caption]▢
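All seven cases reduce to arithmetic with the tabulated area $A(z)$. A sketch in Python, defining $A(z)$ from the standard library's cumulative distribution function and reproducing three of the worked examples:

```python
from statistics import NormalDist

def A(z):
    """Area between 0 and z under the standard normal curve
    (the quantity tabulated in the Standard Normal Distribution Table)."""
    return NormalDist().cdf(z) - 0.5

# Example 5.2 (area between -1.75 and 0): A(1.75)
print(round(A(1.75), 4))            # 0.4599
# Example 5.3 (Case 2, right tail): 0.5 - A(1.11)
print(round(0.5 - A(1.11), 4))      # 0.1335
# Example 5.7 (Case 4, interval containing the mean): A(1.37) + A(1.68)
print(round(A(1.37) + A(1.68), 4))  # 0.8682
```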
Using the Standard Normal Distribution Table backwards Up until now we've used the Standard Normal Distribution Table directly. For a given $z$, we look up the area $A(z)$. Now we look at how to use it backwards: We have a number that represents the area between 0 and $z$, what is $z$? Let's illustrate this process with an example. Example 5.9 : We are given an area $P=0.2123$ as shown in Figure 5.19. What is $z$? Solution : Look in the Standard Normal Distribution Table for the closest value to the given $P$. In this case 0.2123 corresponds exactly to $z = 0.56$. [caption id="attachment_1933" align="aligncenter" width="376"] Figure 5.19: The situation for Example 5.9.[/caption]▢
Example 5.9 was artificial in that the given area appeared exactly in the Standard Normal Distribution Table. Usually it doesn't. In that case pick the nearest area in the table to the given number and use the $z$ associated with the nearest area. This, of course, is an approximation. For those who know how, linear interpolation can be used to get a better approximation for $z$. The $z$-transformation preserves areas In a given situation of sampling a normal population, the mean and standard deviation of the population are not necessarily 0 and 1. We have just learned how to compute areas under a standard normal curve. How do we compute areas under an arbitrary normal curve? We use the $z$-transformation. If we denote the original normal distribution by $P(x)$ and the $z$-transformed distribution by $P(z)$ then areas under $P(x)$ will be transformed to areas under $P(z)$ that are the same. The $z$-transformation preserves areas. So we can compute areas, or probabilities under $P(z)$ using the Standard Normal Distribution Table and instantly have the probabilities we need for the original $P(x)$. Let's follow an example. Example 5.10 : Suppose we know that the amount of garbage produced by households follows a normal distribution with a mean of $\mu = 28$ pounds/month and a standard deviation of $\sigma = 2$ pounds/month. What is the probability of selecting a household that produces between 27 and 31 pounds of trash/month? Solution : First convert $x=27$ and $x=31$ to their $z$-scores: \[ z_{1} = z(27) = \frac{27-28}{2} = \frac{-1}{2} = -0.5 \] \[ z_{2} = z(31) = \frac{31-28}{2} = \frac{3}{2} = 1.5 \] Then, referring to Figure 5.20, we see that the probability is $P = A(0.5) + A(1.5) = 0.1915 + 0.4332 = 0.6247$. Figure 5.20 : The situation of Example 5.10. Left is the given population, $P(x)$. On the right is the $z$-transformed version of the population $P(z)$. The value 27 is $z$-transformed to -0.5 and 31 is $z$-transformed to 1.5.
▢
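Because the $z$-transformation preserves areas, the probability in Example 5.10 can be computed either from $P(z)$ or directly from $P(x)$; a sketch in Python confirming that the two agree:

```python
from statistics import NormalDist

mu, sigma = 28, 2  # garbage production, pounds/month

# z-transform the interval endpoints
z1 = (27 - mu) / sigma  # -0.5
z2 = (31 - mu) / sigma  #  1.5

# Area under P(z) between z1 and z2
P_z = NormalDist().cdf(z2) - NormalDist().cdf(z1)
print(round(P_z, 4))  # 0.6247

# Same area directly under the original P(x) between 27 and 31
P_x = NormalDist(mu, sigma).cdf(31) - NormalDist(mu, sigma).cdf(27)
assert abs(P_z - P_x) < 1e-12  # the z-transformation preserves areas
```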
In Example 5.10 we used the Standard Normal Distribution Table directly. You will also need to know how to solve problems in which you use this table backwards. The next example shows how that is done. For this kind of problem you will find the $z$ first and then you will need to find $x$ using the inverse $z$-transformation : \[ x = z \cdot \sigma + \mu, \] which is derived by solving the $z$-transformation, $z = \frac{x-\mu}{\sigma}$, for $x$. Example 5.11 : In this example we work from given $P$. To be a police person you need to be in the top 10\% on a test that has results that follow a normal distribution with an average of $\mu = 200$ and $\sigma = 20$. What score do you need to pass? Solution : First, find the $z$ such that $P = P(x > z) = 0.10$. That $P$ is a right tail area (Case 2), so we need $A(z) = 0.4$, look at Figure 5.21 to see that. Then, going to the Standard Normal Distribution Table, look for 0.4 in the middle of the table then read off $z$ backwards. The closest area is 0.3997 which corresponds to $z = 1.28$. Using the inverse $z$-transformation, convert that $z$ to an $x$ to get \[ x = 1.28 \times 20 + 200 = 25.60 + 200 = 225.60 \] or, rounding, use $x = 226$. There are frequently consequences to our calculations and in this case we want to make sure that we have a score that guarantees a pass. So we round the raw calculation up to ensure that. Figure 5.21 : The situation of Example 5.11.
▢
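Example 5.11 can be sketched with the inverse CDF playing the role of the backwards table lookup:

```python
from math import ceil
from statistics import NormalDist

mu, sigma = 200, 20

# Backwards lookup: find z with a right-tail area of 0.10,
# i.e. A(z) = 0.4 (the table gives z = 1.28)
z = NormalDist().inv_cdf(0.90)
print(round(z, 2))  # 1.28

# Inverse z-transformation: x = z*sigma + mu
x = z * sigma + mu
print(ceil(x))      # 226, rounding up to guarantee a pass
```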
original data | 18 | 15 | 12 | 6 | 8 | 2 | 3 | 5 | 20 | 10
ordered data | 2 | 3 | 5 | 6 | 8 | 10 | 12 | 15 | 18 | 20
$i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
$n=10$
▢
Decile :
$D(x_i) \equiv$ The decile of data value $x_{i}$ in the ordered position $i$ is defined as
\[ D(x_i) = \frac{P(x_i)}{10} \hspace{1in} 0 \leq D(x_{i}) \leq 10 \] We will not make much use of deciles except to see that quartiles are defined in the same way. Quartile : $Q(x_i) \equiv$ The quartile of data value $x_i$ in the ordered position $i$ is defined as \begin{equation}\tag{6.3} Q(x_i) = \frac{P(x_{i})}{25} \hspace{1in} 0 \leq Q(x_{i}) \leq 4 \end{equation} Notation : (This notation also applies to $P$ and $D$.) We write : $Q_{0} = 0^{\rm th}$ quartile, $Q_1 = 1^{\rm st}$ quartile, $Q_2 = 2^{\rm nd}$ quartile, $Q_3 = 3^{\rm rd}$ quartile and $Q_4 = 4^{\rm th}$ quartile. Quartiles are useful because we do not have to compute the percentile first and then divide by 25 as given by Equation (6.3). Instead, we can use the following handy tricks after ordering our data: \begin{eqnarray*} Q_{2} & = & \mbox{ MD (median)}\\ Q_{1}& = & \mbox{ MD of values less than $Q_{2}$}\\ Q_{3} & = & \mbox{ MD of values greater than $Q_{2}$}\\ Q_{0} &=& L \\ Q_{4} &=& H \end{eqnarray*} Example 6.2 : Example with an even number of data points. With the data in order, first find the median, then the medians of the two halves of the dataset : \[5 \hspace{.25in} 6 \hspace{.25in} 12 \hspace{.25in} 13 \hspace{.25in} 15 \hspace{.25in} 18 \hspace{.25in} 22 \hspace{.25in} 50\] $Q_1 = \frac{6 + 12}{2} = 9$ $MD = \frac{13 + 15}{2} = 14 = Q_2$ $Q_3 = \frac{18 + 22}{2} = 20$ $Q_{0} = L = 5$ $Q_{4} = H = 50$▢
Example 6.3 : Example with an odd number of data points. With the data in order, first find the median, then the medians of the two halves of the dataset : \[2 \hspace{.25in} 5 \hspace{.25in} 11 \hspace{.25in} 14 \hspace{.25in} 18 \hspace{.25in} 25 \hspace{.25in} 35\] $Q_{1} = 5$ $MD = 14 = Q_{2}$ $Q_{3} = 25$ $Q_{0} = L = 2$ $Q_{4} = H = 35$▢
(a) lower acceptable value limit$= Q_{1} - (1.5 \times IQR)$ $= 9 - (1.5 \times 11)$ $= 9 - 16.5 = -7.5$
(b) upper acceptable value limit$= Q_{3} + (1.5 \times IQR)$ $= 20 + (1.5 \times 11)$ $= 20 + 16.5 = 36.5$
and 50 > 36.5 so 50 is considered an outlier.▢
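The median-of-halves tricks and the $1.5 \times IQR$ outlier rule can be sketched in Python, here for the Example 6.2 data (for an odd $n$ the slices below exclude the middle value from both halves, matching Example 6.3):

```python
from statistics import median

data = sorted([5, 6, 12, 13, 15, 18, 22, 50])  # Example 6.2 data
n = len(data)

# Handy tricks: Q2 is the median, Q1 and Q3 are medians of the halves
Q2 = median(data)
Q1 = median(data[: n // 2])        # values below the median
Q3 = median(data[(n + 1) // 2 :])  # values above the median

IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR  # lower acceptable value limit
upper = Q3 + 1.5 * IQR  # upper acceptable value limit

print(Q1, Q2, Q3)    # 9.0 14.0 20.0
print(lower, upper)  # -7.5 36.5
print([x for x in data if x < lower or x > upper])  # [50] is an outlier
```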
▢
Box plots can also be drawn vertically. SPSS draws box plots vertically; this is especially useful for comparing datasets.
Measures of central tendency and dispersion
Robust | Non-robust
MD, IQR | $\bar{x}$, $s$
Traditional | Exploratory Data Analysis (EDA)
Frequency Tables, Histogram, Mean $\bar{x}$, Standard Deviation $s$ | Stem and Leaf Plot, Box Plot, Median MD, Interquartile Range IQR
▢
\[ \bar{x} - z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{x} + z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}}\right) \]
or using notation that we will use as a standard way of denoting symmetric confidence intervals \begin{equation}\tag{8.1} \bar{x} - E < \mu < \bar{x} + E \end{equation} where \[ E = z_{\cal{C}} \left( \frac{\sigma}{\sqrt{n}}\right). \] The notation $z_{\cal{C}}$ is more convenient for us than $z_{\alpha/2}$ because we will use the t Distribution Table in the Appendix to find $z_{\cal{C}}$ very quickly. We could equally well write \[ \mu = \bar{x} \pm E \] but we will use Equation (8.1) because it explicitly gives the bounds for the confidence interval. Notice how the confidence interval is backwards from the picture that the central limit theorem gives, the picture shown in Figure 8.3. We actually had no business using the inverse $z$-transformation $\mu = z\left(\sigma/\sqrt{n}\right) + \bar{x}$ to arrive at Figure 8.2. It reverses the roles of $\mu$ and $\bar{x}$. We'll return to this point after we work through the mechanics of an example. [caption id="attachment_1614" align="aligncenter" width="300"] Figure 8.3 : The central limit theorem is about distributions of sample means.[/caption] Example 8.2 : What is the 95$\%$ confidence interval for student age if the population $\sigma$ is 2 years, sample $n = 50$, $\bar{x} = 23.2$? Solution : So ${\cal{C}} = 0.95$. First write down the formula prescription so you can see which numbers you need: \[ \bar{x} - E < \mu < \bar{x} + E \mbox{\hspace{2em}where\hspace{2em}} E = z_{95\%} \frac{\sigma}{\sqrt{n}}. \] Next, determine $z_{\cal{C}} = z_{\alpha/2}$. With the tables in the Appendices, there are two ways to do this. The first way is to use the Standard Normal Distribution Table noting that we need the $z$ associated with a table area of $0.95/2 = 0.475$. Using the table backwards we find $z_{\cal{C}} = 1.96$. The second way, the recommended way especially during exams, is to use the t Distribution Table. Simply find the column for the 95$\%$ confidence level and read the $z$ from the last line of the table.
We quickly find $z_{95\%} = 1.960$. Either way we now find \[ E = 1.96( \frac{2}{\sqrt{50}}) = 0.6\] so \begin{eqnarray*} \bar{x} - E &< \mu <& \bar{x} + E\\ 23.2 - 0.6 &< \mu <& 23.2 + 0.6 \\ 22.6 &< \mu <& 23.8 \end{eqnarray*} with 95$\%$ confidence.▢
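Example 8.2 can be sketched in Python, with `inv_cdf` replacing the backwards table lookup for $z_{\cal{C}}$:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 23.2, 2, 50
C = 0.95

# z_C leaves area C centred on 0, so look up the area 0.5 + C/2
z_C = NormalDist().inv_cdf(0.5 + C / 2)
print(round(z_C, 2))  # 1.96

E = z_C * sigma / sqrt(n)
print(round(E, 2))    # 0.55 (the text rounds this to 0.6)

print(round(xbar - E, 1), round(xbar + E, 1))  # 22.6 23.8
```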
\[ \bar{x} - E < \mu < \bar{x} + E\]
where, now\[E = t_{\nu,\cal{C}} \left( \frac{s}{\sqrt{n}} \right).\]
With this new formula for $E$ we have replaced $\sigma$ with $s$ in comparison with the formula we used in Section 8.1: Confidence Intervals using the z-distribution and, of course, replaced $z_{\cal{C}}$ with $t_{\nu,\cal{C}}$. Some books use $t_{\nu,\cal{C}} = t_{\nu,\alpha/2}$ like the $z_{\cal{C}}$ of Section 8.1. We use $t_{\nu,\cal{C}}$ because we'll look up its value in the t Distribution Table in the column for $\cal{C}$ confidence intervals (just like we did with $z$) and with the degrees of freedom $\nu$ specifying the row. The formula for the degrees of freedom in this case is :\[\nu = n - 1.\]
The $t_{\nu,\cal{C}}$ specify a probability $\cal{C}$ as shown in Figure 8.6. As before, the inverse $z$-transform, in the form $x = t_{\nu,{\cal{C}}} s + \bar{x}$, from the $t$-distribution on the left of Figure 8.6 to the distribution on the right of Figure 8.6 leads to our confidence interval formula for means of small samples. And as before we should justify using that transform from a Bayesian perspective. Figure 8.6 : Derivation of confidence intervals for means of small samples.
Example 8.3 : Given the following data: \[5460 \hspace{.15in} 5900 \hspace{.15in} 6090 \hspace{.15in} 6310 \hspace{.15in} 7160 \hspace{.15in} 8440 \hspace{.15in} 9930\]find the 99% confidence interval for the mean.
Solution : First count $n = 7$ and then, with your stats calculator compute \[ \bar{x} = 7041.4 \hspace{.15in} \text{and} \hspace{.15in} s = 1610.3. \] Using the t Distribution Table with $\nu = n-1 = 6$ in the 99% confidence interval column, find \[ t_{n-1, {\cal{C}}} = t_{6,99\%} = 3.707. \] With these numbers, compute \[ E = t_{n-1,{\cal{C}}} \left( \frac{s}{\sqrt{n}} \right) = 3.707 \left( \frac{1610.3}{\sqrt{7}} \right) = 2256.2 \] so \begin{eqnarray*} \bar{x} - E \:\:< & \mu & < \:\:\bar{x} + E \\ 7041.4 - 2256.2\:\: < & \mu & < \:\: 7041.4 + 2256.2 \\ 4785.2\:\: < & \mu & < \:\:9297.6 \end{eqnarray*} is the 99$\%$ confidence interval for $\mu$.▢
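This example can be sketched with the standard library's `mean` and `stdev`; the $t$ value is taken from the t Distribution Table because Python's standard library has no $t$-distribution:

```python
from math import sqrt
from statistics import mean, stdev

data = [5460, 5900, 6090, 6310, 7160, 8440, 9930]
n = len(data)

xbar = mean(data)
s = stdev(data)  # sample standard deviation
t = 3.707        # t_{6, 99%} from the t Distribution Table

E = t * s / sqrt(n)
print(round(xbar, 1), round(s, 1))             # 7041.4 1610.3
print(round(xbar - E, 1), round(xbar + E, 1))  # 4785.2 9297.6
```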
▢
Sample size needed for a poll Measuring proportions is what pollsters do. For example in an election you might want to know how many people will vote for liberals (items of interest) and how many will vote for conservatives (items not of interest).[footnote]We assume here that there are only two parties. For the real life situation of more than two parties we need the multinomial distribution and to approximate it with a multivariate normal distribution. That is a topic for multivariate statistics but the principles are the same as what we cover here.[/footnote] In a newspaper you might see: ``The poll says that 72$\%$ of the voters will vote liberal. The poll is considered accurate to 2 percentage points 19 times out of 20.'' This means that the 95$\%$ confidence interval (19/20 = 0.95) of the proportion of liberal voters is $0.72 \pm 0.02$ (note how proportions are presented as percentages in the newspaper). The error here is $E = 0.02$. Before the pollster starts telephoning people, she must know how many people to phone to arrive at that goal error of 2$\%$. She needs to know what sample size $n$ is needed. In general, the minimum sample size needed to attain a goal error $E$ on a confidence interval of $\cal{C}$ is \[n = \hat{p}\hat{q}\left( \frac{z_{\cal{C}}}{E} \right)^{2}.\] Here $\hat{p}$ and $\hat{q}$ could come from a previous survey if available. If there is no such survey or if you want to be sure of ending up with an error equal to or less than a goal E, then use $\hat{p} = \hat{q} = 0.5$, see Figure 8.9. [caption id="attachment_1969" align="aligncenter" width="504"] Figure 8.9 : The formula $n = \hat{p}\hat{q}\left( \frac{z_{\cal{C}}}{E} \right)^{2}$ is a quadratic formula. Substitute $\hat{q} = 1 - \hat{p}$ to get $n = \hat{p}(1-\hat{p})\left( \frac{z_{\cal{C}}}{E} \right)^{2}$ or $n = (\hat{p} -\hat{p}^{2})\left( \frac{z_{\cal{C}}}{E} \right)^{2}$.
The maximum of $n_{\rm max} = \frac{1}{4}\left( \frac{z_{\cal{C}}}{E} \right)^{2}$ is at $\hat{p} = 0.5$.[/caption] Example 8.4 : We want to estimate, with 95$\%$ confidence, the proportion of people who own a home computer. A previous study gave an answer of 40$\%$. For a new study we want an error of 2$\%$. How many people should we poll? Solution : From the question we have : \[\hat{p}=0.40, \hspace{.25in} \hat{q}=0.60\] \[E = 0.02, \hspace{.25in} {\cal{C}} = 0.95\] From the t Distribution Table (or the Standard Normal Distribution Table if you think about the areas correctly) we find \[z_{\cal{C}} = z_{95\%} = 1.960.\] Therefore \[n = \hat{p}\hat{q}\left( \frac{z_{\cal{C}}}{E}\right)^2 = (0.40)(0.60)\left( \frac{1.96}{0.02}\right)^2 = 2304.96\] which we round up to a sample size of 2305 to ensure that $E<0.02$.▢
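The pollster's sample-size calculation, sketched in Python; rounding up guarantees the error is at most the goal $E$:

```python
from math import ceil
from statistics import NormalDist

p_hat, q_hat = 0.40, 0.60  # from the previous survey
E = 0.02                   # goal error
z_C = NormalDist().inv_cdf(0.975)  # 1.96 for 95% confidence

n = p_hat * q_hat * (z_C / E) ** 2
print(ceil(n))  # 2305

# Worst case with no previous survey: p_hat = q_hat = 0.5
n_max = 0.25 * (z_C / E) ** 2
print(ceil(n_max))  # 2401
```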
Example : Find the 90$\%$ confidence interval for $\sigma^2$ given the data:\[59, 54, 53, 52, 51, 39, 49, 46, 49, 48\]
Solution : Compute, using your calculator :\[s^2 = 28.2\] \[\nu =n-1 = 9.\] From the Chi-squared Distribution Table, in the $\nu = 9$ line, find : \[\chi^2_{\rm right} = \chi^2 \left( \frac{1-0.90}{2}\right) = \chi^2(0.05) = 16.919\] and \[\chi^2_{\rm left} = \chi^2 (1-0.05) = \chi^2(0.95) = 3.325\] So \begin{align*} \frac{(n-1)s^2}{\chi^2_{\rm right}} &< \sigma^{2} < \frac{(n-1)s^2}{\chi^2_{\rm left}}\\ \frac{9 \cdot 28.2}{16.919} &< \sigma^{2} < \frac{9 \cdot 28.2}{3.325}\\ 15.0 &< \sigma^2 < 76.3 \hspace{1in} \mbox{with 90\% confidence.} \end{align*} Taking square roots: \[3.87 < \sigma < 8.73 \hspace{1in} \mbox{with 90\% confidence.}\]▢
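A sketch of the arithmetic in Python; the $\chi^2$ critical values are taken from the Chi-squared Distribution Table because the standard library has no chi-squared distribution:

```python
from math import sqrt

s2, nu = 28.2, 9  # sample variance and degrees of freedom

# Critical values from the Chi-squared Distribution Table, nu = 9 row
chi2_right = 16.919  # chi^2(0.05)
chi2_left = 3.325    # chi^2(0.95)

lower = nu * s2 / chi2_right
upper = nu * s2 / chi2_left
print(round(lower, 1), round(upper, 1))  # 15.0 76.3

# Confidence interval for sigma, by taking square roots
print(round(sqrt(lower), 2), round(sqrt(upper), 2))  # 3.87 8.74
```

The upper bound here prints 8.74 rather than the text's 8.73 only because the text takes the square root of the already-rounded 76.3.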
Two-Tailed Test | Right-Tailed Test | Left-Tailed Test
$H_{0}$ : $\mu = k$ | $H_{0}$ : $\mu \leq k$ | $H_{0}$ : $\mu \geq k$ |
$H_{1}$ : $\mu \neq k$ | $H_{1}$ : $\mu > k$ | $H_{1}$ : $\mu < k$ |
$H_{0} :\mu \leq 42,000$
$H_{1} : \mu > 42,000$ (claim)
(This is a right-tailed test.)
2. Critical Statistic. For this right-tailed test at $\alpha = 0.05$ the critical statistic is $z_{\rm critical} = 1.645$. Method (b) is the recommended method not only because it is faster but also because the procedure for the upcoming $t$-test will be the same for the $z$-test.
3. Test Statistic.\[z_{\rm test} = \frac{\bar{x} - k}{\left( \frac{\sigma}{\sqrt{n}}\right)} = \frac{43260 - 42000}{\left( \frac{5230}{\sqrt{30}}\right)} = 1.32\]
4. Decision. Draw a picture so you can see the critical region :
So $z$ is in the non-critical region: Do not reject $H_{0}$.
5. Interpretation. There is not enough evidence, from a $z$-test at $\alpha = 0.05$, to support the claim that professors earn more than \$42,000/year on average.
▢
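The $z$-test of Example 9.2 reduces to a few lines; a sketch in Python:

```python
from math import sqrt

xbar, k = 43260, 42000  # sample mean and null hypothesis value
sigma, n = 5230, 30
z_critical = 1.645      # right-tailed test at alpha = 0.05

# Equation (9.1)
z_test = (xbar - k) / (sigma / sqrt(n))
print(round(z_test, 2))     # 1.32

# Decision: z_test is not in the critical region
print(z_test > z_critical)  # False -> do not reject H0
```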
So where does Equation (9.1) come from? It's an application of the central limit theorem! In Example 9.2, $\bar{x} = 43,260$, $n = 30$, $\sigma = 5230$ and $k = 42,000$ on the null hypothesis of a right-tailed test. The central limit theorem says that if $H_{0}$ is true then we can expect the sample means, $\bar{x}$ to be distributed as shown in the top part of Figure 9.1. Setting $\alpha = 0.05$ means that if the actual sample mean, $\bar{x}$ ends up in the tail of the expected (under $H_{0}$) distribution of sample means then we consider that either we picked an unlucky 5$\%$ sample or the null hypothesis, $H_{0}$, is not true. In taking that second option, rejecting $H_{0}$, we are willing to live with the 0.05 probability that we made a wrong choice -- that we made a type I error. [caption id="attachment_1997" align="aligncenter" width="411"] Figure 9.1: Derivation of the $z$ test statistic.[/caption] Referring to Figure 9.1 again, $z_{\rm critical} = 1.645$ on the lower picture defines the critical region of area $\alpha = 0.05$ (in this case). It corresponds to a value $\bar{x}_{\rm critical}$ on the upper picture which also defines a critical region of area $\alpha = 0.05$. So comparing $\bar{x}$ to $\bar{x}_{\rm critical}$ on the original distribution of sample means, as given by the sampling theory of the central limit theorem, is equivalent, after $z$-transformation, to comparing $z_{\rm test}$ with $z_{\rm critical}$. That is, $z_{\rm test}$ is the $z$-transform of the data value $\bar{x}$, exactly as given by Equation (9.1). One-tailed tests From a frequentist point of view, a one-tailed test is a bit of a cheat. You use a one-tailed test when you know for sure that your test value or statistic is greater than (or less than) the null hypothesis value. That is, for the case of means here, you know for sure that the mean of the population, if it is different from the null hypothesis mean, is greater than (or less than) the null hypothesis mean.
In other words, you need some a priori information (a Bayesian concept) before you do the formal hypothesis test. In the examples that we will work through in this course, we will consider one-tailed tests when they make logical sense and will not require formal a priori information to justify the selection of a one-tailed test. For a one-tail test to make logical sense, the alternate hypothesis, $H_{1}$, must be true on the face value of the data. That is, if we substitute the value of $\bar{x}$ for $\mu$ into the statement of $H_{1}$ (for the test of means) then it should be a true statement. Otherwise, $H_{1}$ is blatantly false and there is no need to do any statistical testing. In any statistical test, $H_{1}$ must be true at face value and we do the test to see if $H_{1}$ is statistically true. Another way to think about this is to think of $\bar{x}$ as a fuzzy number. As a sharp number a statement like "$\bar{x} > k$" may be true, but $\bar{x}$ is fuzzy because of $s$ (think $\bar{x} = \bar{x} \pm s$ to get the fuzzy number idea). So "$\bar{x} > k$" may not be true when $\bar{x}$ is considered to be a fuzzy number.[footnote]Fuzzy numbers can be treated rigorously in a mathematical sense. See, e.g. Kaufmann A, Gupta MM, Introduction to fuzzy arithmetic: theory and applications, Van Nostrand Reinhold Co., 1991.[/footnote] When we make our decision (step 4) we consider the equality part of the $H_{0}$ statement in one-tailed tests. This equality is the strict $H_{0}$ under all circumstances but we use $\geq$ or $\leq$ in $H_{0}$ statements simply because they are the logical opposite of $<$ or $>$ in the $H_{1}$ statements. Some people may have an issue with this statement of $H_{0}$ but we will keep it because of the logical completeness of the $H_{0}$, $H_{1}$ pair and the fact that hypothesis testing is about choosing between two well-defined alternatives.
p-Value The critical statistic defines an area, a probability, $\alpha$ that is the maximum probability that we are willing to live with for making a type I error of incorrectly rejecting $H_{0}$. The test statistic also defines an analogous area, called $p$ or the $p$-value or (by SPSS especially) the significance. The $p$-value represents the best guess from the data that you will make a type I error if you reject $H_{0}$. Computer programs compute $p$-values using CDFs. So when you use a computer (like SPSS) you don't need (or usually have) the critical statistic and you will make your decision (step 4) using the $p$-value associated with the test statistic according to the rule:\[ \mbox{If } p\leq \alpha \mbox{ reject } H_{0}.\]
\[ \mbox{If } p> \alpha \mbox{ do not reject } H_{0}.\]
The method of comparing test and critical statistics is the traditional approach, popular before computers because it is less work to compute the two statistics than it is to compute $p$. When we work problems by hand we will use the traditional approach. When we use SPSS we will look at the $p$-value to make our decision. To connect the two approaches pedagogically we will estimate the $p$-value by hand for a while. Example 9.3 : Compute the $p$-value for $z_{\rm test} = 1.32$ of Example 9.2. Solution : This calculation can happen as soon as you have the test statistic in step 3. The first thing to do is to sketch a picture of the $p$-value so that you know what you are doing, see Figure 9.2. [caption id="attachment_1999" align="aligncenter" width="375"] Figure 9.2 : The $p$-value associated with $z_{\rm test} = 1.32$ in a one-tail test.[/caption] Using the Standard Normal Distribution Table to find the tail area associated with $z_{\rm test} = 1.32$, we compute : \begin{eqnarray*} p(z_{\rm test}) & = & 0.5 - A(z_{\rm test}) \\ &=& 0.5 - 0.4066 = 0.0934 \end{eqnarray*} That is $p = 0.0934$. Since $(p = 0.0934) > (\alpha = 0.05)$, we do not reject $H_{0}$ in our decision step (step 4).▢
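The $p$-value of Example 9.3 computed from the CDF, as a computer program like SPSS would compute it; a sketch:

```python
from statistics import NormalDist

z_test = 1.32
alpha = 0.05

# Right-tail area beyond z_test: p = 0.5 - A(z_test) = 1 - CDF(z_test)
p = 1 - NormalDist().cdf(z_test)
print(round(p, 4))  # 0.0934

print(p <= alpha)   # False -> do not reject H0
```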
When using the Standard Normal Distribution Table to find the $p$-value for a given $z$, you compute the appropriate tail area (or areas) exactly as in the cases of Chapter 5.
Two-Tailed Test | Right-Tailed Test | Left-Tailed Test
$H_{0}$ : $\mu = k$ | $H_{0}$ : $\mu \leq k$ | $H_{0}$ : $\mu \geq k$ |
$H_{1}$ : $\mu \neq k$ | $H_{1}$ : $\mu > k$ | $H_{1}$ : $\mu < k$ |
$H_{0} : \mu \leq 36.7$
$H_{1} : \mu > 36.7$ (claim)
2. Critical statistic. In the t Distribution Table, find the column for a one-tailed test at $\alpha = 0.05$ and the line for degrees of freedom $\nu = n-1 = 14$. With that find
\[t_{\rm critical} = 1.761\]
3. Test statistic. To compute this we need : $\bar{x} = 40.6$, $s= 6$ and $n = 15$ from the problem statement. From the hypothesis we have $k = 36.7$. So
\[ t_{\rm test} = \frac{40.6 - 36.7}{\left( 6/ \sqrt{15}\right)} = 2.517 \]
At this point we can estimate the $p$-value using the t Distribution Table, which doesn't have as much information about the $t$-distribution as the Standard Normal Distribution Table has about the $z$-distribution, so we can only estimate. The procedure is: In the $\nu = 14$ row, look for $t$ values that bracket $t _{\rm test} = 2.517$. They are 2.145 (with $\alpha = 0.025$ in the column heading for one-tailed tests) and 2.624 (associated with a one-tail $\alpha = 0.01$).
So,
\[0.010 < p < 0.025\]
is our estimate for $p$.[footnote]If you know how to interpolate then you can find a single value for $p$.[/footnote]
4. Decision.
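The steps of this $t$-test can be sketched in Python; the critical and bracketing values come from the $\nu = 14$ row of the t Distribution Table:

```python
from math import sqrt

xbar, k = 40.6, 36.7
s, n = 6, 15
t_critical = 1.761  # one-tailed test, alpha = 0.05, nu = 14

t_test = (xbar - k) / (s / sqrt(n))
print(round(t_test, 3))     # 2.517

# Decision: t_test is in the critical region
print(t_test > t_critical)  # True -> reject H0

# Estimate the p-value by bracketing with table entries for nu = 14
assert 2.145 < t_test < 2.624  # so 0.010 < p < 0.025
```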