Stat 412 — Weeks 5 and 6

Xiongzhi Chen

Washington State University

Fall 2017

Independent samples: exploration

Example 6.1

Comparing the potency of a particular drug: Fresh versus Stored

Data:

> # potency readings for Fresh drug
> cFresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10, 
+     10.6)
> 
> # potency readings for Stored drug
> cStored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
> 
> # combine them into data frame
> Length = as.data.frame(cbind(cFresh, cStored))
> colnames(Length) = c("Fresh", "Stored")

Example 6.1 (cont’d)

How can we assess whether the mean potency of the Fresh drug differs from that of the Stored drug?

  • how to obtain information on the two means?
  • how to summarize information on the two means?
  • what is a test statistic for this task?

Example 6.1 (cont’d)

Standard deviations for each sample

> # std dev for potency of Fresh drug
> sd(cFresh)
[1] 0.3233505
> 
> # std dev for potency of Stored drug
> sd(cStored)
[1] 0.2406011

Question: are the population standard deviations the same?

Example 6.1 (cont’d)

Strategy for assessing the difference between two means:

  • obtain sample means from the two samples
  • take the difference between the two sample means
  • normalize the difference (difficult!)
  • assess the normalized difference statistically

Example 6.1 (cont’d)

> mean(cFresh)
[1] 10.37
> mean(cStored)
[1] 9.83
> mean(cFresh) - mean(cStored)
[1] 0.54
> 
> # sample standard deviation of differences
> sd(cFresh - cStored)
[1] 0.4325634

Example 6.1 (cont’d)

Independent samples: Extra

Theorem 6.1 in Text (Extra)

If \(y_1 \sim \mathsf{N}(\mu_1,\sigma_1^2)\), and \(y_2 \sim \mathsf{N}(\mu_2,\sigma_2^2)\) and they are independent, then the difference \(y_1 - y_2\) follows

\[\mathsf{N}(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)\]

Similarly, the sum \(y_1 + y_2\) follows

\[\mathsf{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)\]

Note that the variance in both cases is \(\sigma_1^2 + \sigma_2^2\): variances add even when taking a difference
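A minimal simulation sketch (not from the text; the means, standard deviations, and sample size are illustrative choices) showing the variance-addition claim of Theorem 6.1:

```r
# Simulate two independent normal samples and check that the
# variance of both the difference and the sum is sigma1^2 + sigma2^2
set.seed(1)
y1 <- rnorm(1e5, mean = 5, sd = 2)   # sigma1^2 = 4
y2 <- rnorm(1e5, mean = 3, sd = 1)   # sigma2^2 = 1
mean(y1 - y2)  # close to mu1 - mu2 = 2
var(y1 - y2)   # close to sigma1^2 + sigma2^2 = 5
var(y1 + y2)   # also close to 5: variances add in both cases
```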

Example 6.1: (cont’d) (Extra)

  • Sample for Fresh drug follows \(\mathsf{N}(\mu_1,\sigma_1^2)\)
  • Sample for Stored drug follows \(\mathsf{N}(\mu_2,\sigma_2^2)\)

  • Sample mean for Fresh drug: \[\bar{y}_1 \sim \mathsf{N}(\mu_1,\sigma_1^2/n_1)\]

  • Sample mean for Stored drug: \[\bar{y}_2 \sim \mathsf{N}(\mu_2,\sigma_2^2/n_2)\]

Example 6.1: (cont’d) (Extra)

  • Further \(d = \bar{y}_1 - \bar{y}_2\) follows \[\mathsf{N}\left(\mu_1 - \mu_2,\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}\right)\]

  • Assume \(\sigma_1 = \sigma_2\) and set \[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2 -1)s_2^2}{n_1 + n_2 -2}}\] then \(T = \frac{d - (\mu_1 - \mu_2)}{s_{p}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\) follows a Student’s t distribution with degrees of freedom (df) \(n_1 + n_2 -2\)

Independent samples: inference

Example 6.1 (cont’d)

Assumptions:

  • Random sample for Fresh drug follows \(\mathsf{N}(\mu_1,\sigma_1^2)\)
  • Random sample for Stored drug follows \(\mathsf{N}(\mu_2,\sigma_2^2)\)
  • Two samples are independent
  • Assume \(\sigma_1 = \sigma_2\)

Example 6.1 (cont’d)

The \(100(1- \alpha)\%\) confidence interval (CI) for the difference \(\mu_1 - \mu_2\) is constructed as follows:

  • Difference between sample means \(d = \bar{y}_1 - \bar{y}_2\)

  • Set \(s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2 -1)s_2^2}{n_1 + n_2 -2}}\)
  • CI: \[ (\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \] where \(t_{\alpha/2}\) is the \(1-\alpha/2\) quantile of a Student’s t distribution with \(df = n_1+n_2-2\)
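The recipe above can be packaged as a small helper; this is a sketch (the function name `pooled_ci` is ours, not from the text), applied to the Example 6.1 data:

```r
# Pooled-variance CI for mu1 - mu2, following the formula above
pooled_ci <- function(y1, y2, alpha = 0.05) {
  n1 <- length(y1); n2 <- length(y2)
  sp <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
  tcrit <- qt(1 - alpha / 2, df = n1 + n2 - 2)
  (mean(y1) - mean(y2)) + c(-1, 1) * tcrit * sp * sqrt(1 / n1 + 1 / n2)
}

cFresh  <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10, 10.6)
cStored <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
pooled_ci(cFresh, cStored)  # 0.2722297 0.8077703
```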

Example 6.1 (cont’d)

Illustration of CI: the \(95\%\) CI is \([0.2722297,0.8077703]\)

> n1 = length(cFresh)
> n2 = length(cStored)
> d = mean(cFresh) - mean(cStored)
> sp = sqrt(((n1 - 1) * (sd(cFresh))^2 + (n2 - 1) * (sd(cStored))^2)/(n1 + 
+     n2 - 2))
> cval = qt(0.05/2, df = n1 + n2 - 2, ncp = 0, lower.tail = FALSE)
> cval
[1] 2.100922
> nr = sqrt(1/n1 + 1/n2)
> CI_left = d - cval * sp * nr
> CI_left
[1] 0.2722297
> CI_right = d + cval * sp * nr
> CI_right
[1] 0.8077703

Example 6.1 (cont’d)

Verification with t.test: the same \(95\%\) CI \([0.2722297,0.8077703]\)

> tTest = t.test(x = cFresh, y = cStored, alternative = "two.sided", 
+     mu = 0, paired = FALSE, var.equal = TRUE, conf.level = 0.95)
> tTest$conf.int
[1] 0.2722297 0.8077703
attr(,"conf.level")
[1] 0.95

Example 6.1 (cont’d)

Hypothesis testing when \(D_0\) is a postulated value related to \(\mu_1 - \mu_2\):

  • Recall \(d = \bar{y}_1 - \bar{y}_2\) and \[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2 -1)s_2^2}{n_1 + n_2 -2}}\]

  • Test statistic: \(T = \frac{d - D_0}{s_{p} \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\) follows a Student’s t distribution with \(df= n_1 + n_2 - 2\)
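The test statistic above can be sketched as a helper function (the name `pooled_t` is ours); it reproduces the t and p-value that `t.test(..., var.equal = TRUE)` reports for the Example 6.1 data:

```r
# Pooled-variance two-sample t statistic and two-sided p-value,
# following the formula above
pooled_t <- function(y1, y2, D0 = 0) {
  n1 <- length(y1); n2 <- length(y2)
  sp <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
  Tstat <- (mean(y1) - mean(y2) - D0) / (sp * sqrt(1 / n1 + 1 / n2))
  pval  <- 2 * pt(abs(Tstat), df = n1 + n2 - 2, lower.tail = FALSE)
  c(statistic = Tstat, p.value = pval)
}

cFresh  <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10, 10.6)
cStored <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
pooled_t(cFresh, cStored)  # same t and p-value as t.test(..., var.equal = TRUE)
```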

Example 6.1 (cont’d)

Illustration: \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_a: \mu_1 - \mu_2 \ne 0\)

  • \(D_0 = 0\) and \(t_{0.05/2,18} = 2.100922\)
  • Reject \(H_0\) if \(|T| \ge t_{\alpha/2,n_1 + n_2 -2}\)
> t = (d - 0)/(sp * nr)
> t  #value of test stat
[1] 4.236833
> cval  #critical value
[1] 2.100922
> abs(t) >= cval  # TRUE: reject H0
[1] TRUE
> pval = 2 * pt(t, df = n1 + n2 - 2, ncp = 0, lower.tail = FALSE)
> pval
[1] 0.0004959478

Example 6.1 (cont’d)

Test \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_a: \mu_1 - \mu_2 \ne 0\)

  • \(D_0 = 0\) and \(t_{0.05/2,18} = 2.100922\)
  • Reject \(H_0\) if \(|T| \ge t_{\alpha/2,n_1 + n_2 -2}\)
> t.test(x = cFresh, y = cStored, alternative = "two.sided", mu = 0, 
+     paired = FALSE, var.equal = TRUE, conf.level = 0.98)

    Two Sample t-test

data:  cFresh and cStored
t = 4.2368, df = 18, p-value = 0.0004959
alternative hypothesis: true difference in means is not equal to 0
98 percent confidence interval:
 0.2146898 0.8653102
sample estimates:
mean of x mean of y 
    10.37      9.83 

Example 6.1 (cont’d)

Test \(H_0: \mu_1 - \mu_2 \le 0.1\) vs \(H_a: \mu_1 - \mu_2 > 0.1\)

  • \(D_0 = 0.1\) and \(t_{0.05,18} = 1.734064\)
  • Reject \(H_0\) if \(T \ge t_{\alpha,n_1 + n_2 -2}\)
> t.test(x = cFresh, y = cStored, alternative = "greater", mu = 0.1, 
+     paired = FALSE, var.equal = TRUE, conf.level = 0.99)

    Two Sample t-test

data:  cFresh and cStored
t = 3.4522, df = 18, p-value = 0.001421
alternative hypothesis: true difference in means is greater than 0.1
99 percent confidence interval:
 0.2146898       Inf
sample estimates:
mean of x mean of y 
    10.37      9.83 

Example 6.1 (cont’d)

Test \(H_0: \mu_1 - \mu_2 \ge 0.5\) vs \(H_a: \mu_1 - \mu_2 < 0.5\)

  • \(D_0 = 0.5\) and \(- t_{0.05,18} = -1.734064\)
  • Reject \(H_0\) if \(T \le - t_{\alpha,n_1 + n_2 -2}\)
> t.test(x = cFresh, y = cStored, alternative = "less", mu = 0.5, 
+     paired = FALSE, var.equal = TRUE, conf.level = 0.96)

    Two Sample t-test

data:  cFresh and cStored
t = 0.31384, df = 18, p-value = 0.6214
alternative hypothesis: true difference in means is less than 0.5
96 percent confidence interval:
      -Inf 0.7764699
sample estimates:
mean of x mean of y 
    10.37      9.83 

Example 6.1 (Recap)

Independent samples: unequal variances

Example 6.1

Recall the following:

  • Fresh drug: \(n_1 = 10\), \(\bar{y}_1=10.37\), \(s_1 = 0.3233\)
  • Stored drug: \(n_2 = 10\), \(\bar{y}_2=9.83\), \(s_2 = 0.2406\)
  • Assume unequal variances, i.e., \(\sigma_1^2 \ne \sigma_2^2\)

Confidence Interval

The \(100(1- \alpha)\%\) confidence interval (CI) for the difference \(\mu_1 - \mu_2\) is constructed as follows:

  • Recall \(d = \bar{y}_1 - \bar{y}_2\) and set \(\tilde{s}_p = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
  • Test statistic: \(\tilde{T} = \frac{d - (\mu_1 - \mu_2)}{\tilde{s}_p}\) is approximated by a Student’s t distribution with df given by the Satterthwaite formula below
  • CI: \[(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\textrm{df}}\tilde{s}_p\]

Example 6.1 (cont’d)

Illustration of CI

  • \(d = \bar{y}_1 - \bar{y}_2 = 10.37 - 9.83 = 0.54\)
  • \(\tilde{s}_p = 0.1274\)
  • \(df = 16.62774\)
  • \(t_{0.025,17} = 2.11\); T table

CI: \(0.54 \pm 2.11 \times 0.1274\), i.e., \([0.271, 0.809]\)

Hypothesis testing

Testing when \(D_0\) is a postulated value related to \(\mu_1 - \mu_2\):

  • Recall \(d = \bar{y}_1 - \bar{y}_2\) and set \(\tilde{s}_p = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)
  • Test statistic: \(\tilde{T} = \frac{d - D_0}{\tilde{s}_p}\) approximated by a Student’s t distribution with \[df= \frac{(n_1-1)(n_2-1)}{(1-c)^2(n_1-1)+c^2(n_2-1)}\] and \(c = \frac{s_1^2/n_1}{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\)
  • Note: round df to the nearest integer
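The Satterthwaite df formula above can be sketched as a small function (the name `welch_df` is ours); plugging in the Example 6.1 sample standard deviations reproduces the df that `t.test(..., var.equal = FALSE)` reports before rounding:

```r
# Satterthwaite (Welch) degrees of freedom, following the formula above
welch_df <- function(s1, s2, n1, n2) {
  k <- (s1^2 / n1) / (s1^2 / n1 + s2^2 / n2)
  (n1 - 1) * (n2 - 1) / ((1 - k)^2 * (n1 - 1) + k^2 * (n2 - 1))
}

welch_df(0.3233505, 0.2406011, 10, 10)  # about 16.63, as in Example 6.1
```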

Example 6.1: unequal variances

At Type I error probability \(\alpha=0.05\)

  • Test \(H_0: \mu_1 - \mu_2 \le 0.2\) vs \(H_a: \mu_1 - \mu_2 > 0.2\)

  • \(D_0 = 0.2\)
  • Reject \(H_0\) if \(\tilde{T} \ge t_{\alpha,df}\)

Example 6.1: unequal variances

Obtain value of test statistic

  • \(\tilde{s}_p = 0.1274\)
  • \(df = 16.62774\)
  • \(t = \frac{10.37 - 9.83 - 0.2}{0.1274} = 2.668\)
  • \(t_{0.05,16.63} = 1.74185\) (quantile at the unrounded df); T table

Compare \(t\) with \(1.74185\)

Independent samples with unequal variances: practice

Example 6.1: unequal variances

At Type I error probability \(\alpha=0.05\)

  • Test \(H_0: \mu_1 - \mu_2 = D_0\) vs \(H_a: \mu_1 - \mu_2 \ne D_0\)

  • \(D_0 = 0\) and \(t_{0.025,17} = 2.11\)
  • Reject \(H_0\) if \(|\tilde{T}| \ge t_{\alpha/2,df}\)

Example 6.1: unequal variances

At Type I error probability \(\alpha=0.05\)

  • Test \(H_0: \mu_1 - \mu_2 \ge D_0\) vs \(H_a: \mu_1 - \mu_2 < D_0\)

  • \(D_0 = 0.3\) and \(t_{0.05,16.63} = 1.74185\)
  • Reject \(H_0\) if \(\tilde{T} \le - t_{\alpha,df}\)

Independent sample with unequal variances: lab

Example 6.1: unequal variances

> potency = read.table("http://math.wsu.edu/faculty/xchen/stat412/data/t_uev.txt", 
+     sep = "\t", header = TRUE)
> class(potency)
[1] "data.frame"
> cFresh = potency$Fresh
> cFresh
 [1] 10.2 10.5 10.3 10.8  9.8 10.6 10.7 10.2 10.0 10.6
> cStored = potency$Stored
> cStored
 [1]  9.8  9.6 10.1 10.2 10.1  9.7  9.5  9.6  9.8  9.9

Example 6.1 (Recap)

Example 6.1: unequal variances

At Type I error probability \(\alpha=0.05\)

  • Test \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_a: \mu_1 - \mu_2 \ne 0\)
> # compute critical value
> qt(0.05/2, df = 17, ncp = 0, lower.tail = FALSE)
[1] 2.109816
> # perform hypothesis testing
> t.test(x = cFresh, y = cStored, alternative = "two.sided", mu = 0, 
+     paired = FALSE, var.equal = FALSE, conf.level = 0.95)

    Welch Two Sample t-test

data:  cFresh and cStored
t = 4.2368, df = 16.628, p-value = 0.000581
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2706369 0.8093631
sample estimates:
mean of x mean of y 
    10.37      9.83 

Example 6.1 (cont’d)

At Type I error probability \(\alpha=0.02\)

  • Test \(H_0: \mu_1 - \mu_2 \le 0.2\) vs \(H_a: \mu_1 - \mu_2 > 0.2\)
> # compute critical value
> qt(0.02, df = 17, ncp = 0, lower.tail = FALSE)
[1] 2.223845
> # perform hypothesis testing
> t.test(x = cFresh, y = cStored, alternative = "greater", mu = 0.2, 
+     paired = FALSE, var.equal = FALSE, conf.level = 0.99)

    Welch Two Sample t-test

data:  cFresh and cStored
t = 2.6676, df = 16.628, p-value = 0.008227
alternative hypothesis: true difference in means is greater than 0.2
99 percent confidence interval:
 0.2120818       Inf
sample estimates:
mean of x mean of y 
    10.37      9.83 

Paired sample: exploration

Example 6.7

Are average estimates for repair costs from Garage I different from those of Garage II?

> # estimates of cost from Garage I
> GarageI = c(17.6, 20.2, 19.5, 11.3, 13, 16.3, 15.3, 16.2, 12.2, 
+     14.8, 21.3, 22.1, 16.9, 17.6, 18.4)
> 
> # estimates of cost from Garage II
> GarageII = c(17.3, 19.1, 18.4, 11.5, 12.7, 15.8, 14.9, 15.3, 
+     12, 14.2, 21, 21, 16.1, 16.7, 17.5)

Example 6.7

Boxplot of difference between estimates

Example 6.7

Test \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_a: \mu_1 - \mu_2 \ne 0\)

  • Try t test based on independent samples
  • test statistic value: 0.54616
  • degrees of freedom: 27.797
  • p-value of test: 0.5893

Conclusion …

Example 6.7

Histogram for differences between the estimates

Example 6.7

What is wrong with applying t test based on independent samples to this data set?

  • Are the two samples independent?
  • For each car, are the two estimates for it independent?
  • Normality violated?

Paired sample: inference

Construct test statistic

On observations:

  • Sample 1: \(y_{1i}, i=1,\ldots,n\)
  • Sample 2: \(y_{2i}, i=1,\ldots,n\)
  • Differences: \(d_{i}= y_{1i} - y_{2i}, i=1,\ldots,n\)

Requirements:

  • Sampling distribution of \(d_{i}\)’s is Normal
  • The \(d_{i}\)’s are independent

Construct test statistic

  • Obtain \(\bar{d}\) and \(s_d\), the sample mean and standard deviation of \(d_{i}\)’s
  • \(D_0\): a specified value on \(\mu_d = \mu_1 - \mu_2\)

  • Test statistic: \(T = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}\) follows a t distribution with \(\textrm{df}=n-1\)
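The statistic above is just a one-sample t computed on the differences; a sketch (the helper name `paired_t` and the toy data are ours, not from the text):

```r
# Paired t statistic: one-sample t applied to the pairwise differences
paired_t <- function(y1, y2, D0 = 0) {
  d <- y1 - y2
  (mean(d) - D0) / (sd(d) / sqrt(length(d)))
}

# toy paired measurements (illustrative only)
y1 <- c(1.2, 0.9, 1.5, 1.1)
y2 <- c(1.0, 0.8, 1.2, 1.0)
paired_t(y1, y2)  # equals t.test(y1, y2, paired = TRUE)$statistic
```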

Example 6.7

Check Normality on differences

p-value of KS test: 0.8008

Example 6.7: hypothesis testing

  • Recall \(\mu_d = \mu_1 - \mu_2\)
  • Assess \(H_0: \mu_d \le 0\) vs \(H_a: \mu_d >0\)

  • Observed T.S. value: \(t = \frac{0.613-0}{0.394/\sqrt{15}} = 6.026\)
  • Critical value at Type I error probability \(\alpha = 0.05\) is \(t_{0.05,14}=1.761\); T table
  • Reject \(H_0\) if \(T \ge t_{\alpha,n-1}\)

Example 6.7: confidence interval

The \(100(1-\alpha)\%\) CI for \(\mu_d = \mu_1 - \mu_2\) is \[\bar{d} \pm t_{\alpha/2,n-1}\frac{s_d}{\sqrt{n}}\]

Computing CI:

  • \(\bar{d} = 0.613\), \(s_d = 0.394\), \(t_{0.025,14}=2.14\)
  • 95% CI is: \(0.613 \pm 2.14\times \frac{0.394}{\sqrt{15}}\), i.e., \([0.395,0.831]\)

Paired sample: practice

Exercise 1

The \(100(1-\alpha)\%\) CI for \(\mu_d = \mu_1 - \mu_2\) is \[\bar{d} \pm t_{\alpha/2,n-1}\frac{s_d}{\sqrt{n}}\]

Construct 99% CI:

  • \(n = 12\), \(\bar{d} = 0.5\), \(s_d = 0.49\)
  • \(t_{0.005,11}\); T table

Exercise 2

  • Test statistic: \(T = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}\) follows a t distribution with \(\textrm{df}=n-1\)
  • Recall \(\mu_d = \mu_1 - \mu_2\)
  • At Type I error prob \(\alpha = 0.05\), test \(H_0: \mu_d= 0.5\) vs \(H_a: \mu_d \ne 0.5\)

  • \(n = 16\), \(\bar{d} = 0.7\), \(s_d = 1\)
  • \(D_0 =0.5\) and \(t_{0.025,15}\); T table
  • Reject \(H_0\) when \(|T| > t_{\alpha/2,df}\)

Paired sample: lab

Example 6.7

> RepairCost = read.table("http://math.wsu.edu/faculty/xchen/stat412/data/pairedT.txt", 
+     sep = "\t", header = T)
> RepairCost[1:3, ]
  GarageI GarageII
1    17.6     17.3
2    20.2     19.1
3    19.5     18.4
> GarageI = RepairCost$GarageI
> GarageII = RepairCost$GarageII

Example 6.7: CI

Construct confidence interval

> t.test(GarageI, GarageII, alternative = "two.sided", mu = 0, 
+     paired = TRUE, var.equal = FALSE, conf.level = 0.95)

    Paired t-test

data:  GarageI and GarageII
t = 6.0234, df = 14, p-value = 3.126e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3949412 0.8317254
sample estimates:
mean of the differences 
              0.6133333 

Example 6.7: hypothesis testing

Test on difference \(\mu_d = \mu_1 - \mu_2\)

  • Test \(H_0: \mu_d \le 0\) vs \(H_a: \mu_d >0\)
  • Reject \(H_0\) if \(T \ge t_{\alpha,df}\)
> t.test(GarageI, GarageII, alternative = "greater", mu = 0, paired = TRUE, 
+     var.equal = FALSE, conf.level = 0.95)

    Paired t-test

data:  GarageI and GarageII
t = 6.0234, df = 14, p-value = 1.563e-05
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.4339886       Inf
sample estimates:
mean of the differences 
              0.6133333 

Conclusion?

Example 6.7: hypothesis testing

Test on difference \(\mu_d = \mu_1 - \mu_2\)

  • Test \(H_0: \mu_d = 0.4\) vs \(H_a: \mu_d \ne 0.4\)
  • Reject \(H_0\) if \(|T| \ge t_{\alpha/2,df}\)
> t.test(GarageI, GarageII, alternative = "two.sided", mu = 0.4, 
+     paired = TRUE, var.equal = FALSE, conf.level = 0.95)

    Paired t-test

data:  GarageI and GarageII
t = 2.0951, df = 14, p-value = 0.05483
alternative hypothesis: true difference in means is not equal to 0.4
95 percent confidence interval:
 0.3949412 0.8317254
sample estimates:
mean of the differences 
              0.6133333 

Conclusion?

Example 6.7: hypothesis testing

Test on difference \(\mu_d = \mu_1 - \mu_2\)

  • Test \(H_0: \mu_d \ge 0.3\) vs \(H_a: \mu_d < 0.3\)
  • Reject \(H_0\) if \(T \le - t_{\alpha,df}\)
> t.test(GarageI, GarageII, alternative = "less", mu = 0.3, paired = TRUE, 
+     var.equal = FALSE, conf.level = 0.95)

    Paired t-test

data:  GarageI and GarageII
t = 3.0772, df = 14, p-value = 0.9959
alternative hypothesis: true difference in means is less than 0.3
95 percent confidence interval:
      -Inf 0.7926781
sample estimates:
mean of the differences 
              0.6133333 

Conclusion?

Paired sample: summary

Summary 1

  • The rejection regions are similar to those of the t test based on independent samples
  • Each pair of observations is usually obtained from the same individual or item (i.e., from the same experimental unit)
  • Within each experimental unit, the two observations are dependent
  • Different pairs of observations are independent of each other

Summary 2

  • Test is applied to differences obtained from the pairs
  • Test is essentially a one-sample t test where sample is the pairwise differences
  • Rejection regions are very similar to those of one-sample t test
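The equivalence stated above can be checked directly with the Example 6.7 data: the paired t test and a one-sample t test on the differences give identical results.

```r
# Example 6.7 repair-cost estimates (from the slides)
GarageI  <- c(17.6, 20.2, 19.5, 11.3, 13, 16.3, 15.3, 16.2, 12.2,
              14.8, 21.3, 22.1, 16.9, 17.6, 18.4)
GarageII <- c(17.3, 19.1, 18.4, 11.5, 12.7, 15.8, 14.9, 15.3, 12,
              14.2, 21, 21, 16.1, 16.7, 17.5)

pairedT  <- t.test(GarageI, GarageII, paired = TRUE)
oneSampT <- t.test(GarageI - GarageII, mu = 0)

c(pairedT$statistic, oneSampT$statistic)  # identical t values
c(pairedT$p.value, oneSampT$p.value)      # identical p-values
```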

Paired sample: other examples

  • Measurements on the same experimental unit obtained before and after receiving a treatment, e.g., in assessing drug effect
  • Measurements obtained on two experimental units with similar features, e.g., in assessing improvement of new teaching methods

Inference on two population variances

Motivations

  • Variation of potency for drug
  • Risks in portfolios
  • Assess equality of variances in two-sample test

Sample variance

  • Random sample \(y_1,\ldots,y_n\) with mean \(\mu\) and variance \(\sigma^2\)

  • Sample mean \(\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i\)

  • Sample variance \(s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2\)
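As a quick sketch (the data vector is an illustrative choice), R’s built-in `var()` uses exactly the \(n-1\) denominator formula above:

```r
# Verify that var() matches the (n - 1)-denominator sample variance
y <- c(10.2, 10.5, 10.3, 10.8, 9.8)   # any numeric sample
manual <- sum((y - mean(y))^2) / (length(y) - 1)
all.equal(manual, var(y))  # TRUE
```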

Statistic

  • Random sample 1: \(y_{1i}, i=1,\ldots,n_1\) follow \(\mathsf{N}(\mu_1, \sigma_1^2)\)
  • Random sample 2: \(y_{2i}, i=1,\ldots,n_2\) follow \(\mathsf{N}(\mu_2, \sigma_2^2)\)
  • Sample variance \(s_1^2\) for random sample 1; sample variance \(s_2^2\) for random sample 2

Then \[\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\] follows an F distribution

Density of an F distribution

Density: \(df_1=3\) and \(df_2=5\)

F distribution

  • not symmetrical
  • with non-negative values
  • with \(df_1\) (for \(s_1^2\)) and \(df_2\) (for \(s_2^2\))

Hypothesis testing

Test statistic \(F= \frac{s_1^2}{s_2^2}\)

  • For \(H_0: \sigma_1^2 \le \sigma_2^2\) vs \(H_a: \sigma_1^2 > \sigma_2^2\), reject \(H_0\) if \(F \ge F_{\alpha,df_1,df_2}\)

  • For \(H_0: \sigma_1^2= \sigma_2^2\) vs \(H_a: \sigma_1^2 \ne \sigma_2^2\), reject \(H_0\) if \(F \ge F_{\alpha/2,df_1,df_2}\) or \(F \le F_{1-\alpha/2,df_1,df_2}\)

  • F table; \(F_{1-\alpha,df_1,df_2}= \frac{1}{F_{\alpha,df_2,df_1}}\)
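The reciprocal identity above can be checked with `qf()`; a sketch (the df values 3 and 5 are illustrative), where \(F_{\alpha,df_1,df_2}\) denotes the upper-tail critical value:

```r
# Check F_{1-alpha, df1, df2} = 1 / F_{alpha, df2, df1}
alpha <- 0.05
lowerQ <- qf(alpha, df1 = 3, df2 = 5)                         # F_{1-alpha,3,5}
recipQ <- 1 / qf(alpha, df1 = 5, df2 = 3, lower.tail = FALSE) # 1/F_{alpha,5,3}
all.equal(lowerQ, recipQ)  # TRUE
```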

Confidence interval

\(100(1-\alpha)\%\) confidence interval for \(\sigma_1^2/\sigma_2^2\) is constructed as follows:

  • obtain \(s_1^2\), \(s_2^2\) and \(s_1^2/s_2^2\)
  • obtain \(df_1 = n_1 -1\) and \(df_2 = n_2 -1\)
  • obtain \(F_U = F_{\alpha/2,df_2,df_1}\) and \(F_L = 1/F_{\alpha/2,df_1,df_2}\)

  • confidence interval: \(\left[\frac{s_1^2}{s_2^2}F_L,\frac{s_1^2}{s_2^2}F_U\right]\)

Example 6.1 (revisit)

Data:

> # potency readings for Fresh drug
> cFresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10, 
+     10.6)
> 
> # potency readings for Stored drug
> cStored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)

Example 6.1 (revisit)

Normality test

Example 6.1: testing variances

At Type I error probability \(\alpha = 0.05\), test \(H_0: \sigma_1^2= \sigma_2^2\) vs \(H_a: \sigma_1^2 \ne \sigma_2^2\)

  • Sample 1: Fresh; Sample 2: Stored
  • \(n_1 = 10\), \(n_2 = 10\); \(s_1^2 = 0.10\), \(s_2^2= 0.06\)
  • \(F = \frac{s_1^2}{s_2^2} = 1.67\)
  • \(df_1 = df_2 = 9\); \(F_{0.025,9,9} = 4.03\); \(F_{0.975,9,9} =1/4.03= 0.25\); F table

Reject \(H_0\) if \(F \ge F_{\alpha/2,df_1,df_2}\) or \(F \le F_{1-\alpha/2,df_1,df_2}\). Conclusion?

Example 6.1: CI for variances

\(95\%\) confidence interval for \(\sigma_1^2/\sigma_2^2\)

  • \(n_1 = 10\), \(n_2 = 10\)
  • \(s_1^2 = 0.10\), \(s_2^2= 0.06\) and \(\frac{s_1^2}{s_2^2} = 1.67\)
  • \(F_U = F_{0.025,9,9} = 4.03\); \(F_L = 1/F_{0.025,9,9} = 0.25\)
  • CI: \([1.67\times 0.25, 1.67\times 4.03]\)
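The same test and CI come from a single call to R’s built-in `var.test()`; a sketch with the Example 6.1 data. Note that `var.test()` works with the unrounded sample variances, so it reports \(F \approx 1.81\) rather than the rounded \(0.10/0.06 = 1.67\) used above.

```r
# F test for equality of two variances via var.test()
cFresh  <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10, 10.6)
cStored <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
var.test(cFresh, cStored, ratio = 1, alternative = "two.sided",
         conf.level = 0.95)  # F = s1^2/s2^2 with df1 = df2 = 9
```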

F table

Inference on two population variances: practice

Inference on two population variances: lab

Summary

Extras

Simple rules on homework

  • If you use software to obtain answers, please attach the code
  • Please put the code next to its associated answers
  • Homework assignments will be announced via Blackboard

License and Session Information

License

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 15063)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.17

loaded via a namespace (and not attached):
 [1] backports_1.1.0 magrittr_1.5    rprojroot_1.2   formatR_1.5    
 [5] tools_3.3.0     htmltools_0.3.6 revealjs_0.9    yaml_2.1.14    
 [9] Rcpp_0.12.12    stringi_1.1.5   rmarkdown_1.6   stringr_1.2.0  
[13] digest_0.6.12   evaluate_0.10.1