**Statistics** is the science of collecting, analyzing and drawing conclusions from data, i.e., a science of **Learning from Data**

- Formulate the problem
- Design studies/experiments to collect data
- Analyze data and draw conclusions

The course will cover elementary techniques that are needed for each step involved in Learning from Data:

- Collecting data: surveys and experimental studies
- Summarizing and visualizing data: mean, histogram, etc.
- Analyzing data: inference on means and variances, regression models, analysis of variance, etc.

- Course website: http://math.wsu.edu/faculty/xchen/stat412.html
- Source files for the course at: http://math.wsu.edu/faculty/xchen/stat412/
- Data sets for textbook at: Link
**Blackboard: homework solutions, exams and their solutions, grades**

- R: https://cran.r-project.org/
- R learning materials: An Introduction to R
- R forums: R forum and StackExchange
- Rstudio (IDE for R): https://www.rstudio.com/
- R markdown package/help: http://rmarkdown.rstudio.com/
- SAS: WSU University Edition
- Help for SAS: SAS Documentation

- Population, sample, random sampling
- Factors, control/treatment, response/covariates/explanatory variables
- Observational study, experimental study

**Observational study**: the researcher **does not interfere** with the information-generating process.

**Experimental study**: the researcher **manipulates explanatory variables** and records their effects on the **response variables**.

- Problem: do students with higher ACT score have higher GPA?
- Data collected from University X and University Y

```
> gpaAct <- read.table("D:/Teaching/stat412/data/CH03PR03.txt",sep = "", header=FALSE)
> colnames(gpaAct) = c("GPA","ACT","IQ", "Rank") # assign names to columns
> class(gpaAct)
[1] "data.frame"
```

Show data:

```
> gpaAct[1:10,]
GPA ACT IQ Rank
1 3.897 21 122 99
2 3.885 14 132 71
3 3.778 28 119 95
4 2.540 22 99 75
5 3.028 21 131 46
6 3.865 31 139 77
7 2.962 32 113 85
8 3.961 27 136 99
9 0.500 29 75 13
10 3.178 26 106 97
```

Eyeball check: higher ACT score leads to higher GPA?

`> plot(gpaAct$ACT, gpaAct$GPA, xlab="ACT score", ylab="GPA")`

- Type of study: observational
- Source of data: survey
- Population: college or university students
- Sampled population: students in Universities X and Y
- Sample: GPAs and ACT scores obtained

Issues with collecting data using surveys:

- How to sample from population
- Nonresponse from participants
- Measurement problems

Problem: how does cutting scheme affect the life of a machine tool?

- 2 Cutting Speeds; 2 Tool Geometries; 2 Cutting Angles
- Coding: -1 or 1

```
> cutLifetime <- read.csv("D:/Teaching/stat412/data/MontegomeryChp6Prb1.csv", header=TRUE,sep=",")
> class(cutLifetime)
[1] "data.frame"
```

Show data:

```
> cutLifetime[1:10,]
Cutting.Speed Tool.Geometry Cutting.Angle Life.Hours
1 -1 -1 -1 22
2 -1 -1 -1 31
3 -1 -1 -1 25
4 1 -1 -1 32
5 1 -1 -1 43
6 1 -1 -1 29
7 -1 1 -1 35
8 -1 1 -1 34
9 -1 1 -1 50
10 1 1 -1 55
```

- Type of study: experimental
- Design: factorial, completely randomized
- Factors: Cutting Speeds, Tool Geometries, Cutting Angles
- Treatments: the 8 combinations for the factors
- Replication: 3 replicates for each treatment

- Experimental unit: machine tool
- Explanatory variables: the three factors
- Response variables: Life Hours

- Key ingredients of experimental design: **Randomization** and **Replication**
- Issues of experimental design: not always implementable, control of experimental errors

The same CSV file can be imported and printed in SAS:

```
PROC IMPORT OUT= cuttingeg2
  DATAFILE= "D:\Teaching\stat412\data\MontegomeryChp6Prb1.csv"
  DBMS=CSV REPLACE;
  GETNAMES=YES;
  DATAROW=2;
RUN;

PROC PRINT data= cuttingeg2;
RUN;
```

Operations on numbers: `+ - * / ^`

\((6 + 3 \times 4) \div 2 - 9^3\)

```
> (6+3*4)/2- 9^3
[1] -720
```

Some atomic classes (or modes) of objects in R:

- character
- logical
- numeric (real number)

```
> x <- "stat412"; x = "stat412" # character
> x <- TRUE # logical
> x <- 3.14159 # numeric
```

Note: Anything typed after the # sign is not evaluated. The # sign allows you to add comments to your code.

When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated expression is returned. The result may be auto-printed.

```
> x = 1
> x+2
[1] 3
> print(x)
[1] 1
> print(x+2)
[1] 3
```

There are many useful functions included in R. Here are some examples of built-in functions:

```
> x <- 2
> print(x)
[1] 2
> sqrt(x)
[1] 1.414214
> log(x)
[1] 0.6931472
> class(x)
[1] "numeric"
> is.vector(x)
[1] TRUE
```

You can open the help file for any function by typing `?` followed by the function's name. Here is an example:

`> ?sqrt`

There’s also a function `help.search` that can do general searches for help. You can learn about it by typing:

`> ?help.search`

It’s also useful to use Google: for example, “r help square root”. The R help files are also on the web.

In the previous examples, we used `x` as our variable name. Do not use the following variable names, as they have special meanings in R:

`c, q, s, t, C, D, F, I, T`

When combining two words for a given variable, we recommend one of these options:

```
> my_variable <- 1
> myVariable <- 1
```

Variable names such as `my.variable` are problematic because of the special use of “.” in R.

The vector is the most basic object in R. You can create vectors in a number of ways; you will need to create one for a homework problem.

```
> x <- c(1.2, 5, -10, 20, 5)
> x
[1] 1.2 5.0 -10.0 20.0 5.0
> length(x)
[1] 5
> z <- seq(from=0, to=100, by=10)
> z
[1] 0 10 20 30 40 50 60 70 80 90 100
> length(z)
[1] 11
```

- A vector can only contain elements of a single class:

```
> x <- c(1.2, 5, -10, 20, 5)
> x
[1] 1.2 5.0 -10.0 20.0 5.0
> x[1] # the first entry of x
[1] 1.2
> x[c(2,5)] # the 2nd and 5th entries of x
[1] 5 5
> z <- c(x, TRUE, FALSE)
> z # the vector z contains all numeric entries
[1] 1.2 5.0 -10.0 20.0 5.0 1.0 0.0
```

Like vectors, matrices are objects that can contain elements of only one class.

```
> m <- matrix(1:6, nrow=2, ncol=3)
> m
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> m[1,2] # the (1,2) entry of m
[1] 3
> m[1,] # the 1st row of m
[1] 1 3 5
> m[,1] # the 1st column of m
[1] 1 2
> m[,c(1,3)] # the 1st and 3rd columns of m
[,1] [,2]
[1,] 1 5
[2,] 2 6
```

In statistics, factors encode categorical data.

```
> paint <- factor(c("red", "white", "blue", "blue", "red",
+ "red"))
> paint
[1] red white blue blue red red
Levels: blue red white
```

The data frame is **one of the most important objects in R**. Data sets very often come in tabular form of mixed classes, and data frames are constructed exactly for this.

```
> df <- data.frame(counting=1:3, char=c("a", "b", "c"),
+ logic=c(TRUE, FALSE, TRUE))
> df
counting char logic
1 1 a TRUE
2 2 b FALSE
3 3 c TRUE
>
> nrow(df)
[1] 3
> ncol(df)
[1] 3
```

```
> dim(df)
[1] 3 3
>
> names(df)
[1] "counting" "char" "logic"
> attributes(df) # give the attributes (names, row names, class) of an object
$names
[1] "counting" "char" "logic"
$row.names
[1] 1 2 3
$class
[1] "data.frame"
```

Names can be assigned to columns and rows of vectors, matrices, and data frames.

```
> # load data from txt file online and assign data to variable gpaAct
> gpaAct <- read.table("http://math.wsu.edu/faculty/xchen/stat412/data/CH03PR03.txt",
+ sep = "", header=FALSE)
> class(gpaAct) # by default gpaAct is a data.frame
[1] "data.frame"
>
> gpaAct[1:3,] # show 1st 3 rows of data
V1 V2 V3 V4
1 3.897 21 122 99
2 3.885 14 132 71
3 3.778 28 119 95
```

The 1st column contains GPA; the 2nd, ACT scores.

```
> # assign names to columns of gpaAct
> colnames(gpaAct) = c("GPA","ACT","IQ", "Rank")
> # access GPA by gpaAct$GPA and assign it to a new variable gpa
> gpa = gpaAct$GPA
>
> gpa[1:5] # 1st 5 entries of gpa
[1] 3.897 3.885 3.778 2.540 3.028
> gpaAct[,1][1:5] # alternative to the above line
[1] 3.897 3.885 3.778 2.540 3.028
```

To install a package from CRAN at https://cran.r-project.org/:

`> install.packages("nameOfPackage")`

To load a package so that its internal functions can be used:

`> library("nameOfPackage")`

Very useful packages: ggplot2, dplyr

R Fiddle: http://www.r-fiddle.org

- Sample Mean: the arithmetic average, i.e., \[\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\]
- Median: the value for which 50% of the measurements lie above it and 50% fall below it, both inclusive
- Mode(s): the most frequent or probable measurement(s)

- \(p\)th percentile: among ordered measurements, at most \(p\%\) of them are below it and at most \((100-p)\%\) are above it, both exclusive
- Quartiles \(Q1, Q2, Q3\): the 25th, 50th, and 75th percentiles
- Sample variance: \(s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2\)
- Sample standard deviation: positive root of sample variance
- Skewness: how asymmetric a distribution is
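As a quick illustration, the statistics above can be computed with R's built-in functions; the data values here are made up:

```r
x <- c(2, 4, 4, 5, 7, 9)  # hypothetical measurements

mean(x)                                  # sample mean
median(x)                                # median
var(x)                                   # sample variance (divides by n-1)
sd(x)                                    # sample standard deviation
sum((x - mean(x))^2) / (length(x) - 1)   # same as var(x), by the formula
quantile(x, probs = c(0.25, 0.50, 0.75)) # quartiles Q1, Q2, Q3
```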

- Pearson correlation \[r=\frac{1}{n-1}\sum_{i=1}^n \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)\]
- Examples in text: page 109
- Wikipedia examples
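A quick sanity check of the formula against R's `cor()`, on made-up paired vectors:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)  # hypothetical paired measurements
n <- length(x)

# Pearson correlation computed directly from the formula
r <- sum((x - mean(x)) / sd(x) * (y - mean(y)) / sd(y)) / (n - 1)
r
cor(x, y, method = "pearson")  # agrees with r
```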

- Summary statistics usually are not sufficient to determine the data generating process
- Summary statistics may not contain accurate information on the data generating process
- Evidence: the “T-rex” and “donut” data sets, which share summary statistics yet look completely different
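R's built-in `anscombe` data set makes this point concretely: its four x–y pairs have essentially identical means, variances, and correlations, yet their scatterplots look completely different.

```r
# Anscombe's quartet ships with base R
sapply(anscombe[, c("x1", "x2", "x3", "x4")], mean)  # identical means
sapply(anscombe[, c("y1", "y2", "y3", "y4")], var)   # nearly equal variances
cor(anscombe$x1, anscombe$y1)  # roughly 0.816 for every pair
cor(anscombe$x2, anscombe$y2)
plot(anscombe$x1, anscombe$y1)  # compare with plot(anscombe$x2, anscombe$y2)
```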

Show data:

```
> gpaAct[1:10,]
GPA ACT IQ Rank
1 3.897 21 122 99
2 3.885 14 132 71
3 3.778 28 119 95
4 2.540 22 99 75
5 3.028 21 131 46
6 3.865 31 139 77
7 2.962 32 113 85
8 3.961 27 136 99
9 0.500 29 75 13
10 3.178 26 106 97
```

Default binning: Sturges

`> hist(gpaAct$GPA, main = "Histogram of GPA", xlab="GPA")`

```
> summary(gpaAct$GPA)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.500 2.689 3.078 3.074 3.593 4.000
> var(gpaAct$GPA)
[1] 0.4151719
> sd(gpaAct$GPA)
[1] 0.6443383
>
> Mode <- function(x) {
+ ux <- unique(x)
+ ux[which.max(tabulate(match(x, ux)))]
+ }
>
> Mode(gpaAct$GPA)
[1] 3.885
> median(gpaAct$GPA)
[1] 3.0775
```

Finer binning

```
> hist(gpaAct$GPA, breaks=20, main = "", xlab="GPA")
> abline(v = mean(gpaAct$GPA), col = "blue", lwd = 2)
```

```
> plot(sort(gpaAct$GPA), ylab="GPA")
> abline(h = median(gpaAct$GPA), col = "blue", lwd = 2)
> abline(h = Mode(gpaAct$GPA), col = "red", lwd = 2, lty=2)
```

Note: the GPA value 0.5 is an outlier

`> boxplot(gpaAct$GPA)`

Frequency table of ACT score

```
> ftable(gpaAct$ACT)
14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
1 1 3 7 3 10 9 4 5 12 10 10 10 10 7 7 4 4 1 1 1
```

Note: to create a pie chart of the ACT frequencies

`> pie(table(gpaAct$ACT))`

`> plot(gpaAct$ACT, gpaAct$GPA, xlab="ACT score", ylab="GPA")`

Correlation between GPA and ACT score

```
> cor(gpaAct$ACT, gpaAct$GPA, method="pearson")
[1] 0.2694818
```

```
> cutLifetime[1:10,]
Cutting.Speed Tool.Geometry Cutting.Angle Life.Hours
1 -1 -1 -1 22
2 -1 -1 -1 31
3 -1 -1 -1 25
4 1 -1 -1 32
5 1 -1 -1 43
6 1 -1 -1 29
7 -1 1 -1 35
8 -1 1 -1 34
9 -1 1 -1 50
10 1 1 -1 55
```

- Cutting.Speed: with 2 levels
- Tool.Geometry: with 2 levels
- Cutting.Angle: with 2 levels

```
> cutLifetime$Cutting.Speed = factor(cutLifetime$Cutting.Speed, labels = c("SpeedA","SpeedB"))
> cutLifetime$Tool.Geometry = factor(cutLifetime$Tool.Geometry, labels=c("GeometryA","GeometryB"))
> cutLifetime$Cutting.Angle = factor(cutLifetime$Cutting.Angle,labels=c("AngleA","AngleB"))
```

```
> library(ggplot2)
> ggplot(cutLifetime, aes(x=Cutting.Speed,y=Life.Hours)) + stat_summary(fun.y="mean", geom="bar")+
+ facet_grid(Tool.Geometry~Cutting.Angle)+ylab("Mean of Life.Hours")+ theme_bw()
```

```
> library(dplyr)
> cutLifetimeSummary = cutLifetime %>% group_by(Cutting.Speed,Tool.Geometry,Cutting.Angle) %>%
+ summarize(stdDev = sd(Life.Hours), Means = mean(Life.Hours))
> cutLifetimeSummary
Source: local data frame [8 x 5]
Groups: Cutting.Speed, Tool.Geometry
Cutting.Speed Tool.Geometry Cutting.Angle stdDev Means
1 SpeedA GeometryA AngleA 4.582576 26.00000
2 SpeedA GeometryA AngleB 3.785939 42.33333
3 SpeedA GeometryB AngleA 8.962886 39.66667
4 SpeedA GeometryB AngleB 5.033223 54.66667
5 SpeedB GeometryA AngleA 7.371115 34.66667
6 SpeedB GeometryA AngleB 2.081666 37.66667
7 SpeedB GeometryB AngleA 4.932883 49.33333
8 SpeedB GeometryB AngleB 4.163332 42.33333
```

```
> library(ggplot2)
> ggplot(data = cutLifetime) + geom_boxplot(aes(x=Cutting.Speed,y=Life.Hours))+
+ facet_grid(Tool.Geometry~Cutting.Angle) + theme_bw()
```

Question: does the distribution of Life.Hours have two modes?

`> hist(cutLifetime$Life.Hours,breaks=25,main="",xlab="Life.Hours")`

- Conditional probability: \(P(A|B) = \frac{P(A \cap B)}{P(B)}\) (what if \(P(B)=0\)?)
- Independence: \(P(A \cap B) = P(A)P(B)\) iff events \(A\) and \(B\) independent
- \(P(A|B) = P(A)\) iff \(P(A \cap B) = P(A)P(B)\)?
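A small simulation sketch of the conditional-probability formula, using a fair die (the events A and B are illustrative choices):

```r
set.seed(1)
rolls <- sample(1:6, 100000, replace = TRUE)  # fair die rolls
A <- rolls %% 2 == 0   # A: roll is even
B <- rolls >= 4        # B: roll is at least 4

mean(A & B) / mean(B)  # estimates P(A|B) = P(A and B) / P(B)
# exact answer: P(A and B) = 2/6 (rolls 4 and 6), P(B) = 3/6, so P(A|B) = 2/3
```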

- Discrete rv: taking at most countably many values via \[P(X=a_i)=c_i\]
- Number of heads in coin tosses (Binomial)
- Number of customer visits to a service center in 10 minutes (Poisson)
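The two discrete examples correspond to built-in probability mass functions in R (`dbinom`, `dpois`); the parameter values below are just illustrative:

```r
# P(exactly 3 heads in 10 tosses of a fair coin), Binomial(10, 0.5)
dbinom(3, size = 10, prob = 0.5)

# P(exactly 2 customer visits in 10 minutes) if visits ~ Poisson(lambda = 4)
dpois(2, lambda = 4)

# the probabilities P(X = 0), ..., P(X = 10) sum to 1
sum(dbinom(0:10, size = 10, prob = 0.5))
```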

- Continuous rv: taking a continuum of values governed by a law \[F(x)=\int_{-\infty}^x f(t)\,dt\] with a density \(f\)
- measuring temperature; measuring profit
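The relation between the CDF and the density can be checked numerically in R, here for the standard Normal density at \(x = 1\):

```r
# numerically integrate the standard Normal density up to 1
integrate(dnorm, lower = -Inf, upper = 1)$value
# the CDF returns the same probability
pnorm(1)
```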

```
> curve(dnorm(x,0,1), xlim=c(-3,3), main='Standard Normal density',
+ xlab = expression(x),ylab=expression(f(x)))
```

- Symmetric around its mean
- Bell shaped
- A dominant proportion of Normal data are around the mean

```
> set.seed(123)
> data = rnorm(5000,mean=0,sd=1)
> hist(data, prob=TRUE) # histogram on the density (probability) scale
> lines(density(data)) # fit a density for data
```

- Notation: \(Y\) follows a Normal distribution with mean \(\mu\) and variance \(\sigma^2\), i.e., \(Y \sim \mathsf{N}(\mu,\sigma^2)\)
- Density: \(f(y) = \frac{1}{\sqrt{2 \pi}\, \sigma} \exp{\left[-\frac{(y -\mu)^2}{2 \sigma^2}\right]}\)
- If \(Y\) follows \(\mathsf{N}(\mu,\sigma^2)\), then \(Z = \frac{Y -\mu}{\sigma}\) follows \(\mathsf{N}(0,1)\)
- The Normal density is very, very special
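A short simulation sketch of the standardization property (the mean 3 and sd 2 are arbitrary choices):

```r
set.seed(123)
y <- rnorm(100000, mean = 3, sd = 2)  # Y ~ N(3, 2^2)
z <- (y - 3) / 2                      # standardized values

c(mean(z), sd(z))    # close to 0 and 1

# the same probability computed with and without standardizing
pnorm(5, mean = 3, sd = 2)
pnorm((5 - 3) / 2)   # z-score of y = 5 is 1
```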

- Table 1 in textbook
- \(z = \frac{y-\mu}{\sigma}\) is referred to as \(z\)-score when \(Y=y\) and \(Y \sim \mathsf{N}(\mu,\sigma^2)\)

\(P( X \le 1)\) when \(X \sim \mathsf{N}(0,1)\)

```
> cord.x <- c(-5,seq(-5,1,0.01),1); cord.y <- c(0,dnorm(seq(-5,1,0.01)),0)
> curve(dnorm(x,0,1), xlim=c(-5,5), main='Standard Normal density',
+ xlab = expression(x),ylab=expression(f(x)))
> polygon(cord.x,cord.y,col='skyblue') # Add the shaded area.
```

If \(X \sim \mathsf{N}(0,1)\), then \(P( X \le 1)\)

```
> pnorm(1, mean = 0, sd = 1)
[1] 0.8413447
```

\(P( X \ge 1)\) when \(X \sim \mathsf{N}(0,1)\)

```
> cord.x <- c(1,seq(1,5,0.01),5); cord.y <- c(0,dnorm(seq(1,5,0.01)),0)
> curve(dnorm(x,0,1), xlim=c(-5,5), main='Standard Normal density',
+ xlab = expression(x),ylab=expression(f(x)))
> polygon(cord.x,cord.y,col='skyblue') # Add the shaded area.
```

If \(X \sim \mathsf{N}(0,1)\), then \(P( X \ge 1) = 1 - P( X \le 1)\)

```
> 1 - pnorm(1, mean = 0, sd = 1)
[1] 0.1586553
```

\(P( -2 \le X \le -1)\) when \(X \sim \mathsf{N}(0,1)\)

```
> cord.x <- c(-2,seq(-2,-1,0.01),-1); cord.y <- c(0,dnorm(seq(-2,-1,0.01)),0)
> curve(dnorm(x,0,1), xlim=c(-3.5,3.5), main='Standard Normal density',
+ xlab = expression(x),ylab=expression(f(x)))
> polygon(cord.x,cord.y,col='skyblue')
```

- \(X \sim \mathsf{N}(0,1)\)
- \(P( -2 \le X \le -1) = P(X \le -1) - P(X \le -2)\)

```
> p1 = pnorm(-1, mean = 0, sd = 1)
> p2 = pnorm(-2, mean = 0, sd = 1)
> p1- p2
[1] 0.1359051
```

```
> curve(dnorm(x,0,1), xlim=c(-8,8), main='Normal densities',
+ xlab = expression(x),ylab=expression(f(x)))
> curve(dnorm(x,1,2), xlim=c(-8,8), main='Normal densities',
+ xlab = expression(x),ylab=expression(f(x)), add=TRUE, col='red')
```

Assume \(X \sim \mathsf{N}(1,2^2)\). Then \(Z = \frac{X -1}{2}\) follows \(\mathsf{N}(0,1)\)

- \(P( -2 \le X \le -1) = P(-1.5 \le Z \le -1)\)

```
> p1 = pnorm(-1, mean = 0, sd = 1)
> p2 = pnorm(-1.5, mean = 0, sd = 1)
> p1- p2
[1] 0.09184805
```

or

```
> pnorm(-1, mean = 1, sd = 2) - pnorm(-2, mean = 1, sd = 2)
[1] 0.09184805
```

- Random sampling: every different sample of a fixed size has an equal probability of being selected
- Sampling distribution: bridge between sample and population
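In R, `sample()` without replacement draws a simple random sample, giving every subset of the requested size the same selection probability (the toy population below is made up):

```r
set.seed(123)
population <- 1:1000                  # a toy population of unit labels
srs <- sample(population, size = 10)  # simple random sample of size 10
srs
```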

```
> set.seed(123)
> xNormal = rnorm(1000,mean=0,sd=1)
> hist(xNormal, prob=TRUE)
> lines(density(xNormal)) # fit a density for data
> abline(v=0,col="blue")
```

```
> set.seed(123)
> xNormal = rnorm(2000,mean=2,sd=2)
> hist(xNormal, prob=TRUE)
> lines(density(xNormal))
> abline(v=2,col="blue")
```

Assume \(x_1,x_2,\ldots, x_n\) are i.i.d. \(\mathsf{N}(\mu,\sigma^2)\) and set \[\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\]

- \(\bar{x} \sim \mathsf{N}(\mu,\sigma^2/n)\), i.e., \(\bar{x}\) follows a Normal dist with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\)
- \(\bar{x}\) has variance \(\sigma^2/n\), smaller than the variance \(\sigma^2\) of a single observation, so its distribution is more concentrated
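A simulation sketch of this result: drawing many samples of size \(n = 25\) from \(\mathsf{N}(2, 2^2)\), the sample means should center at \(\mu = 2\) with standard deviation near \(\sigma/\sqrt{n} = 0.4\).

```r
set.seed(123)
xbars <- replicate(5000, mean(rnorm(25, mean = 2, sd = 2)))

mean(xbars)  # close to mu = 2
sd(xbars)    # close to sigma / sqrt(n) = 2 / sqrt(25) = 0.4
```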

```
> curve( dnorm(x, mean=2,sd=2), -5, 9, col="blue",ylab="Density",ylim=c(0,1),lwd=2)
> curve( dnorm(x, mean=2,sd=2/5), -5, 9, add = TRUE, col="red",ylab="Density",lty=2,
+ lwd=2,ylim=c(0,1))
```

- General large sample behavior of the sample mean

Statement: Suppose \(\left\{X_1, X_2,\ldots, X_n\right\}\) is a random sample of size \(n\) from a distribution with mean \(\mu\) and finite variance \(\sigma^2\). Let \(S_n = \frac{1}{n}\sum_{i=1}^n X_i\) be the sample mean. Then \[\sqrt{n} \left(S_n - \mu\right) \stackrel{d}{\to} \mathsf{N}\left(0,\sigma^2\right)\]
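A quick illustration of the statement with a decidedly non-Normal population: means of Exponential(1) samples (\(\mu = \sigma = 1\)) already look roughly Normal at \(n = 30\).

```r
set.seed(123)
xbars <- replicate(5000, mean(rexp(30, rate = 1)))  # skewed population

hist(xbars, prob = TRUE, main = "Means of Exponential(1) samples, n = 30",
     xlab = expression(bar(x)))
curve(dnorm(x, mean = 1, sd = 1 / sqrt(30)), add = TRUE, col = "blue", lwd = 2)
```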

Animation: painblogR

- CLT can fail when observations are dependent; see Lindeberg-Feller theorem
- CLT may not be obvious, i.e., use of CLT as an approximation may be very inaccurate, when sample size is not large enough; see Berry-Esseen theorem
- CLT is a universality principle that depends only on the mean and variance of the generating distribution

Given a random sample \(x_1,x_2,\ldots, x_n\), how likely is it from \(\mathsf{N}(\mu,\sigma^2)\)?

```
> set.seed(123)
> xNonNormal = 0.8*rnorm(300,mean=0,sd=1) + 0.2*rt(300, df = 5)
> qqnorm(xNonNormal)
> qqline(xNonNormal, col = 2, lwd=2)
```

```
> hist(xNonNormal, main="Histogram of non-Normal sample", prob=TRUE)
> lines(density(xNonNormal))
```

`> boxplot(xNonNormal)`

```
> set.seed(123)
> xNormal = rnorm(1000,mean=0,sd=1)
> qqnorm(xNormal)
> qqline(xNormal, col = 2, lwd=2)
```

```
> hist(xNormal, main="Histogram of Normal sample",prob=TRUE)
> lines(density(xNormal))
```

`> boxplot(xNormal)`

```
> set.seed(123)
> xNormal = rnorm(100,mean=0,sd=1)
> ks.test(xNormal, "pnorm",0,1) #or ks.test(x, "pnorm", mean=0, sd=1)
One-sample Kolmogorov-Smirnov test
data: xNormal
D = 0.093034, p-value = 0.3522
alternative hypothesis: two-sided
```

```
> tough = read.table("http://math.wsu.edu/faculty/xchen/stat412/data/stat412ASCII-tab/CH04/ex4-94.TXT", header=TRUE, sep="\t")
> tough1 = tough[,1]
> qqnorm(tough1)
> qqline(tough1, col = 2, lwd=2)
```

```
> sd(tough1)
[1] 0.1596723
>
> ks.test(tough1, "pnorm",mean(tough1),sd(tough1))
One-sample Kolmogorov-Smirnov test
data: tough1
D = 0.12638, p-value = 0.9457
alternative hypothesis: two-sided
```

- When the sample size is very large, Normality tests become sensitive enough to reject for departures from Normality that are too small to matter in practice
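A sketch of this sensitivity (the deviation sd = 1.05 and the sample sizes are arbitrary choices): a small departure from \(\mathsf{N}(0,1)\) that the KS test cannot see at \(n = 100\) is flagged decisively at \(n = 100{,}000\).

```r
set.seed(123)
# sample from N(0, 1.05^2) but test against N(0, 1)
ks.test(rnorm(100,    mean = 0, sd = 1.05), "pnorm", 0, 1)$p.value  # typically large
ks.test(rnorm(100000, mean = 0, sd = 1.05), "pnorm", 0, 1)$p.value  # very small
```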

- If you use software to obtain your answers, please attach the codes
- Please put codes close to their associated answers

- Please get familiar with R, Rstudio and/or SAS, and study the codes used in the source file of the lecture notes
- Please study lecture notes and Chapters 1 to 2 of the textbook carefully with needed programming practice, in order to fully understand the concepts and data processing techniques discussed so far

- Exercises 3.7, 3.16, 3.23, 3.35, 3.41
- Exercises 4.19(a), 4.65(d), 4.94(a); for 4.94(a), please give your conclusion on if the data appear to follow a Normal distribution, for which you may use the KS test

```
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] dplyr_0.4.2 ggplot2_2.1.0 knitr_1.17
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 magrittr_1.5 munsell_0.4.2
[4] colorspace_1.2-6 R6_2.1.0 stringr_1.2.0
[7] highr_0.5 plyr_1.8.3 tools_3.2.0
[10] revealjs_0.9 parallel_3.2.0 grid_3.2.0
[13] gtable_0.1.2 DBI_0.3.1 htmltools_0.3.5
[16] lazyeval_0.1.10 yaml_2.1.13 rprojroot_1.2
[19] digest_0.6.8 assertthat_0.1 reshape2_1.4.1
[22] evaluate_0.10.1 rmarkdown_1.6 labeling_0.3
[25] stringi_0.5-5 scales_0.4.0 backports_1.1.0
```

- John D. Storey whose course material templates I have adopted
- Nairanjana Dasgupta for help on preparing the course materials
- The slides on Getting Started in R were adapted from John D. Storey