Euroweight

Read Data

To read the “.txt” files, I use the R function read.table().

# The file has no header row, so read.table() names the columns V1, V2 and V3.
read.table('../datasets/euroweight.dat.txt', header = FALSE,
           dec = '.', na.strings = 'NA') -> euroweight_data

For the euroweight dataset, the variables are:

V1: ID - the case number
V2: weight - the weight of the euro coin in grams
V3: batch - the package number

Test the hypothesis that the weights of coins in different packages come from the same distribution.

I use two nonparametric procedures to test the hypothesis that coin weights have the same distribution in every package. The first is the Kruskal-Wallis test (kruskal.test). The second is a set of pairwise Wilcoxon rank sum tests (pairwise.wilcox.test).

Kruskal-Wallis test

kruskal.test(euroweight_data$V2~euroweight_data$V3)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  euroweight_data$V2 by euroweight_data$V3
## Kruskal-Wallis chi-squared = 97.5, df = 7, p-value < 2.2e-16

The p-value is below 2.2e-16, far smaller than \(0.05\). We therefore reject the null hypothesis that the coin weights in all packages come from the same distribution: the weights in at least one package differ from the others.
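The sketch below shows what kruskal.test() computes. It is a minimal implementation of the textbook Kruskal-Wallis statistic with the usual tie correction:

\(H = \frac{12}{N(N+1)} \sum_i n_i \left(\bar R_i - \frac{N+1}{2}\right)^2\), divided by \(1 - \sum (t^3 - t)/(N^3 - N)\).

The helper name kruskal_h is hypothetical and is not part of R. It only illustrates the calculation.

```r
# Kruskal-Wallis H statistic computed from ranks (illustrative helper, not base R).
kruskal_h <- function(x, g) {
  g    <- factor(g)
  n    <- length(x)
  r    <- rank(x)                          # tied values receive mid-ranks
  n_i  <- tapply(r, g, length)             # group sizes
  rbar <- tapply(r, g, mean)               # mean rank in each group
  h    <- 12 / (n * (n + 1)) * sum(n_i * (rbar - (n + 1) / 2)^2)
  ties <- table(x)                         # sizes of groups of tied values
  h    <- h / (1 - sum(ties^3 - ties) / (n^3 - n))   # tie correction
  df   <- nlevels(g) - 1
  # Under H0, H is approximately chi-squared with k - 1 degrees of freedom.
  c(H = h, df = df, p.value = pchisq(h, df, lower.tail = FALSE))
}
```

Called as kruskal_h(euroweight_data$V2, euroweight_data$V3), it should agree with the chi-squared value and df = 7 reported above, up to rounding.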

Pairwise Wilcoxon test

pairwise.wilcox.test(euroweight_data$V2,euroweight_data$V3)

## 
##  Pairwise comparisons using Wilcoxon rank sum test 
## 
## data:  euroweight_data$V2 and euroweight_data$V3 
## 
##   1       2       3       4       5       6       7      
## 2 1.00000 -       -       -       -       -       -      
## 3 0.04297 0.00025 -       -       -       -       -      
## 4 0.00141 0.10329 7.8e-12 -       -       -       -      
## 5 0.00108 0.10329 2.6e-12 1.00000 -       -       -      
## 6 0.76768 0.04297 1.00000 2.9e-07 1.7e-07 -       -      
## 7 1.00000 1.00000 0.00012 0.10329 0.10202 0.04297 -      
## 8 1.00000 0.10329 0.73578 1.4e-06 7.1e-07 1.00000 0.10202
## 
## P value adjustment method: holm

As the table shows, 13 pairs have Holm-adjusted p-values smaller than \(0.05\): (1,3), (1,4), (1,5), (2,3), (2,6), (3,4), (3,5), (3,7), (4,6), (4,8), (5,6), (5,8) and (6,7). For each of these pairs, we reject the null hypothesis that the coin weights in the two packages come from the same distribution. For the remaining pairs, we cannot reject the null hypothesis.
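Reading the significant pairs off the matrix by eye is error-prone. The sketch below extracts them from the object returned by pairwise.wilcox.test(). Its p.value element is a lower-triangular matrix of adjusted p-values, with NA above the diagonal. The helper name significant_pairs is hypothetical.

```r
# List the group pairs whose adjusted p-value is below alpha (illustrative helper).
significant_pairs <- function(pw, alpha = 0.05) {
  p   <- pw$p.value                                   # rows/cols are group labels
  idx <- which(!is.na(p) & p < alpha, arr.ind = TRUE) # matrix with "row"/"col" columns
  data.frame(group1 = colnames(p)[idx[, "col"]],
             group2 = rownames(p)[idx[, "row"]],
             p.adj  = p[idx])
}
```

For example, significant_pairs(pairwise.wilcox.test(euroweight_data$V2, euroweight_data$V3)) returns one row per significant pair of packages.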

Iris

Read Data

To read the “.txt” files, I use the R function read.table().

# Comma-separated file without a header row; read.table() names the columns V1 to V5.
read.table('../datasets/iris.txt', header = FALSE,
           dec = '.', na.strings = 'NA', sep = ",") -> iris_data

For the iris dataset, the variables are:

V1: sepal length in cm
V2: sepal width in cm
V3: petal length in cm
V4: petal width in cm
V5: class

Correlation Test

Sepal length and Sepal width

Spearman rank correlation coefficient

The data contain ties (duplicate values), so an exact p-value cannot be computed. I therefore set exact = FALSE, which makes cor.test() use a large-sample approximation instead.

# Alternative (not run): break ties by adding small random noise with jitter().
#cor.test(jitter(iris_data$V1),jitter(iris_data$V2),method = "spearman",alternative = "two.sided")
cor.test(iris_data$V1,iris_data$V2,method = "spearman",alternative = "two.sided",exact = FALSE)

## 
##  Spearman's rank correlation rho
## 
## data:  iris_data$V1 and iris_data$V2
## S = 652165, p-value = 0.05128
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.1594565
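Spearman's rho is Pearson's correlation computed on the ranks, with tied values given mid-ranks. The S statistic printed by cor.test() is \((n^3 - n)(1 - \rho)/6\). The sketch below reproduces both. The helper names spearman_rho and spearman_S are hypothetical.

```r
# Spearman's rho as the Pearson correlation of the ranks (illustrative helper).
spearman_rho <- function(x, y) cor(rank(x), rank(y))

# The S statistic reported by cor.test(method = "spearman").
spearman_S <- function(x, y) {
  n <- length(x)
  (n^3 - n) * (1 - spearman_rho(x, y)) / 6
}
```

As a check against the output above: with n = 150 and rho = -0.1594565, \((150^3 - 150)(1 + 0.1594565)/6 \approx 652165\), which matches the reported S.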

Kendall rank correlation coefficient

cor.test(iris_data$V1,iris_data$V2,method = "kendall",alternative = "two.sided")

## 
##  Kendall's rank correlation tau
## 
## data:  iris_data$V1 and iris_data$V2
## z = -1.2469, p-value = 0.2124
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##         tau 
## -0.07211192
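To my understanding, cor() with method = "kendall" computes the tie-adjusted tau-b. The tau estimate in cor.test() is based on it. The sketch below computes tau-b directly by counting concordant and discordant pairs. The helper name kendall_tau_b is hypothetical. It builds two n-by-n matrices, so it suits moderate n such as the 150 iris flowers.

```r
# Kendall's tau-b from pairwise sign comparisons (illustrative helper).
kendall_tau_b <- function(x, y) {
  dx <- sign(outer(x, x, "-"))   # sign of x_i - x_j for every pair (i, j)
  dy <- sign(outer(y, y, "-"))
  up <- upper.tri(dx)            # count each unordered pair once
  s  <- sum(dx[up] * dy[up])     # concordant minus discordant pairs
  # Denominator: sqrt(pairs not tied in x * pairs not tied in y).
  s / sqrt(sum(dx[up] != 0) * sum(dy[up] != 0))
}
```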

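From the two results above, both p-values are larger than \(0.05\) (0.05128 for Spearman, 0.2124 for Kendall). Both coefficients are also smaller than \(0.2\) in absolute value. We therefore cannot reject the null hypothesis that sepal length and sepal width are independent. However, the Spearman p-value is only just above \(0.05\), so this conclusion is borderline.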

Petal length and Petal width

Spearman rank correlation coefficient

The data contain ties (duplicate values), so an exact p-value cannot be computed. I therefore set exact = FALSE, which makes cor.test() use a large-sample approximation instead.

cor.test(iris_data$V3,iris_data$V4,method = "spearman",alternative = "two.sided",exact = FALSE)

## 
##  Spearman's rank correlation rho
## 
## data:  iris_data$V3 and iris_data$V4
## S = 35997, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9360034

Kendall rank correlation coefficient

cor.test(iris_data$V3,iris_data$V4,method = "kendall",alternative = "two.sided")

## 
##  Kendall's rank correlation tau
## 
## data:  iris_data$V3 and iris_data$V4
## z = 13.911, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.8030139

From the two results above, both p-values are smaller than \(0.05\) and both coefficients are greater than \(0.8\) (\(\rho \approx 0.936\), \(\tau \approx 0.803\)). We therefore reject the null hypothesis that petal length and petal width are independent: they have a very strong positive correlation.

Cigarettes

Read Data

To read the “.txt” files, I use the R function read.table().

# Whitespace-separated file without a header row; the columns become V1 to V5.
read.table('../datasets/cigarettes.dat.txt', header = FALSE,
           dec = '.', na.strings = 'NA') -> cigarettes_data

For the cigarettes dataset, the variables are:

V1: company name
V2: x1 = tar (mg)
V3: x2 = nicotine (mg)
V4: x3 = weight (g)
V5: y = carbon monoxide (mg)

Correlation Test

Nicotine and weight

Spearman rank correlation coefficient

The data contain ties (duplicate values), so an exact p-value cannot be computed. I therefore set exact = FALSE, which makes cor.test() use a large-sample approximation instead.

cor.test(cigarettes_data$V3,cigarettes_data$V4,method = "spearman",exact = FALSE)

## 
##  Spearman's rank correlation rho
## 
## data:  cigarettes_data$V3 and cigarettes_data$V4
## S = 2089.8, p-value = 0.3472
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.1962293

Kendall rank correlation coefficient

cor.test(cigarettes_data$V3,cigarettes_data$V4,method = "kendall",exact = FALSE)

## 
##  Kendall's rank correlation tau
## 
## data:  cigarettes_data$V3 and cigarettes_data$V4
## z = 0.93471, p-value = 0.3499
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##     tau 
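## 0.13378

Both p-values (0.3472 and 0.3499) are larger than \(0.05\), and both coefficients are small (\(\rho \approx 0.196\), \(\tau \approx 0.134\)). We therefore cannot reject the null hypothesis that nicotine and weight are independent.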

Nicotine and Carbon monoxide

Spearman rank correlation coefficient

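The data contain ties (duplicate values), so an exact p-value cannot be computed. I therefore set exact = FALSE, which makes cor.test() use a large-sample approximation instead.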

cor.test(cigarettes_data$V3,cigarettes_data$V5,method = "spearman",exact = FALSE)

## 
##  Spearman's rank correlation rho
## 
## data:  cigarettes_data$V3 and cigarettes_data$V5
## S = 317.68, p-value = 8.235e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8778141

Kendall rank correlation coefficient

cor.test(cigarettes_data$V3,cigarettes_data$V5,method = "kendall",exact = FALSE)

## 
##  Kendall's rank correlation tau
## 
## data:  cigarettes_data$V3 and cigarettes_data$V5
## z = 4.9787, p-value = 6.402e-07
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.7135688

From the two results above, both p-values are smaller than \(0.05\) and both coefficients are greater than \(0.7\) (\(\rho \approx 0.878\), \(\tau \approx 0.714\)). We therefore reject the null hypothesis that nicotine and carbon monoxide are independent: they have a strong positive correlation.
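Each variable pair in this report runs the same two cor.test() calls. The convenience sketch below runs both and collects the estimates and p-values in one table. It is not part of the original analysis, and the helper name rank_cor_summary is hypothetical. It sets exact = FALSE for both tests, as in the cigarette calls above.

```r
# Run Spearman and Kendall tests on one pair of variables (illustrative helper).
rank_cor_summary <- function(x, y) {
  sp <- cor.test(x, y, method = "spearman", exact = FALSE)
  kd <- cor.test(x, y, method = "kendall",  exact = FALSE)
  data.frame(method   = c("spearman", "kendall"),
             estimate = c(unname(sp$estimate), unname(kd$estimate)),
             p.value  = c(sp$p.value, kd$p.value))
}
```

For example, rank_cor_summary(cigarettes_data$V3, cigarettes_data$V5) would return both tests for nicotine and carbon monoxide in a single data frame.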

Binom Test

Task 4

Question: Suppose that in coin tossing the chance of getting a head or a tail is 50%. In a real experiment we toss the coin 100 times and get 48 heads. Is our original hypothesis true? [Use binom.test]

Answer: The observed counts are summarized below.

Status	Frequency
Head	48
Tail	52

Null hypothesis: the probability of getting a head is 50%.

binom.test(48,100,0.5)

## 
##  Exact binomial test
## 
## data:  48 and 100
## number of successes = 48, number of trials = 100, p-value = 0.7644
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3790055 0.5822102
## sample estimates:
## probability of success 
##                   0.48

Null hypothesis: the probability of getting a tail is 50%.

binom.test(52,100,0.5)

## 
##  Exact binomial test
## 
## data:  52 and 100
## number of successes = 52, number of trials = 100, p-value = 0.7644
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4177898 0.6209945
## sample estimates:
## probability of success 
##                   0.52

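Both tests give the same p-value (0.7644), which is larger than \(0.05\). We therefore cannot reject the null hypothesis that heads and tails each occur with probability \(50\%\). This does not prove the coin is fair; the data are simply consistent with it. The two tests are equivalent: 48 heads out of 100 tosses means 52 tails, and the binomial distribution with \(p = 0.5\) is symmetric.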
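The sketch below shows where the two-sided p-value comes from. It adds up the probabilities of every outcome that is no more likely than the observed count under the null hypothesis. As far as I know, binom.test() uses this rule for the two-sided case, and the test below checks that the two agree on these examples. The helper name exact_binom_p is hypothetical.

```r
# Two-sided exact binomial p-value (illustrative helper, not base R).
exact_binom_p <- function(x, n, p = 0.5) {
  d   <- dbinom(0:n, n, p)               # P(X = k) for k = 0..n under H0
  obs <- dbinom(x, n, p)                 # probability of the observed count
  # Sum outcomes at most as likely as the observed one; the small
  # relative tolerance guards against floating-point rounding.
  min(1, sum(d[d <= obs * (1 + 1e-7)]))
}
```

For 48 heads in 100 tosses, the included outcomes are 0 to 48 and 52 to 100 heads. The result is therefore \(2\,P(X \le 48)\), the same value as for 52 tails.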

Task 5

Question: Did a fair coin produce 8 heads in 10 flips? By “fair” we mean a coin whose two sides are equally likely to appear. [Use binom.test]

Answer:

binom.test(8,10,0.5)

## 
##  Exact binomial test
## 
## data:  8 and 10
## number of successes = 8, number of trials = 10, p-value = 0.1094
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4439045 0.9747893
## sample estimates:
## probability of success 
##                    0.8

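Since the p-value (0.1094) is larger than \(0.05\), we cannot reject the null hypothesis that the coin is fair: 8 heads in 10 flips is consistent with a fair coin at the \(5\%\) significance level.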
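The p-value can be checked by hand, as the worked arithmetic below shows. Under the null hypothesis, the number of heads \(X\) follows Binomial(10, 0.5), so \(P(X = 8) = \binom{10}{8}/2^{10} = 45/1024\). The outcomes no more likely than this are 0, 1, 2, 8, 9 and 10 heads. By symmetry, the two-sided p-value is:

\[
p = \frac{2\left[\binom{10}{0} + \binom{10}{1} + \binom{10}{2}\right]}{2^{10}} = \frac{2\,(1 + 10 + 45)}{1024} = \frac{112}{1024} \approx 0.1094
\]

This matches the p-value reported by binom.test().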

Acknowledgements

Thanks to knitr, developed by Yihui Xie (Xie 2015).

References

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. http://yihui.name/knitr/.

Lecture 3 - Nonparametric Tests - Zhao Chi - 19.M09

Zhao Chi
