Euroweight

Read Data

To read the “txt” file, I use the R function read.table().

The variables in the euroweight dataset are as follows:

  1. V1: ID - this is the case number
  2. V2: weight - weight of the euro coin in grams
  3. V3: batch - number of the package
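The code that loaded the data is not shown in the report, so the following is only a minimal sketch of how a headerless text file ends up with columns named V1, V2 and V3. The temporary file and its four rows are made up for illustration; the real file name and contents are not known here.

```r
# Hypothetical stand-in for the real file: a few made-up rows written to a
# temporary whitespace-separated text file with no header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 7.512 1",
             "2 7.502 1",
             "3 7.461 2",
             "4 7.562 2"), tmp)

# header = FALSE makes read.table() name the columns V1, V2, V3
euroweight_data <- read.table(tmp, header = FALSE)

str(euroweight_data)
```

With the real file, only the path passed to read.table() would change.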

Task: test the hypothesis that the weights of coins in different packages come from the same distribution.

I use two non-parametric tests: the Kruskal-Wallis test (kruskal.test) to check whether the weight distribution is the same across all packages, and pairwise Wilcoxon rank sum tests (pairwise.wilcox.test) to compare the packages two at a time.

Kruskal-Wallis test

## 
##  Kruskal-Wallis rank sum test
## 
## data:  euroweight_data$V2 by euroweight_data$V3
## Kruskal-Wallis chi-squared = 97.5, df = 7, p-value < 2.2e-16

The p-value is below 2.2e-16 (df = 7, one less than the 8 packages), so we reject the null hypothesis that the weights of coins in the different packages come from the same distribution.
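A call of the following shape would produce a Kruskal-Wallis result like the one above. This is a sketch on simulated data, not the real measurements: the data frame `toy` and its values are assumptions, and only the column roles (V2 = weight, V3 = batch) follow the variable list.

```r
set.seed(1)  # reproducible toy data, not the real coin weights

# Assumed layout: V2 = weight in grams, V3 = batch label (8 batches)
toy <- data.frame(
  V2 = rnorm(80, mean = 7.52, sd = 0.03),
  V3 = factor(rep(1:8, each = 10))
)

# Null hypothesis: the weight distribution is the same in every batch
kw <- kruskal.test(V2 ~ V3, data = toy)

kw$parameter  # degrees of freedom = number of batches - 1
kw$p.value
```

The batch column is converted to a factor so that it is clearly treated as a grouping variable rather than a number.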

Pairwise Wilcoxon test

## 
##  Pairwise comparisons using Wilcoxon rank sum test 
## 
## data:  euroweight_data$V2 and euroweight_data$V3 
## 
##   1       2       3       4       5       6       7      
## 2 1.00000 -       -       -       -       -       -      
## 3 0.04297 0.00025 -       -       -       -       -      
## 4 0.00141 0.10329 7.8e-12 -       -       -       -      
## 5 0.00108 0.10329 2.6e-12 1.00000 -       -       -      
## 6 0.76768 0.04297 1.00000 2.9e-07 1.7e-07 -       -      
## 7 1.00000 1.00000 0.00012 0.10329 0.10202 0.04297 -      
## 8 1.00000 0.10329 0.73578 1.4e-06 7.1e-07 1.00000 0.10202
## 
## P value adjustment method: holm

As the table shows, the Holm-adjusted p-values of the pairs (1,3), (2,3), (1,4), (3,4), (1,5), (3,5), (2,6), (4,6), (5,6), (3,7), (6,7), (4,8), (5,8) are smaller than \(0.05\). For each of these pairs, we reject the null hypothesis that the coin weights in the two packages come from the same distribution. For the other pairs, we cannot reject the null hypothesis.
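Reading the significant pairs off a triangular table by eye is error-prone, so the step can also be done in code. The sketch below uses simulated data with four batches; `toy` and `sig_pairs` are names introduced here for illustration only.

```r
set.seed(2)  # toy data only: batches 1-2 and 3-4 get different means

toy <- data.frame(
  V2 = rnorm(40, mean = rep(c(7.50, 7.50, 7.55, 7.55), each = 10), sd = 0.02),
  V3 = factor(rep(1:4, each = 10))
)

# Wilcoxon rank sum test for every pair of batches, Holm-adjusted
pw <- pairwise.wilcox.test(toy$V2, toy$V3, p.adjust.method = "holm")

# pw$p.value is a lower-triangular matrix (NA elsewhere);
# pick the cells below 0.05 and report them as batch pairs
idx <- which(pw$p.value < 0.05, arr.ind = TRUE)
sig_pairs <- data.frame(
  batch_a = colnames(pw$p.value)[idx[, "col"]],
  batch_b = rownames(pw$p.value)[idx[, "row"]],
  p_adj   = pw$p.value[idx]
)
sig_pairs
```

The same extraction would apply unchanged to the 8-batch result, since only the size of the p-value matrix differs.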

Iris

Read Data

To read the “txt” file, I use the R function read.table().

The variables in the iris dataset are as follows:

  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm
  5. class

Correlation Test

Sepal length and Sepal width

Spearman rank correlation coefficient

Because the data contain tied (duplicate) values, the p-value cannot be computed exactly.

## 
##  Spearman's rank correlation rho
## 
## data:  iris_data$V1 and iris_data$V2
## S = 652165, p-value = 0.05128
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.1594565
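The tie handling behind this kind of output can be checked on a small example. The vectors `x` and `y` below are toy values chosen to contain ties; the sketch shows that cor.test() warns when asked (by default) for an exact Spearman p-value, and that passing exact = FALSE requests the approximation explicitly.

```r
x <- c(1, 2, 2, 3, 4, 5)   # toy vectors; x contains a tie
y <- c(2, 1, 3, 3, 5, 4)   # y contains a tie as well

# Capture the warning text instead of letting it print
msg <- tryCatch(
  {
    cor.test(x, y, method = "spearman")
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg

# Asking for the approximate p-value up front avoids the warning
ok <- cor.test(x, y, method = "spearman", exact = FALSE)
ok$p.value
```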

Kendall rank correlation coefficient

## 
##  Kendall's rank correlation tau
## 
## data:  iris_data$V1 and iris_data$V2
## z = -1.2469, p-value = 0.2124
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##         tau 
## -0.07211192

In both results the p-value is larger than \(0.05\) (for Spearman only narrowly, at 0.05128) and the absolute value of the correlation coefficient is less than \(0.2\), so we cannot reject the null hypothesis that sepal length and sepal width are independent.

Petal length and Petal width

Spearman rank correlation coefficient

Because the data contain tied (duplicate) values, the p-value cannot be computed exactly.

## 
##  Spearman's rank correlation rho
## 
## data:  iris_data$V3 and iris_data$V4
## S = 35997, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9360034

Kendall rank correlation coefficient

## 
##  Kendall's rank correlation tau
## 
## data:  iris_data$V3 and iris_data$V4
## z = 13.911, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.8030139

In both results the p-value is smaller than \(0.05\) and the absolute value of the correlation coefficient is greater than \(0.8\), so we reject the null hypothesis that petal length and petal width are independent. The two variables have a very strong positive correlation.
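The two tests for the petal measurements can be sketched with the copy of iris that ships with R. This is an assumption: the built-in data frame uses named columns instead of V1 to V5 and may differ in a few values from the text file read above, so the numbers need not match the output exactly.

```r
# Built-in iris stands in for the text file used in the report
data(iris)

sp <- cor.test(iris$Petal.Length, iris$Petal.Width,
               method = "spearman", exact = FALSE)
kd <- cor.test(iris$Petal.Length, iris$Petal.Width,
               method = "kendall", exact = FALSE)

# Coefficient and p-value for each method
c(rho = unname(sp$estimate), p = sp$p.value)
c(tau = unname(kd$estimate), p = kd$p.value)
```

Both coefficients measure monotonic association on ranks, which is why Kendall's tau is typically smaller in magnitude than Spearman's rho for the same data.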

Cigarettes

Read Data

To read the “txt” file, I use the R function read.table().

The variables in the cigarettes dataset are as follows:

  1. Company name
  2. x1=tar (mg)
  3. x2=nicotine (mg)
  4. x3=weight (g)
  5. y=carbon monoxide (mg)

Correlation Test

Nicotine and weight

Spearman rank correlation coefficient

Because the data contain tied (duplicate) values, the p-value cannot be computed exactly.

## 
##  Spearman's rank correlation rho
## 
## data:  cigarettes_data$V3 and cigarettes_data$V4
## S = 2089.8, p-value = 0.3472
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.1962293

Kendall rank correlation coefficient

## 
##  Kendall's rank correlation tau
## 
## data:  cigarettes_data$V3 and cigarettes_data$V4
## z = 0.93471, p-value = 0.3499
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##     tau 
## 0.13378

In both results the p-value is larger than \(0.05\) and the absolute value of the correlation coefficient is less than \(0.2\), so we cannot reject the null hypothesis that nicotine and weight are independent.

Nicotine and Carbon monoxide

Spearman rank correlation coefficient

Because the data contain tied (duplicate) values, the p-value cannot be computed exactly.

## 
##  Spearman's rank correlation rho
## 
## data:  cigarettes_data$V3 and cigarettes_data$V5
## S = 317.68, p-value = 8.235e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8778141

Kendall rank correlation coefficient

## 
##  Kendall's rank correlation tau
## 
## data:  cigarettes_data$V3 and cigarettes_data$V5
## z = 4.9787, p-value = 6.402e-07
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.7135688

In both results the p-value is smaller than \(0.05\) and the absolute value of the correlation coefficient is greater than \(0.7\), so we reject the null hypothesis that nicotine and carbon monoxide are independent. The two variables have a strong positive correlation.
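Since every variable pair in this report goes through the same two tests, the repeated step could be wrapped in a small helper. `rank_cor_summary` is a hypothetical function introduced here, not part of the original analysis, and the data below are simulated rather than taken from the cigarettes file.

```r
# Hypothetical helper: run both rank correlation tests on two numeric
# vectors and collect the coefficient and p-value of each.
rank_cor_summary <- function(x, y) {
  methods <- c("spearman", "kendall")
  rows <- lapply(methods, function(m) {
    res <- cor.test(x, y, method = m, exact = FALSE)
    data.frame(method   = m,
               estimate = unname(res$estimate),
               p_value  = res$p.value)
  })
  do.call(rbind, rows)
}

set.seed(3)  # toy data only; rounding produces ties
nicotine <- round(runif(25, 0.1, 2.0), 1)
co       <- round(10 * nicotine + rnorm(25, sd = 2), 1)

res <- rank_cor_summary(nicotine, co)
res
```

Returning a data frame keeps the two results side by side, which makes the comparison against thresholds such as \(0.05\) a one-line check.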

Binom Test

Task 4

Question: Suppose that in a coin toss the chance of getting a head or a tail is 50%. In a real experiment, 100 tosses give 48 heads. Is our original hypothesis true? [Use binom.test]

Answer: The question gives the following frequency table.

Status   Frequency
Head     48
Tail     52

Null hypothesis: the probability of getting a head is 50%.

## 
##  Exact binomial test
## 
## data:  48 and 100
## number of successes = 48, number of trials = 100, p-value = 0.7644
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3790055 0.5822102
## sample estimates:
## probability of success 
##                   0.48

Null hypothesis: the probability of getting a tail is 50%.

## 
##  Exact binomial test
## 
## data:  52 and 100
## number of successes = 52, number of trials = 100, p-value = 0.7644
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4177898 0.6209945
## sample estimates:
## probability of success 
##                   0.52

Both tests give the same p-value (0.7644), which is larger than \(0.05\), so we cannot reject the null hypothesis that the chance of getting a head or a tail is \(50\%\).
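The two tests agree because, under a success probability of 0.5, counting 48 heads and counting 52 tails describe the same outcome of a symmetric distribution. A minimal sketch of the two calls, with `heads` and `tails` as names chosen here:

```r
# Exact two-sided binomial tests against p = 0.5
heads <- binom.test(48, 100, p = 0.5)
tails <- binom.test(52, 100, p = 0.5)

round(heads$p.value, 4)                  # 0.7644, as in the output above
all.equal(heads$p.value, tails$p.value)  # TRUE: the tests coincide
```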

Task 5

Question: Did a fair coin produce 8 heads in 10 flips? By “fair” we mean a coin with equal probabilities for both sides. [Use binom.test]

Answer:

## 
##  Exact binomial test
## 
## data:  8 and 10
## number of successes = 8, number of trials = 10, p-value = 0.1094
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4439045 0.9747893
## sample estimates:
## probability of success 
##                    0.8

Since the p-value (0.1094) is larger than \(0.05\), we cannot reject the null hypothesis that the coin is fair.
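The p-value can be reproduced by hand, which shows what the exact test is doing: it adds up the probabilities of all outcomes that are no more likely than the observed one. For 8 heads in 10 flips of a fair coin, those outcomes are 0, 1, 2, 8, 9 and 10 heads. The names `p_manual` and `p_test` are introduced here for the sketch.

```r
# Outcomes at least as unlikely as 8 heads under a fair coin: 0-2 and 8-10
p_manual <- sum(dbinom(c(0:2, 8:10), size = 10, prob = 0.5))
p_manual  # 2 * (45 + 10 + 1) / 1024 = 0.109375

# The same value as reported by the exact test
p_test <- binom.test(8, 10, p = 0.5)$p.value
all.equal(p_manual, p_test)
```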

Acknowledgements

Thanks to the knitr package, designed by Yihui Xie (Xie 2015).

References

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.name/knitr/.