To read ".txt" files, I use the R function read.table().
read.table('../datasets/euroweight.dat.txt',header = FALSE,
dec = '.',na.strings = 'NA') -> euroweight_data
For dataset euroweight, variable descriptions are as follows:
I use two non-parametric tests, the Kruskal-Wallis test and pairwise Wilcoxon rank sum tests (pairwise.wilcox.test), to test the hypothesis that the distribution of coin weights is the same in the different packages.
##
## Kruskal-Wallis rank sum test
##
## data: euroweight_data$V2 by euroweight_data$V3
## Kruskal-Wallis chi-squared = 97.5, df = 7, p-value < 2.2e-16
The p-value is far below \(0.05\), so I reject the null hypothesis that the coin weights in the different packages come from the same distribution.
##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: euroweight_data$V2 and euroweight_data$V3
##
## 1 2 3 4 5 6 7
## 2 1.00000 - - - - - -
## 3 0.04297 0.00025 - - - - -
## 4 0.00141 0.10329 7.8e-12 - - - -
## 5 0.00108 0.10329 2.6e-12 1.00000 - - -
## 6 0.76768 0.04297 1.00000 2.9e-07 1.7e-07 - -
## 7 1.00000 1.00000 0.00012 0.10329 0.10202 0.04297 -
## 8 1.00000 0.10329 0.73578 1.4e-06 7.1e-07 1.00000 0.10202
##
## P value adjustment method: holm
As we can see, the Holm-adjusted p-values of the pairs (1,3), (2,3), (1,4), (3,4), (1,5), (3,5), (2,6), (4,6), (5,6), (3,7), (6,7), (4,8) and (5,8) are smaller than \(0.05\). For each of these pairs, we reject the null hypothesis that the coin weights in the two packages come from the same distribution. For the other pairs, we cannot reject the null hypothesis.
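The calls that produce output of this kind are not shown above, so the following is only a minimal sketch of how the two tests might be run. It uses simulated weights for three hypothetical packages rather than the euroweight file. The column names `V2` (weight) and `V3` (package) are assumptions taken from the labels in the printed output.

```r
# Simulated stand-in for the euroweight data (NOT the real file):
# three hypothetical packages of 30 coins each.
set.seed(1)
sim <- data.frame(
  V2 = c(rnorm(30, mean = 7.50, sd = 0.03),
         rnorm(30, mean = 7.52, sd = 0.03),
         rnorm(30, mean = 7.55, sd = 0.03)),
  V3 = factor(rep(1:3, each = 30))
)

# Global test: do all packages share one weight distribution?
kw <- kruskal.test(V2 ~ V3, data = sim)

# Pairwise Wilcoxon rank sum tests with Holm adjustment
pw <- pairwise.wilcox.test(sim$V2, sim$V3, p.adjust.method = "holm")

print(kw)
print(pw)

# Row/column positions of the pairs whose adjusted p-value is below 0.05
which(pw$p.value < 0.05, arr.ind = TRUE)
```

The last line reads the significant pairs straight from the p-value matrix, which is less error-prone than picking them out of the printed table by eye.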
To read ".txt" files, I use the R function read.table().
read.table('../datasets/iris.txt',header = FALSE,
dec = '.',na.strings = 'NA',sep = ",") -> iris_data
For dataset iris, variable descriptions are as follows:
Because the data contain tied (duplicate) values, an exact p-value cannot be computed, so I set exact = FALSE and rely on the approximate p-value.
#cor.test(jitter(iris_data$V1),jitter(iris_data$V2),method = "spearman",alternative = "t")
cor.test(iris_data$V1,iris_data$V2,method = "spearman",alternative = "two.sided",exact = FALSE)
##
## Spearman's rank correlation rho
##
## data: iris_data$V1 and iris_data$V2
## S = 652165, p-value = 0.05128
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.1594565
##
## Kendall's rank correlation tau
##
## data: iris_data$V1 and iris_data$V2
## z = -1.2469, p-value = 0.2124
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## -0.07211192
In both tests the p-value is larger than \(0.05\) (only marginally so for Spearman, with p = 0.05128) and the absolute value of the correlation coefficient is below \(0.2\), so we cannot reject the null hypothesis that sepal length and sepal width are independent.
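The remark about duplicate values can be illustrated on a toy example. The vectors below are made up for illustration and are not taken from any of the datasets. With ties present and `exact` left at its default, `cor.test()` issues a warning and falls back to an approximation; passing `exact = FALSE` requests the approximation directly.

```r
# Toy vectors (hypothetical); the repeated 2 in x creates a tie.
x <- c(1, 2, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6, 5)

# Capture the warning raised when an exact p-value is requested by default.
msg <- tryCatch(
  {
    cor.test(x, y, method = "spearman")
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg

# Asking for the approximation explicitly avoids the warning.
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$p.value
```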
Because the data contain tied (duplicate) values, an exact p-value cannot be computed, so the p-values below are approximate.
##
## Spearman's rank correlation rho
##
## data: iris_data$V3 and iris_data$V4
## S = 35997, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9360034
##
## Kendall's rank correlation tau
##
## data: iris_data$V3 and iris_data$V4
## z = 13.911, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.8030139
In both tests the p-value is smaller than \(0.05\) and the absolute value of the correlation coefficient is greater than \(0.8\), so we reject the null hypothesis that petal length and petal width are independent. The two variables are very strongly correlated.
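As a rough sketch of how both rank correlations can be obtained side by side, the code below uses R's built-in `iris` data frame. This is an assumption made for the sake of a self-contained example: the built-in data may differ slightly from the iris.txt file read above, so the numbers need not match the output shown.

```r
# Built-in iris data; may not be identical to the iris.txt file used above.
x <- iris$Petal.Length
y <- iris$Petal.Width

# exact = FALSE because the measurements contain ties.
sp <- cor.test(x, y, method = "spearman",
               alternative = "two.sided", exact = FALSE)
kd <- cor.test(x, y, method = "kendall",
               alternative = "two.sided", exact = FALSE)

# Collect the estimates and p-values for comparison.
c(rho = unname(sp$estimate), tau = unname(kd$estimate))
c(spearman_p = sp$p.value, kendall_p = kd$p.value)
```

Kendall's tau is usually smaller in absolute value than Spearman's rho on the same data, which is why the two coefficients are compared against the same threshold only loosely.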
To read ".txt" files, I use the R function read.table().
read.table('../datasets/cigarettes.dat.txt',header = FALSE,
dec = '.',na.strings = 'NA') -> cigarettes_data
For dataset cigarettes, variable descriptions are as follows:
Because the data contain tied (duplicate) values, an exact p-value cannot be computed, so the p-values below are approximate.
##
## Spearman's rank correlation rho
##
## data: cigarettes_data$V3 and cigarettes_data$V4
## S = 2089.8, p-value = 0.3472
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.1962293
##
## Kendall's rank correlation tau
##
## data: cigarettes_data$V3 and cigarettes_data$V4
## z = 0.93471, p-value = 0.3499
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.13378
In both tests the p-value is larger than \(0.05\) and the absolute value of the correlation coefficient is below \(0.2\), so we cannot reject the null hypothesis that nicotine and weight are independent.
Because the data contain tied (duplicate) values, an exact p-value cannot be computed, so the p-values below are approximate.
##
## Spearman's rank correlation rho
##
## data: cigarettes_data$V3 and cigarettes_data$V5
## S = 317.68, p-value = 8.235e-09
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.8778141
##
## Kendall's rank correlation tau
##
## data: cigarettes_data$V3 and cigarettes_data$V5
## z = 4.9787, p-value = 6.402e-07
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.7135688
In both tests the p-value is smaller than \(0.05\) and the absolute value of the correlation coefficient is greater than \(0.7\), so we reject the null hypothesis that nicotine and carbon monoxide are independent. The two variables are strongly correlated.
Question: Suppose that in a coin toss the chance of getting a head or a tail is 50%. In a real experiment, we toss a coin 100 times and get 48 heads. Is our original hypothesis true? [Use binom.test]
Answer: The question gives the following frequency table.
Status | Frequency |
---|---|
Head | 48 |
Tail | 52 |
Null hypothesis: the probability of getting a head is 50%.
##
## Exact binomial test
##
## data: 48 and 100
## number of successes = 48, number of trials = 100, p-value = 0.7644
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.3790055 0.5822102
## sample estimates:
## probability of success
## 0.48
Null hypothesis: the probability of getting a tail is 50%.
##
## Exact binomial test
##
## data: 52 and 100
## number of successes = 52, number of trials = 100, p-value = 0.7644
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4177898 0.6209945
## sample estimates:
## probability of success
## 0.52
Both tests give the same p-value, which is larger than \(0.05\), so we cannot reject the null hypothesis that the chance of getting a head or a tail is \(50\%\).
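The calls behind the two outputs are not shown, so here is a minimal sketch of how they might look. The counts come from the frequency table; the variable names `heads` and `tails` are mine.

```r
# Two-sided exact binomial tests against p = 0.5.
heads <- binom.test(48, 100, p = 0.5, alternative = "two.sided")
tails <- binom.test(52, 100, p = 0.5, alternative = "two.sided")

# Under p = 0.5 the binomial distribution is symmetric, so testing
# 48 heads or 52 tails gives the same p-value.
round(c(heads = heads$p.value, tails = tails$p.value), 4)
```

Because of this symmetry, running the test on the tails adds no new information. It only confirms the result for the heads.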
Question: Did a fair coin produce 8 heads in 10 flips? By "fair" we mean a coin whose two sides appear with equal probability. [Use binom.test]
Answer:
##
## Exact binomial test
##
## data: 8 and 10
## number of successes = 8, number of trials = 10, p-value = 0.1094
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.4439045 0.9747893
## sample estimates:
## probability of success
## 0.8
Since the p-value is larger than \(0.05\), we cannot reject the null hypothesis that the coin is fair.
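To see where the p-value comes from, the two-sided exact test can be reproduced by hand: add up the probabilities of all outcomes that are no more likely than the observed one under the null hypothesis. The sketch below does this with `dbinom()`. The small tolerance factor is my own guard against floating-point comparison issues, not something taken from the analysis above.

```r
n  <- 10    # number of flips
x  <- 8     # observed heads
p0 <- 0.5   # probability of heads under the null hypothesis

# Probability of every possible number of heads, 0 to n
probs <- dbinom(0:n, size = n, prob = p0)

# Two-sided p-value: total probability of outcomes at most as likely
# as the observed one (here 0, 1, 2, 8, 9 and 10 heads).
p_manual <- sum(probs[probs <= dbinom(x, n, p0) * (1 + 1e-7)])

# Same quantity from the built-in test
p_test <- binom.test(x, n, p = p0)$p.value

c(manual = p_manual, binom.test = p_test)
```

Both values equal \(112/1024 \approx 0.1094\), matching the output above.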
Thanks to knitr, developed by Yihui Xie (Xie 2015).
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. http://yihui.name/knitr/.