Several tests assume the data are drawn from normally distributed populations. If this assumption is violated we face a choice: do we abandon hypothesis testing as planned, or do we take an alternative path? Step one is to figure out when to abandon the assumption of normality.
One easy diagnostic is to plot a histogram or box-plot and superimpose the normal distribution to see how far off we are. Another useful visual device is the quantile-quantile (Q-Q) plot. The idea behind these plots is quite simple: split your sample into portions and see what percent falls above or below a certain percentile. We know what the percentiles should look like if we are dealing with a normal distribution, so we juxtapose the theoretical quantiles with the sample quantiles; the more these two overlap, the more likely it is that the sample comes from a normally distributed population. A perfect 1:1 fit would put every point on a straight line, and that line is what you see in the plot. The more the dots deviate from the line, the less likely it is that the sample data come from a normally distributed population.
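As a quick sanity check on this logic, we can simulate data we know to be normal and confirm that the theoretical and sample quantiles line up almost perfectly (a sketch with made-up simulated data, not part of the marine example below):

```r
# For truly normal data, the Q-Q points should hug the 1:1 line, so the
# correlation between theoretical and sample quantiles should be near 1.
set.seed(1)                         # arbitrary seed for reproducibility
x  <- rnorm(100)                    # data we KNOW come from a normal population
qq <- qqnorm(x, plot.it = FALSE)    # theoretical ($x) and sample ($y) quantiles
cor(qq$x, qq$y)                     # very close to 1 for normal data
```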
Marine organisms do not enjoy anywhere near the same protection from human influence as do terrestrial species. However, marine reserves are becoming increasingly popular for biological conservation and the protection of fisheries. But are reserves effective in preserving marine wildlife? Researchers matched each of 32 marine reserves to a control location, which was either the site of the reserve before it became protected or a similar unprotected site nearby. One index of protection evaluated by the study was the “biomass ratio”, which is the total mass of all marine plants and animals per unit area of reserve divided by the same quantity in the unprotected control. This biomass ratio would equal one if protection had no effect. The biomass ratio would exceed one if protection were beneficial and it would be less than one if protection reduced biomass. Does the mean biomass ratio differ from one (i.e., are marine reserves effective)?
H0: Mean biomass ratio is unaffected by reserve protection (μ=1)
HA: Mean biomass ratio is affected by reserve protection (μ≠1)
First things first, are these data coming from a normally distributed population?
marine = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e1MarineReserve.csv"))
head(marine); names(marine)
## biomassRatio
## 1 1.34
## 2 1.96
## 3 2.49
## 4 1.27
## 5 1.19
## 6 1.15
## [1] "biomassRatio"
boxplot(marine$biomassRatio, horizontal = TRUE, xlab="Biomass Ratio", main="Boxplot of Marine Biomass Ratio", col = "firebrick")
h = hist(marine$biomassRatio, xlab="Biomass Ratio", main="Histogram of Marine Biomass Ratio", col = "firebrick")
x = marine$biomassRatio
xfit = seq(min(x), max(x), length=32)
yfit = dnorm(xfit, mean=mean(x), sd=sd(x))
yfit = yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=2)
qqnorm(marine$biomassRatio, col = "cornflowerblue");
qqline(marine$biomassRatio, col = "firebrick")
All three plots show skew, suggesting anything but a normal distribution underlying these data.
Two formal tests are very popular: one is the Shapiro-Wilk Test, and the other is the Anderson-Darling Test. Shapiro-Wilk is designed for sample sizes under 5,000, while Anderson-Darling is designed for large samples.
H0: The sample is drawn from a normal population
HA: The sample is not drawn from a normal population
$W = \dfrac{\left(\sum_{i=1}^{n} a_i Y_{(i)}\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2}, \qquad 0 \le W \le 1$
W→1: Observed distribution is as expected if population were Normal
Skewness: $\dfrac{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^3}{(n-1)s^3}$
Skewness $= 0$: symmetric, as for the Normal distribution; $< 0$: skewed left; $> 0$: skewed right
Kurtosis: $\dfrac{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^4}{(n-1)s^4} - 3$
$k = 0$: Normal distribution; $k > 0$: “peaked” (leptokurtic); $k < 0$: “flat” (platykurtic)
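To make the formulas concrete, here is a hand-rolled version (a sketch for illustration; note that the `moments` package used below computes plain moment-based skewness and kurtosis, so its numbers will not match these exactly):

```r
# Skewness and excess kurtosis following the formulas above.
skew.kurt <- function(y) {
  n <- length(y); s <- sd(y); ybar <- mean(y)
  skew <- sum((y - ybar)^3) / ((n - 1) * s^3)
  kurt <- sum((y - ybar)^4) / ((n - 1) * s^4) - 3
  c(skewness = skew, excess.kurtosis = kurt)
}
skew.kurt(c(1, 2, 2, 3, 3, 3, 4, 4, 5))  # a symmetric sample: skewness is 0
```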
shapiro.test(marine$biomassRatio)
##
## Shapiro-Wilk normality test
##
## data: marine$biomassRatio
## W = 0.81751, p-value = 8.851e-05
library(nortest)
ad.test(marine$biomassRatio)
##
## Anderson-Darling normality test
##
## data: marine$biomassRatio
## A = 2.0194, p-value = 2.892e-05
library(moments)
skewness(marine$biomassRatio)
## [1] 1.688122
kurtosis(marine$biomassRatio)
## [1] 5.641057
Both the Shapiro-Wilk and the Anderson-Darling test suggest that the null of normality can be easily rejected. Note: AD is used here purely for illustration purposes and is not needed in this case given the small sample size.
What if we ignore violations of the assumption of normality? Perhaps nothing will go wrong with the test results: most tests are robust to violations of normality, so long as you can rely on the Central Limit Theorem and the test you are dealing with involves the mean. In brief, large samples will let you get away with some non-normality so long as you are not testing variances. How large is large? There is no clear-cut answer; depending on what you are looking to test, how many groups are involved, and the research design, even 500 units in each group may be insufficient.
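A quick simulation illustrates why the CLT bails us out for means (a sketch with an arbitrary skewed population, not tied to any data set in these notes):

```r
# Raw draws from an exponential population are heavily skewed right, but
# the sampling distribution of means of 50 draws is far more symmetric.
set.seed(42)
skew  <- function(y) mean((y - mean(y))^3) / sd(y)^3   # quick moment skewness
raw   <- rexp(10000)                                   # skewed population
means <- replicate(2000, mean(rexp(50)))               # sample means, n = 50
skew(raw)     # near 2, the exponential distribution's skewness
skew(means)   # much closer to 0
```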
The rule of thumb used for sample sizes and relative standard deviations of the groups applies here as well. If you have small samples, or the standard deviation of one group is quite a bit larger than that of the other group(s) (i.e., one standard deviation is at least twice the other), then you should not assume equal variances. Of course, you would be better off using formal tests for equality and homogeneity of variances (i.e., the F-test and Levene’s test, respectively).
If the F-test and Levene’s test give you conflicting results, I would recommend going with Levene’s test if the data are non-normally distributed.
If we don’t want to ignore violations of the normality assumption we have no choice but to transform the data. Several transformations are available for use, each with the goal of reducing the skew in the sample (i.e., rendering it more “normally” distributed).
The log transformation (Y′ = ln(Y)) is most useful when the data are skewed right, when the measurements are ratios or products of variables, or when the spread of the data increases with the mean.
Arcsine Transformation If the data are proportions, the arcsine transformation arcsin(√p) is suggested, but I would not do this readily … proportions essentially come from Binomial or Poisson distributions, and in those cases we would rarely run a t-test or fit a linear regression.
Other Transformations The square (Y′ = Y²) is used if the data are skewed left, and if this fails, then the anti-log might work (Y′ = e^Y).
If the data are skewed right, the reciprocal may also work (Y′ = 1/Y).
For the square and the reciprocal you need all measurements to be positive. If they are negative then you need to multiply each by −1 and then carry out the transformation.
If we use a transformation, the estimated statistics have to be back-transformed to the original metric.
The Corresponding Back-transformations
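Since the back-transformation table did not survive formatting, here is the idea in code: each transformation is undone by applying its inverse to the estimate (standard inverse pairs; the numbers below are made up purely for illustration):

```r
# Back-transforming a mean computed on the transformed scale.
y <- c(1.34, 1.96, 2.49, 1.27)       # made-up positive measurements
exp(mean(log(y)))                    # log        -> anti-log (the geometric mean)
mean(sqrt(y))^2                      # sqrt       -> square
1 / mean(1 / y)                      # reciprocal -> reciprocal (the harmonic mean)
p <- c(0.20, 0.50)                   # made-up proportions
sin(mean(asin(sqrt(p))))^2           # arcsine    -> sin(.)^2
```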
Non-parametric tests are used (a) when transformations do not work, or (b) the data represent ordinal categories (or are ranked data)
Assumption: Samples are drawn independently from a continuous distribution
Let us run the test with the Sexual Conflict data.
The process by which a single species splits into two species is still not well understood. One proposal involves “sexual conflict” – a genetic arms race between males and females that arises from their different reproductive roles. For example, in a number of insect species, male seminal fluid contains chemicals that reduce the tendency of the female to mate again with other males. However, these chemicals also reduce female survival, so females have developed mechanisms to counter these chemicals. Sexual conflict can cause rapid genetic divergence between isolated populations of the same species, leading to the formation of new species. Sexual conflict is more pronounced in species in which females mate more than once, leading to the prediction that they should form new species at a more rapid rate.
To investigate this, Arnqvist et al. (2000) identified 25 insect taxa (groups) in which females mate multiple times, and they paired each of these groups to a closely related insect group in which females only mate once. Which type of insects tend to have more species?
Note: These are treated as paired data because the two sets of groups are closely related in the sense of being almost “identical”
taxa = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e4SexualConflict.csv"))
head(taxa); names(taxa); summary(taxa)
## taxonPair nSpeciesMultipleMating nSpeciesSingleMating difference
## 1 A 53 10 43
## 2 B 73 120 -47
## 3 C 228 74 154
## 4 D 353 289 64
## 5 E 157 30 127
## 6 F 300 4 296
## [1] "taxonPair" "nSpeciesMultipleMating"
## [3] "nSpeciesSingleMating" "difference"
## taxonPair nSpeciesMultipleMating nSpeciesSingleMating
## A : 1 Min. : 7 Min. : 4.0
## B : 1 1st Qu.: 34 1st Qu.: 10.0
## C : 1 Median : 77 Median : 30.0
## D : 1 Mean : 1134 Mean : 263.5
## E : 1 3rd Qu.: 228 3rd Qu.: 115.0
## F : 1 Max. :21000 Max. :3500.0
## (Other):19
## difference
## Min. : -980.0
## 1st Qu.: -3.0
## Median : 16.0
## Mean : 870.5
## 3rd Qu.: 79.0
## Max. :20940.0
##
par(mfrow=c(1,2))
boxplot(taxa$difference, horizontal = TRUE, col = "firebrick", main="Boxplot of Difference in Species Number", xlab="", cex.main=0.9)
qqnorm(taxa$difference, col="blue");
qqline(taxa$difference, col="red")
shapiro.test(taxa$difference)
##
## Shapiro-Wilk normality test
##
## data: taxa$difference
## W = 0.25473, p-value = 2.911e-10
The difference variable takes negative values, so the log cannot be applied directly. What do we do? We can try shifting the data before taking logs:
log.diff = log(taxa$difference + (2 - min(taxa$difference)))
summary(log.diff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6931 6.8865 6.9058 6.8185 6.9670 9.9952
boxplot(log.diff, horizontal=TRUE)
shapiro.test(log.diff)
##
## Shapiro-Wilk normality test
##
## data: log.diff
## W = 0.45146, p-value = 1.284e-08
That didn’t work. We could try some R packages designed to help transform the data but that would be futile here. Consequently, we can opt for the Sign Test.
H0: Median difference in number of species is =0
HA: Median difference in number of species is ≠0
First thing we will do is count how many cases are +, how many −, and how many are equal to the hypothesized median of 0
library(BSDA)
SIGN.test(taxa$nSpeciesMultipleMating, taxa$nSpeciesSingleMating, md = 0, alternative = "two.sided", conf.level = 0.95)
##
## Dependent-samples Sign-Test
##
## data: taxa$nSpeciesMultipleMating and taxa$nSpeciesSingleMating
## S = 18, p-value = 0.04329
## alternative hypothesis: true median difference is not equal to 0
## 95 percent confidence interval:
## 1.00000 77.16674
## sample estimates:
## median of x-y
## 16
##
## Achieved and Interpolated Confidence Intervals:
##
## Conf.Level L.E.pt U.E.pt
## Lower Achieved CI 0.8922 1 70.0000
## Interpolated CI 0.9500 1 77.1667
## Upper Achieved CI 0.9567 1 78.0000
Note the p-value of 0.04329; we can reject the null hypothesis. The data suggest that groups of insects whose females mate multiple times have more species than groups whose females mate singly, consistent with the sexual-conflict hypothesis.
You could also do this test manually:
taxa$sign = "equal to median"
taxa$sign[taxa$difference > 0] = "plus"
taxa$sign[taxa$difference < 0] = "minus"
taxa$sign = factor(taxa$sign, levels = c("minus", "equal to median", "plus"))
table(taxa$sign)
##
## minus equal to median plus
## 7 0 18
So we have 7 minuses and 18 pluses. If the differences were centered on the hypothesized median we should have seen 50% of the cases below it and 50% above. This is not what we see. Let’s use the binomial test:
binom.test(7, 25, p=0.5)
##
## Exact binomial test
##
## data: 7 and 25
## number of successes = 7, number of trials = 25, p-value = 0.04329
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.1207167 0.4938768
## sample estimates:
## probability of success
## 0.28
Given the p-value we can reject H0; the median difference is not 0. Note that the p-value is identical to the one from SIGN.test. Surprised? You shouldn’t be: the sign test is just an exact binomial test on the signs.
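The p-value can even be reproduced by hand, which shows the sign test is nothing more than binomial arithmetic:

```r
# Two-sided sign-test p-value: 7 minuses (equivalently, 18 pluses) out of
# 25 non-zero differences, under H0: P(+) = 0.5.
2 * pbinom(7, size = 25, prob = 0.5)   # matches binom.test: ~0.0433
```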
The Sign Test is weak in that it does not account for the magnitude of how far above/below the median an observation falls. The Wilcoxon Signed-Rank Test is a slight improvement because it ranks observations on the basis of how far below/above the hypothesized median they fall.
Let us run the test on the Sexual Conflict data:
wilcox.test(taxa$nSpeciesMultipleMating, taxa$nSpeciesSingleMating, paired = TRUE)
## Warning in wilcox.test.default(taxa$nSpeciesMultipleMating, taxa
## $nSpeciesSingleMating, : cannot compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: taxa$nSpeciesMultipleMating and taxa$nSpeciesSingleMating
## V = 230, p-value = 0.07139
## alternative hypothesis: true location shift is not equal to 0
Do not reject H0; the distributions appear to be the same (i.e., there is no difference in the number of species between the polyandrous and the monandrous groups).
If we have a two-sample (i.e., unpaired) design, then the Mann-Whitney U-Test can be used, assuming we couldn’t transform the data to run a t-test.
Sexual Cannibalism in Sagebrush Crickets
The sagebrush cricket, Cyphoderris strepitans, has an unusual form of mating. During mating, the male offers his fleshy hind wings to the female to eat. The wounds are not fatal, but a male with already-nibbled wings is less likely to be chosen by females he meets later. Females get some nutrition from feeding on the wings, which raises the question, “Are females more likely to mate if they are hungry?”
Johnson et al. (1999) answered this question by randomly dividing 24 females into two groups: one group of 11 females was starved for at least two days and another group of 13 females was fed during the same period. Each female was then put separately into a cage with a single (new) male, and the waiting time to mating was recorded.
H0: Time to mating is the same for female crickets that were starved as for those who were fed
HA: Time to mating is not the same for female crickets that were starved as for those who were fed
cric = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"))
head(cric); names(cric)
## feedingStatus timeToMating
## 1 starved 1.9
## 2 starved 2.1
## 3 starved 3.8
## 4 starved 9.0
## 5 starved 9.6
## 6 starved 13.0
## [1] "feedingStatus" "timeToMating"
wilcox.test(cric$timeToMating ~ cric$feedingStatus, paired=FALSE)
##
## Wilcoxon rank sum test
##
## data: cric$timeToMating by cric$feedingStatus
## W = 88, p-value = 0.3607
## alternative hypothesis: true location shift is not equal to 0
Do not reject H0; time to mating does not appear to be different for starved versus fed female crickets
Recall the earlier warning about ties: if we have any ties we need to switch to a version of the Wilcoxon test designed to handle them. I’ll use the data on gestures and sightedness.
Gesturing is common during human speech. Is this behavior learned via exposure? A measure was made of the number of gestures produced by each of 12 pairs of sighted individuals while talking to sighted individuals. This result was compared with the number of gestures produced while talking by each of 12 pairs of people who had been blind since birth and were therefore presumably unexposed to the gestures of others. These data are given below.
gestures = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/review2/rev2q13BlindGestures.csv"))
head(gestures); names(gestures)
## sightedness numberOfGestures
## 1 Blind 0
## 2 Blind 1
## 3 Blind 1
## 4 Blind 2
## 5 Blind 1
## 6 Blind 1
## [1] "sightedness" "numberOfGestures"
wilcox.test(gestures$numberOfGestures ~ gestures$sightedness, paired = FALSE)
## Warning in wilcox.test.default(x = c(0L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: gestures$numberOfGestures by gestures$sightedness
## W = 62.5, p-value = 0.5755
## alternative hypothesis: true location shift is not equal to 0
Note the warning message that you can’t recover an exact p-value with this test because of ties in the data. Mercifully we have another test we could use in the presence of ties.
library(exactRankTests)
wilcox.exact(gestures$numberOfGestures ~ gestures$sightedness, paired=FALSE)
##
## Exact Wilcoxon rank sum test
##
## data: gestures$numberOfGestures by gestures$sightedness
## W = 62.5, p-value = 0.6361
## alternative hypothesis: true mu is not equal to 0
This (very weak) test compares the distributions of two groups by comparing their empirical cumulative distribution functions (ECDFs) and finding the greatest absolute distance between the two.
Let us see when and how we might use this test > The stalk-eyed fly, Cyrtodiopsis dalmanni, is a bizarre-looking insect from the jungles of Malaysia. Its eyes are at the ends of long stalks that emerge from its head, making the fly look like something from the cantina scene in Star Wars. These eye stalks are present in both sexes, but they are particularly impressive in males. The span of the eye stalk in males enhances their attractiveness to females as well as their success in battles against other males. The span, in millimeters, from one eye to the other, was measured in a random sample of 45 male stalk-eyed flies.
Let us plot the distribution and run the formal tests to see if we can stick with the usual t-test. (For this illustration we return to the sagebrush cricket data rather than the fly measurements.)
cric = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"))
head(cric); names(cric)
## feedingStatus timeToMating
## 1 starved 1.9
## 2 starved 2.1
## 3 starved 3.8
## 4 starved 9.0
## 5 starved 9.6
## 6 starved 13.0
## [1] "feedingStatus" "timeToMating"
boxplot(cric$timeToMating ~ cric$feedingStatus, horizontal=TRUE)
var.test(cric$timeToMating ~ cric$feedingStatus)
##
## F test to compare two variances
##
## data: cric$timeToMating by cric$feedingStatus
## F = 2.8393, num df = 12, denom df = 10, p-value = 0.108
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7841424 9.5786535
## sample estimates:
## ratio of variances
## 2.839337
library(car)
## Loading required package: carData
##
## Attaching package: 'carData'
## The following objects are masked from 'package:BSDA':
##
## Vocab, Wool
leveneTest(cric$timeToMating ~ cric$feedingStatus, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 8.2093 0.008995 **
## 22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(cric$timeToMating ~ cric$feedingStatus, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 4.5709 0.04388 *
## 22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The box-plots and the formal tests (at least Levene’s test for homogeneity of variances) suggest we should reject the null of equal variances. We could try and normalize the data via the natural logarithm.
cric$log.time = log(cric$timeToMating)
boxplot(cric$log.time ~ cric$feedingStatus, horizontal=TRUE)
var.test(cric$log.time ~ cric$feedingStatus)
##
## F test to compare two variances
##
## data: cric$log.time by cric$feedingStatus
## F = 2.0434, num df = 12, denom df = 10, p-value = 0.2663
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5643217 6.8934446
## sample estimates:
## ratio of variances
## 2.043378
library(car)
leveneTest(cric$log.time ~ cric$feedingStatus, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 4.3155 0.04965 *
## 22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(cric$log.time ~ cric$feedingStatus, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 2.21 0.1513
## 22
Now we have a quandary: the median-centered test tells us we cannot reject H0, but both the mean-centered test and the box-plots suggest otherwise. A safe approach would be to assume we cannot normalize the data, so the Kolmogorov-Smirnov test may be called for.
ks.test(cric$feedingStatus, cric$timeToMating, alternative = "two.sided")
## Warning in ks.test(cric$feedingStatus, cric$timeToMating, alternative =
## "two.sided"): cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: cric$feedingStatus and cric$timeToMating
## D = 0.875, p-value = 2.093e-08
## alternative hypothesis: two-sided
Uh oh! We have ties so this won’t work. Technically, a permutation test would be useful here.
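A side note on the call above: ks.test() expects two numeric samples, so passing the grouping factor as one argument ends up comparing factor codes to the measurements rather than the two groups to each other. The two-sample form splits the numeric variable by group (sketched here with made-up numbers standing in for the starved and fed groups):

```r
# Two-sample K-S done properly: numeric sample vs. numeric sample.
starved <- c(1.9, 2.1, 3.8, 9.0, 9.6, 13.0)   # illustrative values only
fed     <- c(1.5, 5.2, 6.7, 7.3, 14.1, 21.0)
ks.test(starved, fed, alternative = "two.sided")
```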
library(exactRankTests) # for perm.test()
perm.test(cric$timeToMating ~ cric$feedingStatus, paired=FALSE, alternative="two.sided", exact=TRUE)
##
## 2-sample Permutation Test (scores mapped into 1:(m+n) using
## rounded scores)
##
## data: cric$timeToMating by cric$feedingStatus
## T = 123, p-value = 0.1376
## alternative hypothesis: true mu is not equal to 0
The test tells us we cannot reject H0; feeding status appears to have no effect on time to mating.
Recycling paper has some obvious benefits but it may have unintended consequences. For example, perhaps people are less careful about how much paper they use if they know that their waste will be recycled. Researchers tested this idea by measuring paper use in two groups of experimental participants. Each person was placed in a room alone with scissors, paper, and a trash can, and was told that he or she was testing the scissors. In the “recycling” group only there was a recycling bin in the room. The amount of paper used by each participant was measured in grams.
recycle = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter13/chap13q03Recycling.csv"))
head(recycle); names(recycle)
## recyclingBinPresent paperUsage
## 1 absent 4
## 2 absent 4
## 3 absent 4
## 4 absent 4
## 5 absent 4
## 6 absent 4
## [1] "recyclingBinPresent" "paperUsage"
We will start by testing whether the normality assumption is met.
boxplot(recycle$paperUsage ~ recycle$recyclingBinPresent, horizontal=TRUE)
tapply(recycle$paperUsage, recycle$recyclingBinPresent, FUN = shapiro.test)
## $absent
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.8734, p-value = 0.009063
##
##
## $present
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.81731, p-value = 0.002062
var.test(recycle$paperUsage, recycle$recyclingBinPresent)
## Warning in var(y): Calling var(x) on a factor x is deprecated and will become an error.
## Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
##
## F test to compare two variances
##
## data: recycle$paperUsage and recycle$recyclingBinPresent
## F = 392.73, num df = 40, denom df = 40, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 209.4325 736.4412
## sample estimates:
## ratio of variances
## 392.7273
library(car)
leveneTest(recycle$paperUsage ~ recycle$recyclingBinPresent, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 10.374 0.002578 **
## 39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(recycle$paperUsage ~ recycle$recyclingBinPresent, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 5.3358 0.02627 *
## 39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Neither the box-plots nor the Shapiro-Wilk tests allow us to assume normality holds, and the variance tests reject the null hypothesis of equal variances as well.
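Incidentally, the deprecation warning above arises because var.test() was handed the grouping factor as its second sample; var.test() also has a formula interface that compares the two group variances directly (sketched with made-up numbers, not the recycling data):

```r
# F test via the formula interface: variance of group 1 vs. group 2.
usage <- c(4, 5, 7, 9, 12, 3, 4, 4, 5, 6)              # illustrative values
bin   <- factor(rep(c("absent", "present"), each = 5))
var.test(usage ~ bin)   # ratio of the two group variances, not factor codes
```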
Since we have two groups here we are in the domain of the two-sample t-test but since the normality assumption is violated we have no choice but to try and normalize the data.
recycle$log.paper = log(recycle$paperUsage)
boxplot(recycle$log.paper ~ recycle$recyclingBinPresent, horizontal=TRUE)
with(recycle, tapply(log.paper, recyclingBinPresent, shapiro.test))
## $absent
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.85961, p-value = 0.005045
##
##
## $present
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.95117, p-value = 0.4136
var.test(recycle$log.paper, recycle$recyclingBinPresent)
## Warning in var(y): Calling var(x) on a factor x is deprecated and will become an error.
## Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
##
## F test to compare two variances
##
## data: recycle$log.paper and recycle$recyclingBinPresent
## F = 1.7989, num df = 40, denom df = 40, p-value = 0.06689
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.9593389 3.3733860
## sample estimates:
## ratio of variances
## 1.79895
leveneTest(recycle$log.paper ~ recycle$recyclingBinPresent, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 0.2523 0.6183
## 39
leveneTest(recycle$log.paper ~ recycle$recyclingBinPresent, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.385 0.5385
## 39
So the equal-variances assumption is not violated, but the normality assumption still is for the “absent” group. What about the square root transformation?
recycle$sr.paper = sqrt(recycle$paperUsage)
boxplot(recycle$sr.paper ~ recycle$recyclingBinPresent, horizontal=TRUE)
with(recycle, tapply(sr.paper, recyclingBinPresent, shapiro.test))
## $absent
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.88006, p-value = 0.01211
##
##
## $present
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.89752, p-value = 0.04388
var.test(recycle$sr.paper, recycle$recyclingBinPresent)
## Warning in var(y): Calling var(x) on a factor x is deprecated and will become an error.
## Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
##
## F test to compare two variances
##
## data: recycle$sr.paper and recycle$recyclingBinPresent
## F = 5.8129, num df = 40, denom df = 40, p-value = 1.718e-07
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 3.099865 10.900257
## sample estimates:
## ratio of variances
## 5.812858
leveneTest(recycle$sr.paper ~ recycle$recyclingBinPresent, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 4.0485 0.05115 .
## 39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(recycle$sr.paper ~ recycle$recyclingBinPresent, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 3.1393 0.08424 .
## 39
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Neither transformation worked, so we will have to use a non-parametric test. Since we have two groups and the variances (of the original measurements) are unequal, we should run the Kolmogorov-Smirnov test.
H0: Paper usage in the two groups comes from the same continuous distribution
HA: Paper usage in the two groups does not come from the same continuous distribution
ks.test(recycle$recyclingBinPresent, recycle$paperUsage, alternative="two.sided", conf.level = 0.95)
## Warning in ks.test(recycle$recyclingBinPresent, recycle$paperUsage,
## alternative = "two.sided", : cannot compute exact p-value with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: recycle$recyclingBinPresent and recycle$paperUsage
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided
We have ties, but we will take the test results at face value. Given the low p-value we reject H0; paper usage is not the same in both groups.
Mosquitoes find their victims in part by odor, so it makes sense to wonder whether what we eat and drink influences our attractiveness to mosquitoes. A study in West Africa, working with the mosquito species that carry malaria, wondered whether drinking the local beer influenced attractiveness to mosquitoes.
They opened a container holding 50 mosquitoes next to each of 25 alcohol-free participants and measured the proportion of mosquitoes that left the container and flew toward the participants (they called this proportion “activation”). They repeated this procedure 15 minutes after each participant had consumed a liter of beer and measured the “change in activation” (after beer - before beer). This procedure was also carried out on another 18 participants who were given water instead of beer.
Without checking any of the underlying assumptions, use a t-test for differences between the mean changes in mosquito activation between the beer-drinking and water-drinking groups.
mosq = read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter12/chap12q16BeerAndMosquitoes.csv"))
head(mosq); names(mosq)
## drink beforeDrink afterDrink change
## 1 beer 0.13 0.49 0.36
## 2 beer 0.13 0.59 0.46
## 3 beer 0.21 0.27 0.06
## 4 beer 0.25 0.43 0.18
## 5 beer 0.25 0.50 0.25
## 6 beer 0.32 0.50 0.18
## [1] "drink" "beforeDrink" "afterDrink" "change"
We know this is a two-sample t-test.
H0: Mean change in activation is the same in both groups
HA: Mean change in activation is not the same in both groups
t.test(mosq$change ~ mosq$drink, alternative="two.sided", var.equal=TRUE)
##
## Two Sample t-test
##
## data: mosq$change by mosq$drink
## t = 3.1913, df = 41, p-value = 0.002717
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.05383517 0.23940928
## sample estimates:
## mean in group beer mean in group water
## 0.154400000 0.007777778
Given the p-value we can reject H0; the data suggest that mean change in activation is not the same in both groups.
Test for normality and equality of variances. Transform the data (if possible with the natural log or the square root). If neither transformation works, use the appropriate non-parametric test.
summary(mosq)
## drink beforeDrink afterDrink change
## beer :25 Min. :0.1200 Min. :0.0600 Min. :-0.28000
## water:18 1st Qu.:0.3100 1st Qu.:0.4150 1st Qu.:-0.03000
## Median :0.4400 Median :0.4900 Median : 0.10000
## Mean :0.4447 Mean :0.5377 Mean : 0.09302
## 3rd Qu.:0.5850 3rd Qu.:0.7000 3rd Qu.: 0.20500
## Max. :0.7900 Max. :1.0000 Max. : 0.46000
boxplot(mosq$change ~ mosq$drink, horizontal=TRUE)
tapply(mosq$change, mosq$drink, FUN = shapiro.test)
## $beer
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.96785, p-value = 0.5912
##
##
## $water
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.98162, p-value = 0.9652
var.test(mosq$change, mosq$drink)
## Warning in var(y): Calling var(x) on a factor x is deprecated and will become an error.
## Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
##
## F test to compare two variances
##
## data: mosq$change and mosq$drink
## F = 0.10805, num df = 42, denom df = 42, p-value = 4.511e-11
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.05852226 0.19947615
## sample estimates:
## ratio of variances
## 0.1080453
library(car)
leveneTest(mosq$change ~ mosq$drink, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 1.3689 0.2488
## 41
leveneTest(mosq$change ~ mosq$drink, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 1.0519 0.3111
## 41
The Shapiro-Wilk test says normality may not be violated, but the variance tests give us conflicting results: the F-test says the samples do not come from populations with equal variances, while Levene’s test disagrees with this conclusion. What shall we do?
Since at least one of the variance tests signals a problem with equal variances and both sub-samples are small (n < 30), we might as well run the t-test with unequal variances. We do so below.
t.test(mosq$change ~ mosq$drink, alternative="two.sided", var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: mosq$change by mosq$drink
## t = 3.3219, df = 40.663, p-value = 0.001897
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.05746134 0.23578311
## sample estimates:
## mean in group beer mean in group water
## 0.154400000 0.007777778
Same conclusion as before; mean change in activation differs across the two groups.
Assume normality is rejected, the variances were indeed equal (i.e., that Levene’s test was spot-on with its conclusion), and no transformation will work. Now run the appropriate non-parametric test.
The test will be the Mann-Whitney U test
H0: Change in activation is drawn from the same population for both groups
HA: Change in activation is not drawn from the same population for both groups
wilcox.test(mosq$change ~ mosq$drink, paired=FALSE)
## Warning in wilcox.test.default(x = c(0.36, 0.46, 0.06, 0.18, 0.25, 0.18, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: mosq$change by mosq$drink
## W = 343.5, p-value = 0.003661
## alternative hypothesis: true location shift is not equal to 0
library(exactRankTests)
wilcox.exact(mosq$change ~ mosq$drink, paired=FALSE)
##
## Exact Wilcoxon rank sum test
##
## data: mosq$change by mosq$drink
## W = 343.5, p-value = 0.002899
## alternative hypothesis: true mu is not equal to 0
Again, we can reject H0; change in activation appears to be drawn from a different population for each group.
Note that the presence of ties forced us to use the Exact Wilcoxon Rank Sum test.
Assume normality is rejected, variances are unequal, and no transformation will work. Use an appropriate non-parametric test.
This will have to be the Kolmogorov-Smirnov test.
H0: Change in activation is drawn from the same population for both groups
HA: Change in activation is not drawn from the same population for both groups
ks.test(mosq$drink, mosq$change)
## Warning in ks.test(mosq$drink, mosq$change): cannot compute exact p-value
## with ties
##
## Two-sample Kolmogorov-Smirnov test
##
## data: mosq$drink and mosq$change
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided
The p-value is practically 0, so we can safely reject H0; change in activation is drawn from a different population for each group.