Sabina’s Stats Corner: Paired? Or not paired? – Women's Health Research Institute

On behalf of Sabina Dobrer

Recently I was asked about options for comparing categorical and continuous variables. What to use? Paired? Not paired? So I decided to put this information together to provide some guidelines for researchers and their teams. In this article I will describe appropriate tests for comparing continuous variables.

Paired or Not?

A paired test is used to compare two samples when each individual in one sample also appears in the other sample, for example when the same subjects are measured before and after an intervention.

An unpaired test is used to compare two samples when each individual in one sample is independent of every individual in the other sample.

Types of tests

There are two widely used types of tests: parametric and non-parametric.

I personally prefer non-parametric tests in most situations, as they rely on fewer distributional assumptions that need to be checked.

Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn. Often this is the assumption that the population data are normally distributed.

Non-parametric tests are “distribution-free” and, as such, can be used for non-normal variables.

Parametric	Non-Parametric
Paired t-test	Wilcoxon signed rank test
Unpaired t-test	Mann-Whitney U test or Wilcoxon rank-sum test
Pearson correlation	Spearman correlation
One way analysis of variance (ANOVA)	Kruskal Wallis Test

Brief description

Paired t-test: a method used to test whether the mean difference between pairs of measurements is zero.

What you need to test/know in order to use the paired t-test:

Subjects are independent. For example, you would like to compare student performance on an exam before and after an intervention. In this case we assume that each student does their own work on both exams
Each pair of measurements is obtained from the same subject
The differences between the paired measurements are approximately normally distributed
The two sets of measurements are not assumed to have equal variance for the paired t-test

Note: the normality assumption matters most for small sample sizes, because by the Central Limit Theorem the distribution of the sample mean approaches the normal distribution as the sample size tends to infinity. What counts as a large sample? In most books, 30 is commonly used as a “large enough” sample. As for me, I always check all the assumptions for any test.
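As a rough illustration of what the paired t-test computes, here is a minimal Python sketch using only the standard library. The function name `paired_t_statistic` and the exam scores are made up for illustration; in practice you would normally rely on a statistics package, which also reports the p-value.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the within-subject differences only.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical exam scores for five students, before and after an intervention.
before = [60, 72, 65, 80, 70]
after = [64, 75, 70, 82, 76]

t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

The statistic is then compared with a t distribution with n - 1 degrees of freedom to obtain the p-value.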

Wilcoxon signed rank test: a non-parametric method used to test whether the median of the differences between pairs of measurements is zero.

What you need to test/know in order to use Wilcoxon signed rank test:

Subjects are independent
Each pair of measurements is obtained from the same subject
Measures are continuous, at least in theory
Measurements are at least ordinal, so that any two values can be compared
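To show how the signed rank statistic is built from paired data, here is a small standard-library Python sketch. The function name `wilcoxon_signed_rank` and the data are hypothetical, and dropping zero differences is one common convention that is assumed here. The sketch returns only the rank sums, not a p-value.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W+, W-): rank sums of the positive and negative differences."""
    # Assumption: pairs with a zero difference are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    abs_diffs = [abs(d) for d in diffs]
    ordered = sorted(abs_diffs)
    # Rank the absolute differences; tied values share the average rank.
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in abs_diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical scores for six students; the last student did not change.
before = [60, 72, 65, 80, 70, 68]
after = [64, 71, 70, 82, 76, 68]

w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

When the differences are centred on zero, the two rank sums should be of similar size; a large imbalance is evidence against that.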

Unpaired t-test: designed to compare the means of two independent or unrelated samples.

What you need to test/know in order to use the unpaired t-test:

The variances of the two groups are assumed to be equal
The data are continuous
Only two groups are compared
Groups are independent
Data in each group should be approximately normally distributed
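The equal-variance assumption enters the unpaired t-test through a pooled variance estimate. The following standard-library Python sketch shows that calculation; the function name `pooled_t_statistic` and the two groups are invented for illustration, and no p-value is computed.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return the equal-variance (pooled) t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variances (n - 1)
    var_b = statistics.variance(group_b)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, n_a + n_b - 2


# Hypothetical measurements from two independent groups.
group_a = [5, 7, 9, 11]
group_b = [2, 4, 6, 8]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```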

Mann Whitney U test (Wilcoxon Rank Sum Test): tests whether two independent samples are likely to derive from the same population. When the distributions of the two samples have the same shape, you can also interpret the test as comparing the medians of the two samples.

What you need to test/know in order to use Mann Whitney U test:

Dependent variable is continuous or ordinal
Independent variable is categorical with two categories (defines the two samples you would like to compare)
Samples are independent
To interpret the results of this test, you need to know whether the distributions of the two samples have the same shape. If the two distributions have different shapes, you can only use this test to compare mean ranks
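The U statistic comes from ranking all observations together and summing the ranks within each sample. Here is a minimal standard-library Python sketch of that step; `mann_whitney_u` and the sample values are hypothetical names and data, and the p-value is not computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples."""
    combined = list(sample_a) + list(sample_b)
    ordered = sorted(combined)
    # Joint ranks; tied values share the average rank.
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in combined]
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])  # ranks belonging to sample_a
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical measurements from two independent samples.
sample_a = [1.1, 2.3, 3.8, 5.0]
sample_b = [2.9, 4.4, 6.1, 7.2]

u_a, u_b = mann_whitney_u(sample_a, sample_b)
print(f"U_a = {u_a}, U_b = {u_b}")
```

Here `u_a` counts the pairs in which a value from `sample_a` exceeds a value from `sample_b`, which is why the test speaks to ranks rather than means.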

One Way ANOVA: compares the means of two or more independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. One-Way ANOVA is a parametric test.

What you need to test/know in order to use One-Way ANOVA:

Normality – that each sample is taken from a normally distributed population
Sample independence – that each sample has been drawn independently of the other samples
Variance equality – that the variance of data in the different groups should be the same
Measures are continuous, at least in theory
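One-Way ANOVA compares the variation between group means with the variation within groups. A minimal standard-library Python sketch of the F statistic follows; the function name `one_way_anova_f` and the three groups are made up for illustration, and the p-value is left to a statistics package.

```python
import statistics


def one_way_anova_f(*groups):
    """Return the F statistic with its between and within degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k


# Hypothetical measurements from three independent groups.
group_1 = [4, 5, 6]
group_2 = [6, 7, 8]
group_3 = [8, 9, 10]

f_stat, df_between, df_within = one_way_anova_f(group_1, group_2, group_3)
print(f"F = {f_stat:.2f}, df = ({df_between}, {df_within})")
```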

Kruskal Wallis Test (one-way analysis of variance on ranks): a non-parametric method that tests whether the samples originate from the same distribution. This is an extension of the Mann Whitney U test to the case where the independent variable has more than 2 categories (comparing more than 2 samples). A significant Kruskal Wallis test indicates that at least one sample stochastically dominates another sample. Note: one sample stochastically dominates another when its values tend to be larger than those of the other. We mostly use the test to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable.

What you need to test/know in order to use the Kruskal Wallis Test:

Dependent variable is continuous or ordinal
Independent variable is categorical and can include 3 or more categories
Observations must be independent
To interpret the results of this test, you need to know whether the distributions of the groups have the same shape. As with the Mann Whitney U test, if the distributions have different shapes you can only use this test to compare mean ranks
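The Kruskal Wallis H statistic is computed from the rank sums of the groups after ranking all observations together. The standard-library Python sketch below shows the basic formula; the name `kruskal_wallis_h` and the data are hypothetical, and the correction for ties that statistical packages usually apply is deliberately left out.

```python
def kruskal_wallis_h(*groups):
    """Return the Kruskal Wallis H statistic (without tie correction)."""
    combined = [x for g in groups for x in g]
    ordered = sorted(combined)
    # Joint rank of every distinct value; ties share the average rank.
    rank_of = {
        v: ordered.index(v) + (ordered.count(v) + 1) / 2 for v in set(combined)
    }
    n_total = len(combined)
    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical measurements from three independent groups.
group_1 = [1.2, 2.5, 3.1]
group_2 = [3.9, 4.4, 5.6]
group_3 = [6.3, 7.0, 8.8]

h_stat = kruskal_wallis_h(group_1, group_2, group_3)
print(f"H = {h_stat:.2f}")
```

For reasonably large groups, H is compared with a chi-square distribution with (number of groups - 1) degrees of freedom.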

Pearson correlation: is the most common parametric way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

What you need to test/know in order to use Pearson correlation

Both variables are quantitative
The variables are normally distributed
The data have no outliers
The relationship between the two variables is linear, that is, it can be described reasonably well by a straight line
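For readers who like to see the calculation, here is a minimal standard-library Python sketch of the Pearson correlation coefficient. The function name `pearson_r` and the data are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation of x and y around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations of two quantitative variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```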

Spearman correlation: a non-parametric alternative to Pearson correlation.

What you need to test/know in order to use Spearman correlation:

The variables are at least ordinal
The variables do not need to be normally distributed
The data may include outliers
The relationship between the variables is monotonic, but not necessarily linear
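Spearman correlation can be viewed as Pearson correlation applied to the ranks of the data. The standard-library Python sketch below illustrates this on a monotonic but non-linear relationship; the function names and the data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows as the cube of x (monotonic, not linear).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)
pearson = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {pearson:.3f}")
```

Because the ranks of `x` and `y` agree perfectly, the Spearman correlation reaches 1 even though the Pearson correlation stays below 1.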

I hope this helps. You can listen to my previous Lunch & Learn lectures online here: WHRI Lunch & Learn Series – Women’s Health Research Institute.

Go to our Stats corner in eBlast for previously published tips on data management and analysis: eBlast Archive – Women’s Health Research Institute (whri.org).