Which Statistical Test Should I Use? An Overview

Posted: 30 Jan 2022
Author: Veronica White

Updated: 25 Feb 2022

Collaborative work with organizations often involves analyzing whatever data is available. These analyses can often lead to statistical testing to help understand and communicate what the data tells you about the organization’s processes or programs. With so many statistical tests and methods out there, it’s essential to choose the right one(s) for your study.

Before you start, consider what role frequentist statistics (i.e., statistical methods that rely on p-values) should play in your study. The American Statistical Association has published two statements that highlight the concerns, considerations, and alternative or supplementary approaches for scientific studies that use frequentist statistics (see [1], [2]).

Suppose you have decided to use frequentist statistics and, more specifically, hypothesis testing for your study. The next natural question is: which test do you use? To choose a test, you must describe both the purpose of the test and the data you are analyzing, as outlined below:

  1. Defining the test’s purpose (i.e., what is your null hypothesis?).
    • How many groups?: Can your data be grouped into 1, 2, or 3 or more distinct groups?
      • A single group involves comparing your entire dataset to some external standard. For example, suppose you wish to compare the average length of stay for knee surgery at a specific hospital to the clinically expected mean length of stay. This example would involve comparing the mean of your sample data to a known population mean.
      • Two groups could involve a control group and an intervention group. Alternatively, the two groups could be defined by two specific categories your data falls into (e.g., people who drink coffee every day and those who don’t).
      • Three or more groups could involve a control group and multiple intervention groups. Alternatively, the groups could be three or more specific categories your data falls into (e.g., Broadway musicals, Off-Broadway musicals, Off-Off-Broadway musicals).
    • Are the groups independent or paired?: If you have two or more groups, you will need to determine whether your data is paired/matched or independent. With independent data, each observation in your dataset does not depend on any other observation. With paired or matched data, each data point in one group is considered directly comparable to a data point in each of the other groups. An example of paired data is measuring the heights of the same individuals before and after a six-month period. An example of matched data is comparing cities in California with cities in Texas, paired up based on similar population sizes.
  2. Describing your data
    • Continuous, discrete, ordinal, vs. categorical data: The outcome variable that you wish to compare might be continuous (e.g., weight in lbs/oz), discrete (e.g., number of children), ordinal (e.g., Likert scale), or categorical (e.g., shoe brands) (see [3] for a description of data types).
    • Parametric vs. non-parametric: The parametric tests in this overview assume that the outcome variable follows a normal distribution; non-parametric tests do not make that assumption. In other words, are you assuming your data follows a normal distribution? If yes, it is best to support this assumption with a visual inspection followed by a Shapiro-Wilk test (see [4] for more information). If your data fails normality testing, you can use a non-parametric test.
      • Can I always use a non-parametric test and skip checking for normality? Short answer: you can, but when the data really is normally distributed, non-parametric tests generally have less statistical power and so tend to give higher p-values than the corresponding parametric test. Long answer: see [5].
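The normality check described above can be sketched in Python with SciPy. This is an assumption on our part, since the post does not prescribe a language. The data is simulated, and the helper name `compare_two_groups` and the 0.05 cutoff are illustrative choices rather than rules from the cited sources. A Shapiro-Wilk p-value above the cutoff does not prove normality, so it should still be paired with a visual check such as a histogram or Q-Q plot.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick a test for two independent groups based on Shapiro-Wilk.

    Hypothetical helper for illustration: uses the independent samples
    t-test when neither group rejects normality at `alpha`, otherwise
    falls back to the Mann-Whitney U test.
    """
    normal_a = stats.shapiro(a).pvalue > alpha
    normal_b = stats.shapiro(b).pvalue > alpha
    if normal_a and normal_b:
        result = stats.ttest_ind(a, b)
        return "Independent samples t-test", result.pvalue
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


rng = np.random.default_rng(seed=0)  # simulated data, not from a real study
control = rng.normal(loc=5.0, scale=1.0, size=100)
skewed = rng.exponential(scale=5.0, size=100)  # strongly right-skewed

test_name, p_value = compare_two_groups(control, skewed)
print(test_name, round(p_value, 4))
```

Because the second sample is strongly skewed, the helper is expected to fall back to the non-parametric test here. In a real analysis the choice of test is better fixed before looking at the outcome data.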

Now that both the test’s purpose and data are well defined, you are ready to choose a test. Table 1 summarizes when to use various hypothesis tests. Additional information on choosing a statistical test and on the individual tests can be found in [6], [7].

Each test can be implemented in statistical software such as SAS, R, SPSS, and Stata. See [10] for examples of implementing the various tests in your software of choice. Are we all done? Not quite: re-read [2] and read [11] for guidance on interpreting p-values and best practices for reporting them.
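As a rough illustration, several of the tests in Table 1 are also available in Python through `scipy.stats`. Python is our assumption here; the post and [10] cover SAS, Stata, SPSS, and R. All measurements below are made up for the example.

```python
import numpy as np
from scipy import stats

# Made-up measurements for illustration only.
before = np.array([72.1, 68.4, 75.0, 70.2, 69.8, 74.3, 71.5, 73.9])
after = np.array([70.3, 67.9, 73.1, 69.0, 69.5, 72.0, 70.8, 71.2])

# Two paired groups: paired t-test (parametric) and Wilcoxon signed-rank.
paired_t = stats.ttest_rel(before, after)
wilcoxon = stats.wilcoxon(before, after)

# Three independent groups: one-way ANOVA (parametric) and Kruskal-Wallis.
group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3])
group_b = np.array([5.9, 6.2, 5.7, 6.1, 6.4, 5.8])
group_c = np.array([7.0, 6.8, 7.3, 6.9, 7.1, 7.4])
anova = stats.f_oneway(group_a, group_b, group_c)
kruskal = stats.kruskal(group_a, group_b, group_c)

# Two independent groups with a categorical outcome, as a 2x2 table:
# rows are groups, columns are outcome counts (e.g., yes / no).
table = np.array([[12, 8], [5, 15]])
chi2_stat, chi2_p, dof, expected = stats.chi2_contingency(table)
odds_ratio, fisher_p = stats.fisher_exact(table)

results = [
    ("Paired t-test", paired_t.pvalue),
    ("Wilcoxon signed-rank", wilcoxon.pvalue),
    ("One-way ANOVA", anova.pvalue),
    ("Kruskal-Wallis", kruskal.pvalue),
    ("Chi-squared", chi2_p),
    ("Fisher's exact", fisher_p),
]
for name, p in results:
    print(f"{name}: p = {p:.4f}")
```

Each call returns a test statistic alongside the p-value. Reporting both, together with an effect size where possible, is more informative than the p-value alone.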

Table 1: Summary of hypothesis tests by purpose of test and characteristics of the outcome variable (adapted from [6]).

| Purpose of test | Continuous and normal data | Continuous, non-normal data OR non-continuous, discrete or ordinal data | Non-continuous, categorical data |
|---|---|---|---|
| Compare 1 mean with a population value | One-sample z-test / t-test (a) | One-sample median test | Exact binomial test |
| Compare 2 independent groups | Independent samples z-test / t-test (a) | Mann-Whitney U / Wilcoxon rank-sum test (b) | Chi-squared test or Fisher’s exact test (c) |
| Compare 2 paired groups | Paired t-test | Wilcoxon signed-rank test / sign test | McNemar’s test |
| Compare 3 or more independent groups | One-way analysis of variance | Kruskal-Wallis test | Chi-squared test |
| Compare 3 or more paired groups | Repeated measures analysis of variance | Friedman test | Cochran’s Q test |

Footnotes:
a: Use a t-test when the sample size is small (e.g., n < 30) or the population standard deviation is unknown.
b: The Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test; see [8].
c: See [9] for a discussion of when to use a chi-squared test vs. Fisher’s exact test.
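The lookup that Table 1 describes can also be written as a small Python helper. This is only a sketch: the function name `suggest_test`, the dictionary `TABLE_1`, and the outcome labels (`"normal"`, `"non_normal_or_ordinal"`, `"categorical"`) are hypothetical names introduced for illustration, and the entries simply restate the table.

```python
# Table 1 as a nested lookup: purpose of test -> outcome type -> suggested test.
# The key names are illustrative labels, not terminology from [6].
TABLE_1 = {
    "1 group": {
        "normal": "One-sample z-test / t-test",
        "non_normal_or_ordinal": "One-sample median test",
        "categorical": "Exact binomial test",
    },
    "2 independent groups": {
        "normal": "Independent samples z-test / t-test",
        "non_normal_or_ordinal": "Mann-Whitney U / Wilcoxon rank-sum test",
        "categorical": "Chi-squared test or Fisher's exact test",
    },
    "2 paired groups": {
        "normal": "Paired t-test",
        "non_normal_or_ordinal": "Wilcoxon signed-rank test / sign test",
        "categorical": "McNemar's test",
    },
    "3+ independent groups": {
        "normal": "One-way analysis of variance",
        "non_normal_or_ordinal": "Kruskal-Wallis test",
        "categorical": "Chi-squared test",
    },
    "3+ paired groups": {
        "normal": "Repeated measures analysis of variance",
        "non_normal_or_ordinal": "Friedman test",
        "categorical": "Cochran's Q test",
    },
}


def suggest_test(n_groups: int, paired: bool, outcome: str) -> str:
    """Return the Table 1 entry for a study design (hypothetical helper)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if outcome not in ("normal", "non_normal_or_ordinal", "categorical"):
        raise ValueError(f"unknown outcome type: {outcome!r}")
    if n_groups == 1:
        purpose = "1 group"  # pairing is not meaningful for a single group
    else:
        size = "2" if n_groups == 2 else "3+"
        design = "paired" if paired else "independent"
        purpose = f"{size} {design} groups"
    return TABLE_1[purpose][outcome]


print(suggest_test(2, paired=False, outcome="non_normal_or_ordinal"))
print(suggest_test(4, paired=True, outcome="categorical"))
```

A helper like this only narrows the options. The footnotes and the assumptions of each test still need to be checked against the actual data.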

References

[1] R. L. Wasserstein and N. A. Lazar, “The ASA Statement on p-Values: Context, Process, and Purpose,” The American Statistician, vol. 70, no. 2, pp. 129–133, Apr. 2016, doi: 10.1080/00031305.2016.1154108.
[2] R. L. Wasserstein, A. L. Schirm, and N. A. Lazar, “Moving to a World Beyond ‘p < 0.05,’” The American Statistician, vol. 73, no. sup1, pp. 1–19, Mar. 2019, doi: 10.1080/00031305.2019.1583913.
[3] “Types of Statistical Data: Numerical, Categorical, and Ordinal,” dummies. https://www.dummies.com/article/academics-the-arts/math/statistics/types-of-statistical-data-numerical-categorical-and-ordinal-169735 (accessed Feb. 07, 2022).
[4] “Which statistical test should you use? | XLSTAT Support Center.” https://help.xlstat.com/s/article/which-statistical-test-should-you-use?language=en_US (accessed Feb. 07, 2022).
[5] “Parametric and Non-parametric tests for comparing two or more groups | Health Knowledge.” https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests (accessed Feb. 07, 2022).
[6] A. Wills, “Research Methods and Statistics,” online course. http://www.bristol.ac.uk/medical-school/media/rms/red/which_test.html (accessed Feb. 02, 2022).
[7] A. Ghasemi and S. Zahediasl, “Normality Tests for Statistical Analysis: A Guide for Non-Statisticians,” International Journal of Endocrinology and Metabolism, vol. 10, no. 2, p. 486, 2012, doi: 10.5812/ijem.3505.
[8] “Mann–Whitney U test,” Wikipedia. Jan. 31, 2022. Accessed: Feb. 07, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Mann%E2%80%93Whitney_U_test&oldid=1069150075
[9] “Testing Independence: Chi-Squared vs Fisher’s Exact Test,” Oct. 17, 2018. https://www.datascienceblog.net/post/statistical_test/contingency_table_tests/ (accessed Feb. 07, 2022).
[10] “Choosing the Correct Statistical Test in SAS, Stata, SPSS and R.” https://stats.oarc.ucla.edu/other/mult-pkg/whatstat/ (accessed Feb. 25, 2022).
[11] J. Storopoli, “Bayesian Statistics with Julia and Turing,” section “p-values,” 2021. https://storopoli.io/Bayesian-Julia/pages/2_bayes_stats/#p-values