As a data scientist, you would do well to understand some statistics. After all, it is one of the building blocks of the field.
This is the first article in a series which will attempt to give a concise, practical overview of different statistical tests and the situations in which they should be used. The information will strike a balance – not so long and technical as to be inaccessible, but not so short as to be useless.
In this first article, I’ll be talking about one of the more widely known statistical tests out there: the t-test. Even if you don’t quite know what it is, it’s likely you’ve heard the term thrown around. By the end of this article, you should understand how the test works and when you should use it.
If you’re generally unfamiliar with statistical test design, I highly recommend checking out the prefatory article to this series, A Primer on Foundational Concepts You Need to Start Running Statistical Tests.
Now then, let’s begin.
What is the t-test?
At its core, the t-test is used to compare two different samples of data. It takes two factors into account: the difference between the sample means and the variability of the sample data. Logically, this makes sense – simply comparing means is not sufficient to determine that your samples are actually different (means could very well differ with similar data sets, or be similar with very different data sets).
The t-test calculates a statistic known as the t-score, which takes both of the above factors into account. It can be calculated manually using a mildly annoying formula, but pretty much any statistical software will calculate it for you automatically. If you calculate the statistic manually, you can compare it to a critical t-value from preexisting tables (being greater than the t-value corresponds to the p-value being less than the desired level for your test, indicating a statistically significant difference between samples). Once again, statistical software will generally just tell you whether or not the calculated t-value is statistically significant.
The process for conducting this test varies from software to software, but most are straightforward to pick up in whatever tool you choose. The details of these calculations are omitted here, as the purpose of this article is to teach you what the t-test is and in what situations you should use it.
In line with that, let’s consider a hypothetical experiment in which you might use a t-test. Imagine you are a user researcher for a sports equipment company that is developing a pair of shoes for basketball players. The company’s engineers have two designs and want to know if there is a meaningful difference in a player’s jumping ability depending on the design.
To test this, you recruit a random sample of 40 basketball players, giving 20 of them one pair of shoes (Group A) and the other 20 the second pair of shoes (Group B). The assignments to the two groups are also randomized. You then measure the vertical jump of each player when wearing the shoes and record the data in two sets, separated by the type of shoe.
To determine if there is a statistically significant difference between the vertical measurements of the two groups, you could use a t-test.
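As a sketch of what this looks like in practice, here is the shoe experiment run as an independent-samples t-test in Python using SciPy. The jump measurements are simulated (the group means, spreads, and seed are all made-up values for illustration, not real data):

```python
import numpy as np
from scipy import stats

# Hypothetical data: vertical jump (in inches) for 20 players per shoe design.
# The means and spreads below are invented purely for illustration.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=28.0, scale=3.0, size=20)  # players wearing design A
group_b = rng.normal(loc=30.5, scale=3.0, size=20)  # players wearing design B

# Independent-samples t-test: two different groups of players
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Statistically significant difference between the designs")
else:
    print("No statistically significant difference detected")
```

Note that the software hands you the p-value directly, so there is no need to look up a critical t-value in a table.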
I’ve heard there are two types of t-tests – is this true?
Ah, so you’re one step ahead of the game. Technically, the test we just talked about is called an independent-samples t-test. Why? Because the two means being compared are from different samples. This kind of t-test is generally used when our experiment uses a between-subjects design. Since the participants are different for each condition, the samples are considered independent.
This same experiment could also be conducted using a within-subjects design. In this case, we would only collect 20 participants, and we would measure each participant’s vertical while wearing each shoe. One advantage of this is that it eliminates individual differences that could produce a false result (What if, despite the random sampling and assignment, the players in one group simply happened to be able to jump higher?).
In any case, this structure renders the independent-samples t-test inappropriate; instead, we would use what is called a paired-samples t-test. This name comes from the fact that because each participant appears in both data sets, they are in a sense paired.
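To make the contrast concrete, here is the within-subjects version of the experiment as a paired-samples t-test. Again, the numbers are simulated for illustration; note how each player's baseline jumping ability appears in both measurements, which is exactly what the pairing accounts for:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical within-subjects data: each of 20 players jumps in both shoes.
# Each player has an individual baseline ability that affects both measurements.
baseline = rng.normal(loc=28.0, scale=3.0, size=20)
shoe_a = baseline + rng.normal(0.0, 1.0, size=20)
shoe_b = baseline + 1.5 + rng.normal(0.0, 1.0, size=20)  # assume design B adds ~1.5"

# Paired-samples t-test: tests whether the mean within-player difference is zero
t_stat, p_value = stats.ttest_rel(shoe_a, shoe_b)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Because the test operates on the per-player differences, the large spread in baseline ability cancels out, which is what gives the paired design its extra sensitivity.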
When you are choosing a statistical test to run for an experiment, be sure to take this into account.
But, what about those pesky assumptions?
Indeed, both versions of the t-test do come with a set of assumptions that must be met for the test to be valid:
- The samples must be randomly selected.
- The sample data must be of interval or ratio type (i.e., the data should be quantitative).
- The populations from which the samples are drawn should be approximately normally distributed. This cannot be known for certain, but you can generally settle for an educated guess based on the sample distributions.
- The two samples should have similar standard deviations (the spread of the samples shouldn’t be too different).
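If you want more than an eyeball check, the last two assumptions can be probed with their own tests. One common (though not the only) approach is a Shapiro-Wilk test for normality and Levene's test for similar spread; the sketch below assumes SciPy and reuses simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=28.0, scale=3.0, size=20)  # simulated sample data
group_b = rng.normal(loc=30.5, scale=3.0, size=20)

# Shapiro-Wilk: the null hypothesis is that the sample came from a normal
# distribution, so a LARGE p-value means normality is plausible.
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Levene's test: the null hypothesis is that the groups have equal variances,
# so again a LARGE p-value is the reassuring outcome.
_, p_var = stats.levene(group_a, group_b)

print(f"Normality p-values: {p_norm_a:.3f}, {p_norm_b:.3f}")
print(f"Equal-variance p-value: {p_var:.3f}")
```

Keep in mind these checks have low power with small samples, so treat them as supporting evidence alongside a look at the sample distributions, not as a verdict.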
And so, what should you do if one or more requirements are not met? Luckily, this is precisely where nonparametric alternatives come in. Depending on which t-test you intended to use, you should now shift gears:
- The nonparametric version of the independent-samples t-test is known as the Mann-Whitney U-Test.
- The nonparametric version of the paired-samples t-test is known as the Wilcoxon Signed-Rank Test.
Although they are perhaps less known to the public, you should be able to find these tests in your statistical software of choice without too much trouble; the important thing is just that you know when to use them.
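Both nonparametric alternatives are indeed a one-line swap in most software. As a sketch, assuming SciPy and the same kind of simulated jump data as before:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(loc=28.0, scale=3.0, size=20)  # simulated sample data
group_b = rng.normal(loc=30.5, scale=3.0, size=20)

# Mann-Whitney U test: nonparametric alternative to the
# independent-samples t-test (two separate groups)
u_stat, p_indep = stats.mannwhitneyu(group_a, group_b)

# Wilcoxon signed-rank test: nonparametric alternative to the
# paired-samples t-test (here we pretend the two arrays are
# paired measurements of the same 20 players)
w_stat, p_paired = stats.wilcoxon(group_a, group_b)

print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p_indep:.3f}")
print(f"Wilcoxon:     W = {w_stat:.1f}, p = {p_paired:.3f}")
```

Because these tests work on ranks rather than raw values, they sidestep the normality assumption, at the cost of somewhat less power when the data really are normal.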
Quick Recap and Final Thoughts
Let’s quickly run through some questions that can help you identify when you should use a t-test.
Does your experiment have one factor/independent variable (the t-test is not suitable for experiments with multiple factors)?
Are there two treatments/conditions that are being tested for the factor?
Assuming an affirmative answer to the above questions, does your experiment use a between-subjects or a within-subjects design? And does your experiment meet the conditions for using the parametric t-test?
- Between-subjects and parametric: independent-samples t-test
- Between-subjects and nonparametric: Mann-Whitney U-Test
- Within-subjects and parametric: paired-samples t-test
- Within-subjects and nonparametric: Wilcoxon signed-rank test
And there you have it! Follow this workflow in your next experiment, and you’ll be well on your way to mastering t-tests.
Happy testing!