Statistics uses tools from probability and data analysis to
draw inferences about populations from samples
quantify uncertainty (or confidence) about an inference
We’ll illustrate inferential principles in the setting of
estimating average characteristics of a well-defined population
drawing conclusions about that characteristic
Perspectives on statistical inference
Some pithy philosophy on statistics
Statisticians are “engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?” –Senn (2022)
“Statistics is the art of making numerical conjectures about puzzling questions.” –Freedman, Pisani, and Purves (2007)
“Statistics is an area of science concerned with the extraction of information from numerical data and its use in making inferences about a population from which the data are obtained.” –Mendenhall, Beaver, and Beaver (2012)
“The objective of statistics is to make inferences (predictions, decisions) about a population based on information contained in a sample.” –Mendenhall, Beaver, and Beaver (2012)
Example: The YRBSS survey data
Consider the Youth Risk Behavior Surveillance System (YRBSS), a survey conducted by the CDC to measure health-related activity in high school-aged youth. The YRBSS data contain
2.6 million high school students, who participated between 1991 and 2013 across more than 1,100 separate surveys.
The dataset yrbss in the oibiostat R package contains the responses from the 13,583 participants from the year 2013.
The CDC used 13,572 students’ responses to estimate health behaviors in a target population: the 21.2 million high school-aged students in the US in 2013.
Of populations and parameters
The mean weight among the 21.2 million students is an example of a population parameter, i.e., $\mu$.
The mean within a sample (e.g., $\bar{x}$, as with the 13,572 students in YRBSS) is a point estimate of a population parameter.
Estimating the population mean weight from the sample of 13,572 participants is an example of statistical inference.
Why inference? It is too tedious to gather this information for all 21.2 million students—also, it is unnecessary.
Of populations and parameters
In nearly all studies, there is one target population and one sample.
Suppose a different random sample (of the same size) were taken from the same population: different participants, different $\bar{x}$.
Sampling variability describes the degree to which a point estimate varies from sample to sample (assuming fixed sampling scheme).
Properties of sampling variability (randomness) allow us to account for its effect on estimates based on a sample.
Sampling from a population
The exact values of population parameters are unknown.
To what degree does sampling variability affect $\bar{x}$ as an estimate of $\mu$?
As an example, let the YRBSS data (of 13,572 individuals) be the target population, with mean weight $\mu$.
Sample from this population (e.g., draw $n$ individuals at random) and calculate $\bar{x}$, the mean weight among the sampled individuals.
How well does $\bar{x}$ estimate $\mu$?
Take many samples to construct the sampling distribution of $\bar{x}$.
Taking samples from the YRBSS data
The estimator $\bar{X}$ is a random variable; its randomness comes from sampling.
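As a brief illustration (a sketch, assuming the yrbss data from the oibiostat package are loaded and contain a weight column), repeated sampling makes this randomness concrete:
# a sketch: treat the YRBSS respondents as the population and repeatedly
# draw samples to approximate the sampling distribution of the sample mean
library(oibiostat)
data("yrbss")
pop_weight <- na.omit(yrbss$weight)      # population of weights (complete cases)
mu <- mean(pop_weight)                   # the "true" population mean weight
set.seed(2013)
xbars <- replicate(1000, mean(sample(pop_weight, size = 100)))
hist(xbars, main = "Approximate sampling distribution of the sample mean")
abline(v = mu, lwd = 2)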
The sample mean as a random variable
The statistic $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a random variable for which $\mathbb{E}[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2 / n$.
The sampling distribution of $\bar{X}$ is centered around $\mu$ because $\bar{X} \to \mu$ as $n \to \infty$ (consistency, strong law of large numbers).
The variability of $\bar{X}$ becomes smaller with larger sample size, $n$.
Any sample statistic is a random variable since each sample drawn from the population ought to be different.
When the data have not yet been observed, the statistic, like the corresponding RV, is a function of the same random elements.
The standard error of $\bar{X}$
If $\bar{X}$ could be observed through repeated sampling, its standard deviation would be $\sigma / \sqrt{n}$ (n.b., $\sigma$ is the population standard deviation).
The variability of a sample mean decreases as sample size increases: $\sigma / \sqrt{n}$ characterizes that behavior more precisely.
Typically, $\sigma$ is unknown and estimated by the sample standard deviation $s$.
The term $s / \sqrt{n}$ is called the standard error of $\bar{X}$.
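As a minimal sketch with an illustrative sample (the values below are made up, not from the slides’ data), the standard error is the sample standard deviation divided by the square root of the sample size:
# standard error of the mean: s / sqrt(n)
x <- c(68, 52, 75, 61, 58, 70, 66, 49, 80, 63)   # hypothetical weights (kg)
n <- length(x)
s <- sd(x)
s / sqrt(n)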
Leveraging variability: Confidence intervals
A confidence interval gives a plausible range of values for the population parameter, coupling an estimate and a margin of error: $\bar{x} \pm m$.
Confidence intervals: Definition and construction
A confidence interval with coverage rate $1 - \alpha$ for a population mean $\mu$ is any random interval $(L, U)$ such that $P(L \leq \mu \leq U) = 1 - \alpha$. A common form is $\bar{x} \pm m$, where the margin of error, $m$, draws on the sampling variability of $\bar{X}$.
Since $\bar{X}$ is approximately $N(\mu, \sigma^2 / n)$ (by the central limit theorem), the margin of error may be based on the properties (e.g., quantiles) of the normal distribution.
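A sketch of a normal-approximation interval, using the same hypothetical sample as above (redefined here so the snippet stands alone):
# 95% CI via the normal approximation: x_bar +/- z* times the standard error
x <- c(68, 52, 75, 61, 58, 70, 66, 49, 80, 63)   # hypothetical weights (kg)
x_bar <- mean(x)
se <- sd(x) / sqrt(length(x))
z_star <- qnorm(0.975)                            # 0.975 quantile of N(0, 1)
c(lower = x_bar - z_star * se, upper = x_bar + z_star * se)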
Confidence intervals: The fine print…
The confidence level may also be called the confidence coefficient. Confidence intervals have a nuanced interpretation:
The method for computing a 95% confidence interval produces a random interval that—on average—contains the true (target) population parameter 95 times out of 100.
5 out of 100 will be incorrect, but, of course, a data analyst (or reader of your paper!) cannot know whether a particular interval contains the true population parameter.
The data used to calculate the confidence interval are from a random sample taken from a well-defined target population.
Confidence intervals: Proof by picture
Asymptotic $(1 - \alpha) \times 100\%$ CI
A random interval $(L, U)$ is an asymptotic $(1 - \alpha) \times 100\%$ CI for $\mu$ if $P(L \leq \mu \leq U) \to 1 - \alpha$ as $n \to \infty$.
The $t$ Distribution
The $t$ distribution is symmetric, bell-shaped, and centered at zero; it is like a standard normal distribution $N(0, 1)$, almost…
It has an additional parameter: degrees of freedom ($df$ or $\nu$).
For inference about a single mean, the degrees of freedom ($df$) equal $n - 1$.
The $t$ distribution’s tails are thicker than those of a normal.
When $df$ is “large”, the $t$ and $N(0, 1)$ distributions are virtually identical (technically, they coincide only as $df \to \infty$).
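A quick sketch comparing 0.975 quantiles shows the $t$ distribution’s heavier tails shrinking toward the normal as $df$ grows:
# t quantiles approach the standard normal quantile as df increases
df_values <- c(2, 5, 10, 30, 100, 1000)
rbind(
  t_quantile = qt(0.975, df = df_values),
  z_quantile = rep(qnorm(0.975), length(df_values))
)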
The $t$ Distribution…
A $(1 - \alpha) \times 100\%$ confidence interval
A $(1 - \alpha) \times 100\%$ confidence interval (CI) for a population mean $\mu$ based on a single sample with mean $\bar{x}$ is $\bar{x} \pm t^{*}_{df} \times \frac{s}{\sqrt{n}}$,
where $t^{*}_{df}$ is the quantile of a $t$ distribution (with $df = n - 1$) for which there is area $1 - \alpha/2$ to its left.
For a 95% CI, find $t^{*}_{df}$ with 0.975 area to its left (or, equivalently, 0.025 area to its right).
Calculating the critical $t$-value, $t^{*}_{df}$
The R function qt(p, df) finds the quantile of a $t$ distribution with df degrees of freedom that has area p to its left.
For example, $t^{*}_{df}$ for a 95% confidence interval with $df = 9$ is 2.262:
qt(0.975, df = 9)
[1] 2.262157
Just let R do the work for you…95% CI from t.test():
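A minimal sketch of such a call, assuming samp_weight holds one sample of weights drawn from the yrbss data:
# t.test() reports a 95% confidence interval for the mean by default
samp_weight <- sample(na.omit(yrbss$weight), size = 100)   # illustrative sample
t.test(samp_weight, conf.level = 0.95)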
If a $(1 - \alpha) \times 100\%$ confidence interval for a population mean $\mu$ does not contain a hypothesized value $\mu_0$, then:
the observed data contradict the null hypothesis $H_0\colon \mu = \mu_0$ (at the corresponding significance level $\alpha$, e.g., $\alpha = 0.05$ for a 95% CI)
the implied two-sided alternative hypothesis is $H_A\colon \mu \neq \mu_0$
Null and alternative hypotheses
The null hypothesis ($H_0$) posits a distribution for the population reflecting no change from the past, e.g., $H_0\colon \mu = \mu_0$.
The alternative hypothesis ($H_A$) claims a “real” difference exists between the distribution of the observed data (a sample) and the distribution implied by the null hypothesis.
Since $H_A$ is an alternative claim, it is often represented by a range of possible parameter values, e.g., $H_A\colon \mu \neq \mu_0$.
A hypothesis test evaluates whether there is evidence against $H_0$ based on the observed data, using a test statistic.
Framing null and alternative hypotheses
For our BMI inquiry, there are a few possible choices for $H_0$ and $H_A$. To simplify and demonstrate, let’s use
$H_0\colon \mu = \mu_0$ versus $H_A\colon \mu > \mu_0$, with $\mu_0 = 21.7$, the midpoint of the healthy BMI range
The form of $H_A$ above is a one-sided alternative. One could also write a two-sided alternative, $H_A\colon \mu \neq \mu_0$.
The choice of one- or two-sided alternative is context-dependent and should be driven by the motivating scientific question.
What’s “real”? The significance level
The significance level $\alpha$ quantifies how rare or unlikely an event must be in order to represent sufficient evidence against $H_0$.
In other words, it is a bar for the degree of evidence necessary for a difference to be considered “real” (or significant).
In the context of decision errors, $\alpha$ is the probability of committing a Type I error (incorrectly rejecting $H_0$ when it is true).
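A small simulation sketch (with assumed normal data and $H_0$ true) shows that the long-run rate of Type I errors is roughly $\alpha$:
# simulate many datasets under H0 and record how often the test wrongly rejects
set.seed(1)
alpha <- 0.05
rejects <- replicate(5000, {
  x <- rnorm(30, mean = 21.7, sd = 5)     # H0 is true: mu = 21.7
  t.test(x, mu = 21.7)$p.value < alpha
})
mean(rejects)                             # should be close to alpha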
Choose (and calculate) a test statistic
The test statistic measures the discrepancy between the observed data and what would be expected if the null hypothesis were true.
When testing hypotheses about a mean, a valid test statistic is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, which, under $H_0$, follows a $t$ distribution with $df = n - 1$.
The devil is in the details
We will go on to talk about a few more practical versions of the $t$-test (e.g., the two-sample $t$-test and the one-sample $t$-test of paired differences).
In each of these cases, some assumptions are required…
measurements being compared (e.g., BMI) are randomly (iid) sampled from a normal distribution
same unknown mean, same unknown variance (plus normality)
Justifying your assumptions is the hardest part
Given the context of your scientific problem, are these assumptions true or reasonable?
Calculate a $p$-value…and what is it anyway?
What is the probability that we would observe a result as or more extreme than the observed sample value, if the null hypothesis is true? This probability is the $p$-value.
Calculate the $p$-value associated with the test statistic and then compare it to the pre-specified significance level $\alpha$.
A result is considered unusual (or statistically significant) if its associated $p$-value is less than $\alpha$.
Quantifying surprise – the $s$-value
Despite their popularity, $p$-values are notoriously hard to interpret. Rafi and Greenland (2020)’s $s$-values (“binary surprisal values”) are a cognitive tool for interpreting and understanding $p$-values.
The surprisal ($s$) value for interpreting $p$-values
The $s$-value is defined via the base-2 logarithm as $s = -\log_2(p)$, where $p$ is a $p$-value.
The $s$-value quantifies the degree of surprise associated with experiencing a similar result (as the given $p$-value) when evaluating if a coin is fair: a $p$-value of $p$ is roughly as surprising as observing $s$ heads in a row from a fair coin.
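A short sketch converting a few $p$-values to $s$-values:
# s = -log2(p): the surprise of a p-value, in fair-coin-toss terms
p <- c(0.50, 0.05, 0.005)
s <- -log2(p)
round(s, 1)   # about 1, 4.3, and 7.6 "heads in a row"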
The $p$-value for a two-sided alternative
For a two-sided alternative, $H_A\colon \mu \neq \mu_0$, the $p$-value of a $t$-test is the total area from both tails of the $t$ distribution that lies beyond the absolute value of the observed $t$ statistic: $p = 2 \times P(T_{df} \geq |t|)$.
The $p$-value for a one-sided alternative
For a one-sided alternative, the $p$-value is the area in the tail of the $t$ distribution that matches the direction of the alternative.
For $H_A\colon \mu > \mu_0$: $p = P(T_{df} \geq t)$
For $H_A\colon \mu < \mu_0$: $p = P(T_{df} \leq t)$
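In R, these tail areas come from pt(); a sketch with an illustrative observed statistic and degrees of freedom (values made up for demonstration):
t_obs <- 2.1    # hypothetical observed t statistic
df <- 9         # hypothetical degrees of freedom
pt(t_obs, df = df, lower.tail = FALSE)             # H_A: mu > mu_0
pt(t_obs, df = df, lower.tail = TRUE)              # H_A: mu < mu_0
2 * pt(abs(t_obs), df = df, lower.tail = FALSE)    # H_A: mu != mu_0 (two-sided)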
The $p$-value and drawing conclusions
The smaller the $p$-value, the stronger the evidence against $H_0$.
If the $p$-value is as small as or smaller than $\alpha$, we reject $H_0$; the result is statistically significant at level $\alpha$.
If the $p$-value is larger than $\alpha$, we fail to reject $H_0$; the result is not statistically significant at level $\alpha$. That is, the evidence we have available does not contradict $H_0$.
Always state conclusions in the context of the research problem.
Evaluating BMI in the NHANES adult sample
Question: Do Americans tend to be overweight?
# use R as a calculator...
x_bar <- mean(nhanes.samp.adult$BMI)
mu_0 <- 21.7
s <- sd(nhanes.samp.adult$BMI)
n <- length(nhanes.samp.adult$BMI)
(t <- (x_bar - mu_0) / (s / sqrt(n)))
[1] 11.38311
pt(t, df = n - 1, lower.tail = FALSE)
[1] 1.006759e-21
# just let R do the work...
t.test(nhanes.samp.adult$BMI, mu = 21.7, alternative = "greater")
One Sample t-test
data: nhanes.samp.adult$BMI
t = 11.383, df = 134, p-value < 2.2e-16
alternative hypothesis: true mean is greater than 21.7
95 percent confidence interval:
28.02288 Inf
sample estimates:
mean of x
29.09956
The Kolmogorov-Smirnov (KS) test
The KS test is a nonparametric test that evaluates the equality of two distributions (n.b., different from testing mean differences).
The empirical cumulative distribution function (eCDF) is $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(X_i \leq x)$.
The KS test uses as its test statistic $D_n = \sup_x \lvert \hat{F}_n(x) - F(x) \rvert$, where $F$ is a theoretical (i.e., assumed) CDF.
Using the KS test: Is BMI normally distributed?
Applying the KS test evaluates the evidence against $H_0$: BMI (in the NHANES population) arises from a normal distribution:
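A sketch of a call that could produce output like the one below, assuming bmi_zstd holds standardized BMI values from nhanes.samp.adult:
# standardize BMI, then compare its eCDF to the standard normal CDF
bmi <- nhanes.samp.adult$BMI
bmi_zstd <- (bmi - mean(bmi)) / sd(bmi)
ks.test(bmi_zstd, "pnorm")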
Asymptotic one-sample Kolmogorov-Smirnov test
data: bmi_zstd
D = 0.09895, p-value = 0.1422
alternative hypothesis: two-sided
References
Cole, Stephen R, Jessie K Edwards, and Sander Greenland. 2021. “Surprise!” American Journal of Epidemiology 190 (2): 191–93. https://doi.org/10.1093/aje/kwaa136.
Fisher, Ronald Aylmer. 1926. “The Arrangement of Field Experiments.” Journal of the Ministry of Agriculture of Great Britain 33: 503–13.
Freedman, David A, Robert Pisani, and Roger Purves. 2007. Statistics. W. W. Norton & Company.
Mendenhall, William, Robert J Beaver, and Barbara M Beaver. 2012. Introduction to Probability and Statistics. Cengage Learning.
Rafi, Zad, and Sander Greenland. 2020. “Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise.” BMC Medical Research Methodology 20: 244. https://doi.org/10.1186/s12874-020-01105-9.