Biostatistics in Clinical Research

Expert-defined terms from the Advanced Certificate in Clinical Research course at London School of Planning and Management. Free to read, free to share, paired with a professional course.


Absolute Risk Reduction (ARR)

Definition

The difference in event rates between a control group and an experimental group (control event rate minus experimental event rate), expressed as a proportion.

Example

If 10 % of patients in the control arm experience a heart attack versus 6 % in the treatment arm, ARR = 0.10 − 0.06 = 0.04 (4 %).

Practical application

Used to convey the clinical impact of an intervention and to calculate the number needed to treat (NNT = 1/ARR; 1/0.04 = 25 in the example above).

Challenges

Requires accurate event rates; small sample sizes can produce unstable ARR estimates and wide confidence intervals.
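The ARR and NNT arithmetic can be sketched in Python. This is a minimal illustration, assuming hypothetical counts of 100/1,000 control events and 60/1,000 treatment events (matching the 10 % and 6 % rates above) and a simple Wald confidence interval; the function names are illustrative and not from any particular package.

```python
import math
from statistics import NormalDist


def arr_with_ci(events_ctrl, n_ctrl, events_trt, n_trt, level=0.95):
    """Absolute risk reduction (control risk minus treatment risk) with a Wald CI."""
    p_ctrl = events_ctrl / n_ctrl
    p_trt = events_trt / n_trt
    arr = p_ctrl - p_trt
    # Standard error of a difference between two independent proportions
    se = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_trt * (1 - p_trt) / n_trt)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95 % interval
    return arr, (arr - z * se, arr + z * se)


def nnt(arr):
    """Number needed to treat, rounded up to a whole patient."""
    if arr <= 0:
        raise ValueError("NNT is only meaningful here for a positive ARR")
    # The tiny tolerance stops floating-point noise from bumping 25.000...1 up to 26
    return math.ceil(1 / arr - 1e-9)


arr, ci = arr_with_ci(100, 1000, 60, 1000)
print(f"ARR = {arr:.3f}, 95 % CI {ci[0]:.3f} to {ci[1]:.3f}, NNT = {nnt(arr)}")
```

Rounding the NNT up to the next whole patient is a common reporting convention.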

Adjusted Hazard Ratio (aHR)

Definition

A hazard ratio derived from a Cox proportional hazards model that includes covariates to control for confounding variables.

Example

In a cancer trial, the aHR for death after adjusting for age, stage, and performance status might be 0.75, indicating a 25 % reduction in hazard after adjustment.

Practical application

Estimates the treatment effect while accounting for differences in baseline characteristics between groups; in randomized trials, adjustment for pre‑specified prognostic covariates can also increase statistical power.

Challenges

Model misspecification, violation of the proportional hazards assumption, and multicollinearity among covariates can bias the aHR.

Analysis of Covariance (ANCOVA)

Definition

A statistical technique that combines analysis of variance with linear regression to compare group means while adjusting for continuous covariates.

Example

Comparing post‑treatment blood pressure between two drug groups while adjusting for baseline blood pressure.

Practical application

Increases statistical power by reducing residual variance and controlling for baseline imbalances.

Challenges

Requires a linear relationship between covariate and outcome, homogeneity of regression slopes across groups, and careful selection of covariates to avoid over‑adjustment.

Attrition Bias

Definition

Systematic differences between participants who complete a study and those who withdraw, potentially distorting results.

Example

If sicker patients are more likely to drop out, the remaining sample may appear healthier than the true population.

Practical application

Addressed at the design stage by planning robust follow‑up procedures and analysis strategies such as intention‑to‑treat analysis.

Challenges

Quantifying the bias is difficult; high attrition rates (>20 %) often necessitate sensitivity analyses.

Baseline Characteristics

Definition

Demographic and clinical variables measured before randomization, used to assess group comparability.

Example

Age, sex, disease severity, and prior therapies recorded at enrollment.

Practical application

Inform stratified randomization schemes and serve as adjustment variables in multivariable models.

Challenges

Imbalance may occur by chance; over‑adjustment can reduce precision.

Bayesian Inference

Definition

A statistical paradigm that updates prior beliefs with observed data to obtain a posterior distribution for parameters of interest.

Example

Using a prior distribution for treatment effect based on earlier phase II data and combining it with phase III results to produce a posterior estimate.

Practical application

Facilitates adaptive trial designs, interim monitoring, and decision‑making under uncertainty.

Challenges

Choice of prior can be subjective; computationally intensive for complex models.
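A minimal sketch of Bayesian updating, using the conjugate Beta‑binomial model for a response rate as a simpler stand‑in for a full treatment‑effect model. The numbers are hypothetical: a Beta(13, 29) prior (a flat Beta(1, 1) updated with 12 responders out of 40 in an earlier phase II study) combined with 45 responders out of 120 in phase III. The function names are illustrative.

```python
import random


def update_beta(prior_a, prior_b, responders, n):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + responders, prior_b + (n - responders)


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def prob_rate_exceeds(a, b, threshold, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate > threshold) under a Beta(a, b) distribution."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws


post_a, post_b = update_beta(13, 29, responders=45, n=120)
print(post_a, post_b)  # posterior is Beta(58, 104)
print(round(posterior_mean(post_a, post_b), 3))
print(prob_rate_exceeds(post_a, post_b, threshold=0.30))
```

Conjugate priors keep the update in closed form; non‑conjugate models generally require simulation methods such as MCMC, which is where most of the computational burden arises.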

Binomial Distribution

Definition

Probability distribution describing the number of successes in a fixed number of independent yes/no trials with constant success probability.

Example

Number of patients achieving tumor response out of 50 treated individuals.

Practical application

Basis for confidence intervals for proportions and for exact binomial tests; Fisher’s exact test relies on the closely related hypergeometric distribution.

Challenges

Assumes independent trials; violations occur with clustered or longitudinal data.
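The binomial probability mass function and a confidence interval for a response proportion can be sketched with the standard library alone. The 18 responders out of 50 are hypothetical, and the Wilson score interval shown here is one of several reasonable choices (the exact Clopper‑Pearson interval is another); the function names are illustrative.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Hypothetical: 18 tumor responses among 50 treated patients
print(binomial_pmf(18, 50, 0.36))
print(wilson_interval(18, 50))
```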

Censoring

Definition

Incomplete observation of an event time. In right censoring, the most common type in clinical trials, it is known only that the event had not occurred by the participant’s last follow‑up.

Example

A patient who is still alive at study end is right‑censored at that time.

Practical application

Handled using the Kaplan‑Meier estimator and Cox models, which retain the follow‑up time contributed by censored participants.

Challenges

Informative censoring, where the censoring mechanism is related to the outcome, can bias estimates.

Confidence Interval (CI)

Definition

A range of values computed from sample data by a procedure that, over repeated sampling, would capture the true population parameter at a specified rate (the confidence level, typically 95 %).

Example

A 95 % CI for a mean difference of 2.5 mg/dL might be (1.0, 4.0).

Practical application

Conveys the precision of an estimate; a 95 % CI that excludes the null value (0 for differences, 1 for ratios) corresponds to statistical significance at the two‑sided 5 % level.

Challenges

Often misinterpreted as a 95 % probability that the true value lies within one particular computed interval; the width depends on sample size and variance.
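A large‑sample sketch of a 95 % CI for a difference in means, assuming approximately normal sampling distributions and using z rather than t quantiles (for small samples a t‑based interval is more appropriate). The summary statistics are hypothetical: means of 12.5 and 10.0 mg/dL, an SD of 5.4 mg/dL in each arm, and 100 participants per arm. The function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_diff_ci(mean_1, sd_1, n_1, mean_2, sd_2, n_2, level=0.95):
    """Large-sample (z-based) CI for the difference mean_1 - mean_2."""
    diff = mean_1 - mean_2
    se = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)  # SE of a difference of independent means
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, (diff - z * se, diff + z * se)


diff, (lower, upper) = mean_diff_ci(12.5, 5.4, 100, 10.0, 5.4, 100)
print(f"difference {diff:.1f} mg/dL, 95 % CI ({lower:.1f}, {upper:.1f})")
```

With these inputs the interval is close to (1.0, 4.0); because it excludes 0, the difference would be significant at the two‑sided 5 % level.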

Cox Proportional Hazards Model

Definition

A semiparametric regression model that estimates the effect of covariates on the hazard function without specifying the baseline hazard.

Example

Modeling time to disease progression while adjusting for treatment, age, and biomarker status.

Practical application

Generates adjusted hazard ratios for multiple predictors in time‑to‑event analyses.

Challenges

Requires the proportional hazards assumption; violations may be addressed with stratified models or time‑dependent covariates.

Cross‑Over Design

Definition

A clinical trial in which each participant receives multiple interventions sequentially, serving as their own control.

Example

Patients receive Drug A for eight weeks, undergo a two‑week washout, then receive Drug B for eight weeks.

Practical application

Increases efficiency by removing between‑participant variability from treatment comparisons; best suited to chronic, stable conditions.

Challenges

Carry‑over effects, choosing an adequate washout duration, and ethical concerns when disease progression is rapid.

Data Monitoring Committee (DMC)

Definition

An independent group of experts tasked with reviewing accumulating trial data for safety, efficacy, and integrity.

Example

The DMC recommends early termination of a trial because of overwhelming benefit.

Practical application

Ensures participant protection and objective decision‑making during a study.

Challenges

Maintaining confidentiality, avoiding operational bias, and defining stopping rules a priori.

Effect Size

Definition

A quantitative measure of the magnitude of a treatment effect, independent of sample size.

Example

A Cohen’s d of 0.8 is conventionally classed as a large effect of the intervention on depression scores.

Practical application

Guides sample‑size calculations and facilitates meta‑analysis across studies.

Challenges

Selecting an appropriate metric; effect sizes can be inflated in small, underpowered studies.
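Cohen’s d divides the difference in group means by the pooled standard deviation. A minimal sketch with illustrative names and hypothetical depression scores; because lower scores are better here, the sign depends on which group is subtracted, and the conventional 0.2 / 0.5 / 0.8 benchmarks refer to the magnitude.

```python
import math
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Standardized mean difference (group_1 minus group_2) using the pooled SD."""
    n1, n2 = len(group_1), len(group_2)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_var)


# Hypothetical depression scores (lower is better)
control = [22, 25, 19, 24, 21, 23]
treatment = [18, 20, 17, 21, 16, 19]
print(round(cohens_d(control, treatment), 2))
```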

Endpoint

Definition

The specific event or measurement used to assess the efficacy of an intervention.

Example

Overall survival, progression‑free survival, or change in HbA1c.

Practical application

Determines the statistical analysis plan and the criteria for regulatory approval.

Challenges

Choosing clinically meaningful endpoints versus feasible surrogate markers; endpoint adjudication may be resource‑intensive.

Enrollment

Definition

The process of identifying, consenting, and registering eligible participants in a clinical trial.

Example

A multicenter oncology study enrolls 500 patients over 12 months.

Practical application

Affects study timelines, power, and budget; strategies to support it include careful site selection and outreach.

Challenges

Slow accrual, competition with other trials, and stringent eligibility criteria.

Epidemiologic Measures

Definition

Quantitative descriptors of disease occurrence in a defined population, such as incidence, prevalence, and risk.

Example

Incidence rate of 5 cases per 1,000 person‑years for a rare disease.

Practical application

Provides baseline risk estimates for sample‑size calculations and contextualizes trial results.

Challenges

Determining accurate denominators (population at risk or person‑time) and accounting for under‑reporting.

Exponential Distribution

Definition

A continuous probability distribution often used to model time between events in a Poisson process, characterized by a constant hazard rate.

Example

Modeling time to equipment failure in a clinical laboratory.

Practical application

Serves as a simple parametric alternative to non‑parametric survival methods and as a convenient assumption in sample‑size calculations for time‑to‑event trials.

Challenges

Assumes a constant hazard, which rarely holds for disease progression.
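Under a constant hazard λ, survival is S(t) = exp(−λt) and the median survival time is ln 2 / λ. A minimal sketch, assuming hypothetical follow‑up data, estimates λ by maximum likelihood as the number of events divided by total follow‑up time, so censored participants contribute time but no events. The function names are illustrative.

```python
import math


def exponential_rate(times, events):
    """MLE of a constant hazard: total events / total follow-up time."""
    return sum(events) / sum(times)


def exponential_survival(t, rate):
    """S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def exponential_median(rate):
    """Time at which S(t) = 0.5."""
    return math.log(2) / rate


# Hypothetical follow-up in months; event = 1, censored = 0
times = [10, 20, 30, 40]
events = [1, 0, 1, 1]
rate = exponential_rate(times, events)  # 3 events / 100 months = 0.03 per month
print(rate, exponential_median(rate), exponential_survival(12, rate))
```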

Fisher’s Exact Test

Definition

A statistical test that calculates the exact probability of the observed frequencies in a 2 × 2 table, and of more extreme ones, conditional on the table margins, without relying on large‑sample approximations.

Example

Comparing adverse event rates (5/30 vs 12/30) between two treatment arms.

Practical application

Preferred over the chi‑square test when any expected cell count is below 5.

Challenges

Computationally intensive for larger tables; tends to be conservative, and its results converge with those of the chi‑square test as sample size grows.
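A minimal two‑sided implementation for a 2 × 2 table, using the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one. The function name is illustrative; in practice a library routine such as SciPy’s `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance counts tables tied with the observed one
    p = sum(table_prob(x) for x in range(lo, hi + 1)
            if table_prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows are treatment arms; columns are (adverse event, no adverse event)
print(fisher_exact_2x2(5, 25, 12, 18))
```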

Hazard Ratio (HR)

Definition

The ratio of the hazard rates in two groups, derived from survival analysis; under the proportional hazards assumption it is constant over time.

Example

An HR of 0.65 indicates a 35 % reduction in hazard for the treatment group compared with control.

Practical application

Commonly reported in oncology trials to quantify treatment benefit.

Challenges

Relies on proportional hazards; when hazards are non‑proportional, a single summary HR can be misleading.

Intention‑to‑Treat (ITT) Principle

Definition

An analysis strategy that includes all randomized participants in the groups to which they were assigned, regardless of adherence.

Example

A participant who discontinues therapy after two weeks is still counted in the ITT analysis.

Practical application

Preserves the benefits of randomization and, in superiority trials, typically provides a conservative estimate of treatment effect.

Challenges

Requires a strategy for handling missing data; may dilute the estimated efficacy when non‑adherence is high.

Kaplan‑Meier Estimate

Definition

A non‑parametric method for estimating the survival function from time‑to‑event data, accounting for censored observations.

Example

Plotting the probability of remaining event‑free over 24 months for a new drug.

Practical application

Used for visual comparison of survival between groups and as the basis for the log‑rank test.

Challenges

Does not adjust for covariates; adjusted comparisons require regression methods such as the Cox model.
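The product‑limit calculation can be sketched directly: at each distinct event time, survival is multiplied by (1 − events / number at risk), and censored participants leave the risk set after their last follow‑up time. This is a minimal illustration on hypothetical data with an illustrative function name; it omits confidence intervals (Greenwood’s formula is the usual choice for those).

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up times
    events -- 1 if the event occurred at that time, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every observation tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censorings at t leave the risk set afterwards
        n_at_risk -= removed
    return curve


# Hypothetical follow-up in months; event = 1, censored = 0
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```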

Logistic Regression

Definition

A regression model that predicts the log‑odds of a binary outcome as a linear function of predictor variables.

Example

Modeling probability of treatment response based on age, gender, and baseline disease severity.

Practical application

Generates adjusted odds ratios for risk factor analysis and prediction models.

Challenges

Requires sufficient events per variable; multicollinearity and separation can impede model convergence.

Mean

Definition

The sum of a set of numeric values divided by the number of observations.

Example

Mean systolic blood pressure of 128 mmHg in a trial cohort.

Practical application

Central tendency measure for continuous outcomes; used in t‑tests and ANOVA.

Challenges

Sensitive to outliers; may not reflect the typical value in skewed distributions.

Median

Definition

The middle value separating the higher half from the lower half of a data set.

Example

Median time to progression of 9 months in a cancer study.

Practical application

Preferred for skewed data or when outliers are present; basis for non‑parametric tests.

Challenges

Does not convey distribution shape; less efficient than mean when data are normal.
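A quick illustration of why the median is preferred when outliers are present, using hypothetical times to progression with one unusually long responder.

```python
from statistics import mean, median

# Hypothetical times to progression (months); 60 is an outlier
times = [7, 8, 9, 9, 10, 11, 60]

print(f"mean = {mean(times):.1f} months")  # pulled upward by the single outlier
print(f"median = {median(times)} months")  # unaffected by how extreme the outlier is
```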

Mixed‑Effects Model

Definition

A statistical model that incorporates both fixed effects (population‑level) and random effects (subject‑specific) to handle correlated or clustered data.

Example

Analyzing repeated blood pressure measurements across multiple clinics, with random intercepts for each clinic.

Practical application

Uses all available observations, gives valid likelihood‑based inference when data are missing at random (MAR), and models within‑subject correlation.

Challenges

Requires correct specification of random‑effects structure; computationally demanding for large datasets.

Null Hypothesis (H₀)

Definition

A default statement that there is no effect or difference between groups, against which evidence is evaluated.

Example

H₀: μ₁ = μ₂ (no difference in mean outcome between treatments).

Practical application

Forms the basis of p‑value computation; rejecting H₀ supports a claim of statistical significance.

Challenges

Failure to reject H₀ is often misread as proof of no effect; whether H₀ is rejected depends heavily on sample size.

Odds Ratio (OR)

Definition

The ratio of the odds of an event in the treatment (or exposed) group to the odds in the control (or unexposed) group.

Example

An OR of 2.0 indicates twice the odds of response with the experimental therapy.

Practical application

Frequently reported in case‑control studies and logistic regression outputs.

Challenges

Exaggerates the risk ratio (lies further from 1) when the outcome is common; less intuitive to interpret than a risk ratio.
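A sketch of the odds ratio with a Woolf (log‑scale) confidence interval, alongside the risk ratio for the same table. The counts are hypothetical and chosen to show that with a common outcome (67 % vs 33 %) the OR (4.0) is much further from 1 than the RR (2.0). The helper names are illustrative, and the sketch assumes no zero cells.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """OR for [[a, b], [c, d]] (rows: treatment, control; columns: event, no event)
    with a Woolf confidence interval computed on the log scale."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_, (math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log))


def risk_ratio(a, b, c, d):
    """Risk in the treatment row divided by risk in the control row."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical: 20/30 responders on treatment, 10/30 on control
print(odds_ratio_ci(20, 10, 10, 20))
print(risk_ratio(20, 10, 10, 20))
```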

Paired t‑Test

Definition

A statistical test that compares the means of two related groups, accounting for the paired nature of observations.

Example

Comparing baseline and 12‑week cholesterol levels in the same participants.

Practical application

Increases power by reducing variability due to subject‑specific factors.

Challenges

Assumes the paired differences are approximately normally distributed; for ordinal or clearly non‑normal data, the Wilcoxon signed‑rank test is a common alternative.
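The paired t statistic is the mean of the within‑participant differences divided by its standard error, with n − 1 degrees of freedom. A minimal sketch with hypothetical cholesterol values (mmol/L) for four participants and an illustrative function name; the standard library has no t distribution, so the p‑value would come from a table or a package such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]  # within-participant changes
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical cholesterol (mmol/L) at baseline and week 12 in the same four people
baseline = [6.0, 8.0, 8.0, 10.0]
week_12 = [5.0, 6.0, 7.0, 8.0]
print(paired_t(baseline, week_12))
```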

Power

Definition

The probability of correctly rejecting the null hypothesis when a true effect exists; commonly set at 80 % or 90 %.

Example

A study designed with 90 % power to detect a hazard ratio of 0.75.

Practical application

Drives sample‑size calculations; higher power reduces risk of false‑negative conclusions.

Challenges

Overestimating the effect size at the design stage leads to under‑powered studies; increasing power inflates cost and recruitment burden.
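For time‑to‑event trials, power is driven by the number of events rather than the number of participants. A minimal sketch using Schoenfeld’s approximation, assuming 1:1 allocation and a two‑sided α of 0.05, gives about 508 events for 90 % power to detect an HR of 0.75. The function name is illustrative.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.90, allocation=0.5):
    """Approximate number of events for a two-sided log-rank/Cox comparison.

    allocation is the proportion randomized to the experimental arm.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * math.log(hr) ** 2)
    return math.ceil(events)


print(schoenfeld_events(0.75))               # 90 % power
print(schoenfeld_events(0.75, power=0.80))   # fewer events needed at 80 % power
```

The total sample size then depends on the expected proportion of participants who experience the event during follow‑up.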

P‑value

Definition

The probability of observing data as extreme as, or more extreme than, those observed, assuming the null hypothesis is true.

Example

A p‑value of 0.03 means that, if the null hypothesis were true, a difference at least as large as the one observed would arise in about 3 % of repeated studies; it is not the probability that the result is due to chance.

Practical application

Determines whether results cross a pre‑specified significance threshold (e.g., α = 0.05).

Challenges

Does not measure effect size or clinical relevance; susceptible to misuse and p‑hacking.
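An exact permutation test makes the definition concrete: under H₀ the group labels are exchangeable, so the p‑value is the share of all relabelings that give a difference in means at least as extreme as the observed one. A minimal sketch for small hypothetical samples with an illustrative function name; full enumeration grows combinatorially, so larger studies use random permutations or parametric tests.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Every way of choosing which observations carry the "group A" label
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so relabelings tied with the observed split are counted
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

For the toy data [1, 2, 3] versus [4, 5, 6], only 2 of the 20 possible relabelings are as extreme as the observed split, giving p = 0.1.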

Randomization

Definition

The process of assigning participants to treatment arms using a random mechanism to prevent selection bias.

Example

A computer‑generated permuted block randomization with block size 4.

Practical application

Balances known and unknown confounders across groups in expectation, supporting causal inference.

Challenges

Implementation errors, lack of allocation concealment, and potential for imbalance in small trials.
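The permuted block scheme in the example can be sketched as follows: each block of 4 contains two allocations to each of two arms, shuffled randomly, so the arms stay balanced after every completed block. The arm labels, seed, and function name are illustrative.

```python
import random


def permuted_block_schedule(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Allocation list built from randomly permuted blocks with equal arm counts."""
    if block_size % len(arms):
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        block = list(arms) * (block_size // len(arms))  # e.g. A, B, A, B
        rng.shuffle(block)                              # random order within the block
        schedule.extend(block)
    return schedule[:n_participants]


print("".join(permuted_block_schedule(12, seed=42)))
```

Because a fixed, known block size can let site staff predict the final allocations in a block, trials often vary the block size at random to protect allocation concealment.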

Regression Analysis

Definition

A set of statistical techniques for modeling the relationship between a dependent variable and one or more independent variables.

Example

Using multiple linear regression to predict change in weight based on diet, exercise, and baseline BMI.

Practical application

Adjusts for covariates, predicts outcomes, and estimates effect sizes.

Challenges

Assumptions of linearity, independence, homoscedasticity, and normality of residuals must be checked; over‑fitting is a risk.

Sample Size

Definition

The number of participants required to achieve a desired power for detecting a pre‑specified effect, given the significance level and expected variability or event rates.

Example

Calculating that 250 patients per arm are needed to detect a 20 % relative risk reduction with 80 % power, given an assumed control‑arm event rate.

Practical application

Informs budgeting, timeline, and feasibility assessments.

Challenges

Inaccurate assumptions about event rates or variance lead to under‑ or over‑powered studies.
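A minimal sketch of the usual normal‑approximation formula for comparing two proportions (two‑sided α, 1:1 allocation, no allowance for dropout). The control event rates are hypothetical, and the function name is illustrative; the point is how strongly the answer depends on the assumed control rate.

```python
import math
from statistics import NormalDist


def n_per_arm_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Participants per arm to compare two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treatment * (1 - p_treatment))) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


# A 20 % relative risk reduction under two hypothetical control-arm event rates
for p_control in (0.40, 0.20):
    p_treatment = p_control * (1 - 0.20)
    print(p_control, n_per_arm_two_proportions(p_control, p_treatment))
```

With a 40 % control rate, a 20 % relative reduction (to 32 %) needs roughly 564 per arm at 80 % power; with a 20 % control rate the requirement is far larger.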

Sensitivity

Definition

The proportion of people with the condition whom a diagnostic test correctly identifies as positive: TP / (TP + FN).

Example

A biomarker that detects 90 % of patients with disease X.

Practical application

Critical for evaluating screening tools and case‑finding algorithms.

Challenges

Trade‑off with specificity; raising sensitivity (for example, by lowering the positivity threshold) usually increases false positives.

Specificity

Definition

The proportion of people without the condition whom a diagnostic test correctly identifies as negative: TN / (TN + FP).

Example

A test that correctly classifies 95 % of disease‑free individuals.

Practical application

Important for confirming disease absence and reducing unnecessary interventions.

Challenges

Balancing specificity against sensitivity; context‑dependent clinical relevance.
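Sensitivity and specificity both come from the 2 × 2 table of test result against true disease status. A minimal sketch with hypothetical counts (90 true positives, 10 false negatives, 190 true negatives, 10 false positives) reproduces the 90 % sensitivity and 95 % specificity used in the two examples above; the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positives / all people who have the condition."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives / all people who do not have the condition."""
    return tn / (tn + fp)


# Hypothetical validation study: 100 people with disease X, 200 without
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=190, fp=10))  # 0.95
```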

Survival Analysis

Definition

A collection of statistical methods for analyzing the time until an event of interest occurs, accommodating censored observations.

Example

Evaluating median overall survival for a new oncology agent.

Practical application

Enables estimation of survival curves, hazard ratios, and cumulative incidence.

Challenges

Assumptions about proportional hazards, handling competing risks, and ensuring adequate follow‑up.

Type I Error (α)

Definition

Rejecting a null hypothesis that is actually true (a false positive); its probability, α, is conventionally set at 0.05.

Example

Concluding that a treatment has an effect when the observed difference is due only to random variation.

Practical application

Determines the threshold for statistical significance.

Challenges

Multiple testing inflates the family‑wise error rate; controlling it may require adjustments (e.g., Bonferroni).
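With k independent tests each at level α, the chance of at least one false positive is 1 − (1 − α)^k; the Bonferroni correction instead tests each hypothesis at α / k. A minimal sketch with illustrative function names; independence is an assumption of the first formula, whereas Bonferroni remains valid without it but becomes conservative.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test significance level that keeps the family-wise error at most alpha."""
    return alpha / k


k = 10
print(round(familywise_error(0.05, k), 3))                           # about 0.40 with no correction
print(round(familywise_error(bonferroni_threshold(0.05, k), k), 3))  # just under 0.05
```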

Type II Error (β)

Definition

Failing to reject a null hypothesis that is actually false (a false negative); its probability, β, is related to power by power = 1 − β.

Example

Missing a genuine benefit of a drug because the sample size is too small.

Practical application

Guides sample‑size planning to achieve an acceptable β (often 0.20, i.e., 80 % power).

Challenges

Under‑powered studies increase risk of Type II errors, potentially leading to erroneous conclusions about efficacy.

Unblinded Study

Definition

A trial in which participants, investigators, or both are aware of the assigned interventions.

Example

An open‑label extension where all subjects receive the investigational drug after the double‑blind phase.

Practical application

May be necessary for pragmatic trials or when blinding is infeasible.

Challenges

Susceptible to performance and detection bias; outcomes may be influenced by knowledge of treatment allocation.

Variance

Definition

A measure of the spread of data points around the mean, calculated as the average squared deviation from the mean (dividing by n − 1 rather than n for a sample estimate).

Example

Variance of systolic blood pressure measurements equal to 225 mmHg², corresponding to a standard deviation of 15 mmHg.

Practical application

Essential for sample‑size calculations and for assessing model fit.

Challenges

Sensitive to outliers; interpretation less intuitive than standard deviation.
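Python’s standard library distinguishes the sample variance (divisor n − 1) from the population variance (divisor n). A minimal sketch with hypothetical values; taking the square root returns to the original units, which is why a variance of 225 mmHg² corresponds to an SD of 15 mmHg.

```python
import math
from statistics import pvariance, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean 5

print(variance(values))   # sample variance: sum of squared deviations / (n - 1)
print(pvariance(values))  # population variance: sum of squared deviations / n
print(math.sqrt(225))     # SD corresponding to a variance of 225 mmHg^2
```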

Weighted Least Squares (WLS)

Definition

A regression technique that assigns weights to observations inversely proportional to their variance, improving efficiency when error variance is unequal.

Example

Analyzing survey data where larger hospitals contribute more precise estimates than smaller ones.

Practical application

Accounts for heteroscedasticity, giving more efficient estimates and valid standard errors than ordinary least squares when the weights are correctly specified.

Challenges

Requires accurate variance estimates; misspecified weights can reduce efficiency and distort standard errors.
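For a single predictor, weighted least squares has a closed form based on weighted means. A minimal sketch, assuming the weights are the inverse of each observation’s known variance (for example, hospital‑level estimates with different sampling variances); the data and function name are hypothetical.

```python
def wls_line(x, y, weights):
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w  # weighted mean of x
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w  # weighted mean of y
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical hospital-level data: larger hospitals have smaller sampling variance
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]
variances = [4.0, 2.0, 1.0, 0.5, 0.25]
weights = [1 / v for v in variances]  # inverse-variance weights
print(wls_line(x, y, weights))
```

Only the relative sizes of the weights matter: multiplying every weight by the same constant leaves the fitted line unchanged.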

Yield

Definition

The proportion of screened candidates who become enrolled participants.

Example

A 30 % yield when 150 of 500 screened patients are enrolled in the study.

Practical application

Assists in forecasting recruitment timelines and budgeting.

Challenges

Low yield may indicate overly restrictive eligibility or inadequate outreach.
