Test Construction and Item Analysis
Expert-defined terms from the Certificate in Psychological Assessment and Testing course at London School of Planning and Management. Free to read, free to share, paired with a globally recognised certification pathway.
**Ability Test**
A type of psychological assessment that measures an individual's aptitude or potential.
These tests typically measure cognitive abilities such as verbal, numerical, and spatial reasoning. Related terms include: achievement test, cognitive ability, and intelligence test.
**Achievement Test**
A type of psychological assessment that measures an individual's current level of knowledge or skill in a particular area.
These tests are often used to evaluate the effectiveness of educational programs or to diagnose learning disabilities. Related terms include: ability test, competency-based assessment, and performance test.
**Alpha Coefficient**
A statistical measure used to estimate the reliability of a test or scale.
Also known as Cronbach's alpha, it indicates the degree to which the items on a test consistently measure the same construct. A coefficient of 0.7 or higher is generally considered acceptable for research purposes, while a coefficient of 0.9 or higher is recommended for high-stakes testing situations. Related terms include: reliability, test-retest reliability, and internal consistency.
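The logic behind the coefficient can be illustrated directly: alpha rises when items covary, i.e. when the variance of total scores exceeds the sum of the individual item variances. A minimal Python sketch, using invented Likert-style responses:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-item score lists (one score per respondent)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

# Hypothetical data: three 5-point items answered by five respondents.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(items)  # close to 0.89 here: the items move together across respondents
```

Because the three items rank the respondents similarly, the total-score variance is much larger than the sum of item variances, which is exactly what pushes alpha toward 1.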
**Analysis of Variance (ANOVA)**
A statistical technique used to compare the means of two or more groups on one or more dependent variables.
ANOVA allows researchers to determine whether any observed differences between groups are statistically significant, and whether those differences can be attributed to specific factors or variables. Related terms include: factorial ANOVA, repeated measures ANOVA, and multivariate ANOVA.
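In the simplest one-way case, the F statistic is the ratio of between-group to within-group mean squares. A sketch in Python (the group scores are invented for illustration):

```python
def one_way_anova(groups):
    """Return the F statistic for a one-way ANOVA over lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: deviations of scores from their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups with clearly separated means.
f = one_way_anova([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
```

A large F indicates that the group means differ by more than within-group noise would predict; the associated p-value then determines statistical significance.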
**Classical Test Theory (CTT)**
A statistical framework used to develop and evaluate psychological tests and assessments.
CTT assumes that a test score is composed of two components: true score (the individual's actual level of ability or knowledge) and error (random fluctuations or measurement errors). Related terms include: item response theory, factor analysis, and reliability.
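The CTT decomposition X = T + E can be made concrete with a small simulation: repeatedly adding random error to a fixed true score yields observed scores whose average converges on the true score (the numbers here are purely illustrative):

```python
import random

random.seed(1)  # reproducible illustration
true_score = 50
# CTT: each observed score X is the true score T plus random error E.
observed = [true_score + random.gauss(0, 3) for _ in range(1000)]
avg = sum(observed) / len(observed)  # error averages toward zero, so avg is near 50
```

This is why averaging over many items or occasions improves measurement: the error component cancels while the true-score component does not.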
**Coefficient Alpha**
See Alpha Coefficient.
**Content Validity**
A type of validity that assesses whether a test or assessment measures what it is intended to measure, based on how well its items cover the relevant content domain.
Content validity is typically established through a careful review of the test items by subject matter experts, who evaluate whether the items accurately reflect the construct or domain being measured. Related terms include: face validity, construct validity, and criterion-related validity.
**Construct Validity**
A type of validity that assesses whether a test or assessment accurately measures the theoretical construct it is designed to assess.
Construct validity is typically established through a combination of theoretical and empirical evidence, such as factor analysis, correlational studies, and experimental research. Related terms include: content validity, face validity, and criterion-related validity.
**Criterion-Related Validity**
A type of validity that assesses the relationship between a test or assessment and an external criterion measure.
Criterion-related validity can be established through concurrent validity (comparing test scores to an existing criterion) or predictive validity (using test scores to predict future performance). Related terms include: content validity, construct validity, and face validity.
**Cut Score**
A score or point on a psychological test or assessment that serves as a boundary between two categories, such as pass and fail.
Cut scores are often used to make decisions about whether an individual meets a specific standard or qualification, such as passing a licensure exam or being diagnosed with a particular disorder. Related terms include: passing score, standard setting, and decision point.
**Discriminant Validity**
A type of validity that assesses whether a test or assessment measures only its intended construct, rather than unrelated constructs.
Discriminant validity is typically established by demonstrating that a test or assessment is uncorrelated with measures of unrelated constructs or variables. Related terms include: convergent validity, divergent validity, and validity.
**Empirical Key**
A method of item analysis that involves creating a "key" or set of correct answers based on the actual responses of a sample of test-takers.
Items that are answered correctly by a high proportion of the sample are considered to be easy, while items that are answered correctly by a low proportion of the sample are considered to be difficult. Related terms include: item difficulty, item discrimination, and item analysis.
**Face Validity**
A type of validity that assesses whether a test or assessment appears to measure what it is intended to measure.
Face validity is typically established through a subjective evaluation of the test items by individuals who are familiar with the construct or domain being measured. Related terms include: content validity, construct validity, and criterion-related validity.
**Factor Analysis**
A statistical technique used to identify underlying patterns or dimensions of variation among a set of observed variables or test items.
Factor analysis is often used to establish construct validity, by demonstrating that the items on a test consistently load onto a smaller number of factors or dimensions. Related terms include: exploratory factor analysis, confirmatory factor analysis, and principal components analysis.
**Item Analysis**
A process of evaluating the performance of individual items on a psychological test or assessment.
Item analysis typically involves examining measures such as item difficulty, item discrimination, and item distractors, in order to identify items that are redundant, confusing, or poorly written. Related terms include: empirical key, item response theory, and classical test theory.
**Item Difficulty**
A measure of how difficult or easy an item on a psychological test or assessment is for test-takers.
Item difficulty is typically expressed as a proportion or percentage of individuals who answered the item correctly. Related terms include: item discrimination, item analysis, and item response theory.
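Under this convention, computing item difficulty is a one-liner, sketched here with invented 0/1 response data:

```python
def item_difficulty(responses):
    """Proportion of respondents who answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

# Hypothetical responses from ten test-takers to a single item.
p = item_difficulty([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])  # 0.7: a moderately easy item
```

Note that despite the name, a higher value means an easier item, since it is the proportion answering correctly.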
**Item Discrimination**
A measure of how well an item on a psychological test or assessment distinguishes between high-scoring and low-scoring test-takers.
Item discrimination is typically expressed as a correlation coefficient or point-biserial correlation. Related terms include: item difficulty, item analysis, and item response theory.
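One common computational form of the point-biserial correlation compares the mean total scores of those who passed and failed the item, weighted by the item's difficulty. A sketch with fabricated data:

```python
from statistics import mean, pstdev

def point_biserial(item, totals):
    """Point-biserial correlation between 0/1 item responses and total test scores."""
    passed = [t for i, t in zip(item, totals) if i == 1]
    failed = [t for i, t in zip(item, totals) if i == 0]
    p = len(passed) / len(item)  # item difficulty (proportion correct)
    q = 1 - p
    return (mean(passed) - mean(failed)) / pstdev(totals) * (p * q) ** 0.5

# Hypothetical: high scorers pass the item, low scorers fail it.
r = point_biserial([1, 1, 1, 0, 0], [20, 18, 16, 12, 10])  # a strongly discriminating item
```

A value near zero would indicate the item tells us little about overall performance; a negative value flags an item that may be miskeyed or poorly written.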
**Item Response Theory (IRT)**
A statistical framework used to model the relationship between an individual's performance on individual test items and their underlying level of ability or proficiency.
IRT uses item parameters such as difficulty and discrimination to estimate an individual's ability or proficiency level, and can be used to develop more precise and efficient tests and assessments. Related terms include: classical test theory, item analysis, and factor analysis.
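One widely used IRT model is the two-parameter logistic (2PL), in which the probability of a correct response is a logistic function of the gap between the person's ability and the item's difficulty, scaled by the item's discrimination. A sketch (parameter values are arbitrary):

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct response.
    theta = person ability, a = item discrimination, b = item difficulty."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# When ability equals item difficulty, the probability of success is exactly 0.5.
p_mid = irt_2pl(0.0, a=1.2, b=0.0)
```

Raising `a` steepens the curve around `b`, which is what makes a highly discriminating item informative near its difficulty level.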
**Kuder-Richardson Formula 20 (KR-20)**
A statistical formula used to estimate the reliability of a binary (dichotomously scored) test.
KR-20 is a type of coefficient alpha that is specifically designed for binary data, and is often used to evaluate the internal consistency of tests that have only right or wrong answers. Related terms include: reliability, alpha coefficient, and test-retest reliability.
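KR-20 replaces the item variances in coefficient alpha with p·q terms, where p is each item's proportion correct and q = 1 − p. A Python sketch with invented right/wrong data:

```python
from statistics import pvariance

def kr20(items):
    """KR-20 reliability for a list of per-item 0/1 response lists (one entry per test-taker)."""
    k = len(items)
    totals = [sum(r) for r in zip(*items)]  # each test-taker's number-correct score
    pq = sum((sum(i) / len(i)) * (1 - sum(i) / len(i)) for i in items)
    return k / (k - 1) * (1 - pq / pvariance(totals))

# Hypothetical: three items answered by five test-takers (1 = right, 0 = wrong).
r = kr20([[1, 1, 1, 0, 0],
          [1, 1, 0, 0, 0],
          [1, 1, 1, 1, 0]])
```

For dichotomous items, p·q equals the item variance, so KR-20 and coefficient alpha give identical results on the same data.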
**Norm-Referenced Testing**
A type of psychological testing that compares an individual's performance to that of a norm group.
Norm-referenced testing is often used to establish percentile ranks or standard scores, which can be used to compare an individual's performance to a national or regional average. Related terms include: criterion-referenced testing, norm group, and standardization.
**Norm Group**
A group of individuals who have taken a psychological test or assessment and whose scores are used as a basis for comparison.
Norm groups are typically selected to be representative of a larger population, and may be stratified by factors such as age, gender, or education level. Related terms include: norm-referenced testing, standardization, and percentile rank.
**Percentile Rank**
A score or statistic that indicates an individual's performance relative to a norm group.
Percentile ranks are expressed as a percentage, with higher percentile ranks indicating better performance. For example, a percentile rank of 80 indicates that an individual scored better than 80% of the norm group. Related terms include: norm-referenced testing, standard score, and z-score.
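Using the "scored better than" definition above, a percentile rank can be computed by counting how many norm-group scores fall below the individual's score. A sketch with an invented norm group:

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group of ten scores.
norm = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
pr = percentile_rank(88, norm)  # 70.0: better than 7 of the 10 norm-group scores
```

Conventions differ on tied scores; some formulas credit half of the ties, so published percentile ranks may vary slightly for the same data.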
**Reliability**
The degree to which a psychological test or assessment produces consistent and stable results.
Reliability is typically expressed as a correlation coefficient or reliability index, with higher values indicating greater consistency and stability. Related terms include: test-retest reliability, internal consistency, and inter-rater reliability.
**Standard Deviation**
A measure of variability or dispersion in a set of scores or data.
The standard deviation is calculated as the square root of the variance, and represents, roughly, the typical distance between scores and the mean of the distribution. A larger standard deviation indicates greater variability or dispersion, while a smaller standard deviation indicates greater consistency or homogeneity. Related terms include: variance, mean, and standard error.
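These relationships are easy to verify numerically with Python's standard library (the scores are invented):

```python
from statistics import mean, pvariance, pstdev

scores = [10, 12, 14, 16, 18]
m = mean(scores)         # 14
var = pvariance(scores)  # 8: the mean of squared deviations from the mean
sd = pstdev(scores)      # square root of the variance, about 2.83
```

Here `pvariance` and `pstdev` are the population versions; `variance` and `stdev` apply the n − 1 sample correction instead.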
**Standard Error**
A measure of the variability or uncertainty associated with a sample mean or estimate.
The standard error is calculated as the standard deviation of the sampling distribution, and represents