A COMPARISON OF STATISTICAL AND JUDGMENTAL METHODS FOR IDENTIFYING ITEM BIAS (LATENT TRAIT, TEST CONSTRUCTION).

Item

Title
A COMPARISON OF STATISTICAL AND JUDGMENTAL METHODS FOR IDENTIFYING ITEM BIAS (LATENT TRAIT, TEST CONSTRUCTION).
Identifier
AAI8501173
identifier
8501173
Creator
SCHOENER, JOHN EDWIN.
Contributor
Alan L. Gross
Date
1984
Language
English
Publisher
City University of New York.
Subject
Education, Educational Psychology
Abstract
The purpose of this study was to compare test items identified as biased using statistical and judgmental procedures. Several critical questions were investigated: Will test items identified as biased by reviewers from different subgroups of the population be related across subgroups? Will test items identified as biased by a statistical procedure for ethnic subgroups of students be related to those identified as biased for gender subgroups of students? Will statistical and judgmental methods for identifying biased items on a test agree? The effect of rescoring the test after eliminating items identified as biased was examined, and the correlations of the test and the rescored test with an external criterion were determined.
A criterion-referenced mathematics test was administered to 1064 high school students of both sexes and diverse ethnic backgrounds. The three-parameter latent trait model was the statistical procedure used to detect biased items. The judgmental procedure consisted of a review of the test items by twenty-four judges who were knowledgeable about high school mathematics curricula. Judges used a structured rating form. Eight of the reviewers were black, eight were white, and eight were Hispanic. Within each group, half of the reviewers were male and half were female.
Agreement between statistical ratings among subgroups, agreement between judgmental ratings among subgroups, and agreement between the statistical and judgmental procedures were assessed using the Kappa statistic. There was no significant agreement between statistical bias ratings for ethnic and gender subgroups. There was significant agreement on some of the indicators calculated to combine judges' ratings, but not on others. There was no significant agreement between item bias detection methods. Rescoring the test, eliminating the items identified as biased by the statistical procedure, did not change the rank order of subgroups and the total group. Both the test and the rescored test had significant correlations with a standardized, norm-referenced mathematics test.
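Illustrative note (not part of the original record): the abstract reports agreement measured with the Kappa statistic. The sketch below shows how Cohen's kappa could be computed for two sets of item-level bias flags. It assumes the two-rater (Cohen) form of kappa, which the abstract does not specify. The function name `cohen_kappa` and the flag lists are hypothetical, invented only for illustration, and are not data from the dissertation.

```python
import math


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal label frequencies.
    """
    a, b = list(ratings_a), list(ratings_b)
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items on which both methods give the same flag.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal proportions, summed over labels.
    labels = set(a) | set(b)
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if math.isclose(p_e, 1.0):
        # Both methods used a single identical label; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical flags for ten items (1 = flagged as biased, 0 = not flagged).
statistical_flags = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
judgmental_flags = [0, 0, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(statistical_flags, judgmental_flags)
print(f"kappa = {kappa:.3f}")
```

With these made-up flags the two methods agree on 6 of 10 items, but chance alone would produce agreement on 58% of items, so kappa is close to zero. This is the kind of pattern the abstract describes as no significant agreement between detection methods.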
Type
dissertation
Source
PQT Legacy CUNY.xlsx
Degree
Ph.D.
Program
Educational Psychology
Item sets
CUNY Legacy ETDs