A COMPARISON OF RELIABILITY ESTIMATES FROM SINGLE AND DOUBLE ADMINISTRATIONS OF CRITERION-REFERENCED TESTS.

Item

Title
A COMPARISON OF RELIABILITY ESTIMATES FROM SINGLE AND DOUBLE ADMINISTRATIONS OF CRITERION-REFERENCED TESTS.
Identifier
AAI8319794
identifier
8319794
Creator
SCHAEFER, MARY MILLER.
Contributor
Alan Gross
Date
1983
Language
English
Publisher
City University of New York.
Subject
Education, Educational Psychology
Abstract
The purpose of this study was to compare three models for determining the reliability of criterion-referenced tests. These models, coefficient kappa (κ), Huynh's estimate of κ (κ̂), and Subkoviak's coefficient of agreement (p_cs), were used to examine data from 325 students tested on two occasions with identical items. The effects of five student and test characteristics (test length, cut-off score, student ability, sample size, and heterogeneous test content) on the resulting reliability coefficient were determined. All possible combinations of test items for each test length were examined across all analyses. Coefficient κ, considered the standard, required data from two test administrations. The other models (κ̂ and p_cs) were developed for use when only data from a single test administration are available. These criterion-referenced reliability coefficients were also compared to norm-referenced coefficients (Kuder-Richardson and test-retest). Representative mean values (± SEM) obtained from the test length analysis for κ, κ̂, and p_cs (compound binomial) were .402 ± .085, .588 ± .045, and .921 ± .033, respectively. Similar values were obtained for other analyses. The estimate κ̂ modestly overestimated κ under all conditions except where test items were heterogeneous. Values obtained for the coefficient of agreement p_cs were consistently much larger than κ, possibly because p_cs is not corrected for chance agreement. No consistent relationships between criterion-referenced and norm-referenced coefficients were observed. The data indicate that, when estimating the reliability of criterion-referenced tests, κ̂, in contrast to p_cs, serves as a reasonable estimate of reliability as determined by the standard, κ.
Type
dissertation
Source
PQT Legacy CUNY.xlsx
Degree
Ph.D.
Program
Education
Item sets
CUNY Legacy ETDs