Quesen, S., & Lane, S. (2019). Differential item functioning for accommodated students with disabilities: Effect of differences in proficiency distributions. Applied Measurement in Education, 32(4), 337–349. https://doi.org/10.1080/08957347.2019.1660347

Journal Article

Quesen, S., & Lane, S. (2019). Differential item functioning for accommodated students with disabilities: Effect of differences in proficiency distributions. Applied Measurement in Education, 32(4), 337–349. https://doi.org/10.1080/08957347.2019.1660347

Tags

Breaks during testing; Disabilities Not Specified; Extended time; K-12; Math; Oral delivery; Oral delivery, live/in-person; Small group; U.S. context

URL

https://www.tandfonline.com/loi/hame20

Summary

Accommodation

The purpose of the study was to examine whether four methods of detecting uniform differential item functioning (DIF) produced different detection rates (a) under different ability distributions and (b) for subgroups of students using different common accommodations: extended time along with other accommodation/s, live in-person oral delivery, and frequent test breaks.

Participants

An extant data set of assessment scores from grade 8 students (N=135,305) in an unidentified U.S. state was analyzed. In that population, 12,795 students with unspecified disabilities used accommodations; they formed the focal group. The reference group of students without disabilities (n=108,019) was sampled twice: (a) a reference sample of 3,000 students that "reflected the observed ability differences between groups" (p. 340) and (b) a reference sample of 3,000 students whose score distribution was equivalent to that of the focal group.

Dependent Variable

Sixty (60) multiple-choice items from the grade 8 statewide mathematics assessment were used to investigate differential item functioning (DIF) between participant groups. Four approaches to modeling the data and examining DIF were compared: hierarchical generalized linear model (HGLM), item response theory (IRT) using the Wald test, logistic regression (LR), and the Mantel-Haenszel (MH) procedure. The comparison was intended to detect any substantive differences in results and conclusions across approaches.
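As an illustration of one of the four approaches named above, the following is a minimal sketch of the standard Mantel-Haenszel computation for a single item. It is not the authors' code. The function name, the record layout, and the toy data are hypothetical, and the sketch assumes examinees are matched on total test score. It computes the MH common odds ratio across score strata and converts it to the ETS delta scale (MH D-DIF = -2.35 ln alpha), where negative values indicate an item that favors the reference group.

```python
import math
from collections import defaultdict


def mantel_haenszel_ddif(records):
    """Return (alpha_MH, MH D-DIF) for one item.

    records: iterable of (group, total_score, correct) tuples, where
    group is "ref" or "focal" and correct is True/False.
    This record layout is an assumption made for illustration.
    """
    # Per-stratum counts: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total_score, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        strata[total_score][idx] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Toy data (made up): one score stratum in which the reference group
# answers the item correctly more often than the focal group.
toy = (
    [("ref", 30, True)] * 3 + [("ref", 30, False)] * 1
    + [("focal", 30, True)] * 1 + [("focal", 30, False)] * 3
)
alpha, ddif = mantel_haenszel_ddif(toy)
print(alpha, ddif)
```

Because the statistic conditions on the matching score, strata in which one group is sparse or absent contribute little or nothing, which is one reason the overlap between the reference and focal score distributions can affect the result.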

Findings

To provide context for comparing the DIF analysis approaches, the researchers reported performance scores. Students without disabilities (the entire reference group) had a mean score of 45.5 (SD=12.1). Students with disabilities using accommodations (the entire focal group) had a mean score of 28.8 (SD=12.7). Students with disabilities using only extended time had a mean score of 46.7 (SD=12.0). Students with disabilities using extended time and any other accommodation/s had a mean score of 27.8 (SD=13.3). Students with disabilities using 'all items read aloud' had a mean score of 27.2 (SD=11.7). Students with disabilities using only 'some items read aloud' had a mean score of 27.8 (SD=12.4). Students with disabilities using only frequent breaks had a mean score of 25.1 (SD=12.1).

In addressing the guiding research questions, the researchers reported a complex set of findings. Four DIF analysis methods were applied to the 60 multiple-choice items from the statewide grade 8 math test data set: logistic regression (LR), hierarchical generalized linear model (HGLM), the Wald-1 IRT-based test, and Mantel-Haenszel (MH). None of the four detected DIF when the reference group (non-accommodated students without disabilities) had a performance distribution similar to that of the focal group (accommodated students with disabilities). When the reference group was not similar in ability to the focal group, each method detected DIF in 5–11 items (excluding the 12 anchor items). For two methods, LR and MH, all flagged items favored the reference group; for the other two, Wald-1 and HGLM, all or nearly all flagged items favored the focal group. [Note: the latter result, of items favoring the focal group, could misleadingly suggest that accommodations provided a differential boost for students with disabilities when they did not actually do so.]

The items flagged for DIF differed somewhat across the accommodated subgroups. The authors concluded that constructing the reference group to be similar in performance to the focal group can help avoid erroneous DIF detection for students with disabilities.