Technical Adequacy of DIBELS Deep CFOL

The technical adequacy (e.g., reliability and validity) of DIBELS Deep CFOL has been investigated in a series of studies from 2009-10 to the present. The following is a brief summary of results from the most recent study, conducted during the 2013-2014 academic year and including 474 students across four states representing three of the four major census regions of the United States. Children with disabilities and children who were English language learners were included in the research provided they had the response capabilities to participate.

What follows is a brief summary of the technical data for DIBELS Deep CFOL.


Inter-rater reliability of the CFOL ranges from .76 (strong) to .99 (almost perfect). Of the 21 tasks examined, 18 have inter-rater reliability of .90 or higher.

Chance-corrected agreement between raters (Cohen's Kappa) ranges from .41 (moderate) to .93 (almost perfect).
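For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen's Kappa can be computed for two raters who score the same items. The function name and the item scores are hypothetical and are not drawn from the study data; the sketch only illustrates the standard formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical item scores (1 = correct, 0 = incorrect); not study data.
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.52"
```

In this illustrative example the raters agree on 8 of 10 items (.80), but because .58 agreement would be expected by chance, kappa is about .52, which falls in the moderate range.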

Internal consistency reliability (Cronbach's Alpha) ranges from .44 to .90. Out of the 21 tasks, three demonstrate acceptable internal consistency (above .60), 13 demonstrate good internal consistency (above .70), and the fluency passages demonstrate excellent internal consistency (above .90). Overall, these reliability estimates suggest that the items within the CFOL assessment possess good internal consistency reliability.
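As an illustration of what this statistic measures, here is a minimal sketch of the standard Cronbach's Alpha formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the score matrix are hypothetical and are not taken from the study; the study's own computation may have differed in details such as the handling of missing responses.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per student, each row a list of item scores."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two students and two items")
    k = len(scores[0])

    # Sample variance of each item (column) across students.
    item_variances = [variance(column) for column in zip(*scores)]

    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for 5 students on 3 items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")  # prints "alpha = 0.79"
```

Alpha rises when items vary together across students, which is why it is read as evidence that the items in a task measure a common skill.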

Internal consistency reliability (communality estimates from the factor analysis) ranges from .61 to .98 across grade levels for the four major skill areas assessed (Comprehension, Fluency, Oral Language, Vocabulary/Word Knowledge).


Criterion-related validity: Where significant, correlations with DIBELS Next scores range from .20 (small) to .79 (strong), depending on measure and grade.

Construct validity - Correlations between CFOL Sections: Where significant, correlations among tasks within CFOL sections range from .21 (small) to .92 (strong), depending on the task and section.

Construct validity - Item-response Analyses: Items were evaluated through various statistics: raw score, level of response, difficulty parameters, discrimination parameters, information, the area under the information curve, average ability level, and Cronbach’s Alpha. Not every metric was used to evaluate each section. Results from the item response analysis indicated that, as designed, items progressed in difficulty by the order in which they were placed in each section. Additionally, higher grade levels produced larger mean scores than lower grade levels, indicating that the measure adequately assessed a progression of skill level.

Construct validity - Confirmatory Factor Analysis: Student data from all grades were combined in a single model in order to capture the most variability at the task level. Tasks were separated into categories based on the skills they measure (comprehension, fluency, and oral language skills, including vocabulary), and the categories were connected by a single latent construct. To assess model fit, the Akaike information criterion (AIC) from the confirmatory model was compared to the AIC from two comparison models: 1) a baseline model in which tasks were grouped together under their section categories, and 2) an alternate model with a different mix of tasks under the comprehension and fluency categories. The confirmatory model had the lowest AIC, indicating that it was the best-fitting of the three models. The difference in AIC between the confirmatory model and the comparison models was 48.11 for comparison model 1 and 96.83 for comparison model 2. Both differences are well above the threshold of ten, beyond which a model with the higher AIC has essentially no support relative to the best model (Burnham & Anderson, 2002).
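The AIC comparison described above can be sketched as follows. Only the two AIC differences (48.11 and 96.83) come from the study; the absolute AIC values, the model labels, and the function names below are hypothetical placeholders chosen so that the differences match. The Akaike weights are a common companion statistic and are shown here only to illustrate how large these differences are; they are not reported in the study summary.

```python
import math


def aic_deltas(aic_by_model):
    """Difference between each model's AIC and the lowest AIC in the set."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}


def akaike_weights(deltas):
    """Relative support for each model, normalized to sum to 1."""
    relative = {name: math.exp(-delta / 2) for name, delta in deltas.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Hypothetical absolute AIC values; only the differences reflect the study.
aic_by_model = {
    "confirmatory": 1000.00,
    "comparison_1": 1048.11,
    "comparison_2": 1096.83,
}

deltas = aic_deltas(aic_by_model)
weights = akaike_weights(deltas)

for name in aic_by_model:
    print(f"{name}: delta AIC = {deltas[name]:.2f}, weight = {weights[name]:.3g}")
```

With differences this large, nearly all of the Akaike weight falls on the confirmatory model, which is consistent with the rule of thumb that a difference above ten leaves the higher-AIC model with essentially no support.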

More Information

For further details regarding technical adequacy of the DIBELS Deep CFOL, please send an
email to