Test Development, Item Bias, and the BESA

A couple of people have asked me whether the BESA can work with children in the Eastern US. Yes, I know I’m in Texas and the kids we see here are mainly speakers of Mexican Spanish. But that doesn’t mean we didn’t collect data from kids in other parts of the country, or from kids who speak different dialects of Spanish. We collected data mainly in three places: California, Texas, and Pennsylvania. We also had data contributed from other places, including Georgia, Utah, and New Jersey. What was most important was that we included both children who used conservative dialects of Spanish and children who used radical dialects. Also, for English speakers, we considered what dialect or variety of English they were learning, including Texas English, California English, African American English, General American English, and so on. Why does this matter?

Children who speak different dialects of Spanish (and English, for that matter) differ in the phonological patterns expected in their language. These phonological differences may affect morphosyntax. For example, if the dialect typically deletes final /s/, then it may be difficult to know whether the child is producing plural -s, as in flower – flowers or perro – perros. These kinds of items might penalize a child whose phonology follows the pattern of deleting final /s/.

On standardized tests, one solution might be to delete these kinds of items altogether. Yet it’s important to know whether children can produce forms such as plurals, because they are necessary in communication. Taking potentially biased items out of a standardized test may also reduce its accuracy. So it’s important to develop items that reduce bias, so that there are enough items that together will help identify children with language impairment.

Another way to develop good, unbiased items is to examine the context in which these items are produced to see whether some contexts can obligate the form. For example, with plurals that end in -es, such as flor – flores, it may be easier to tell whether the child is marking the plural: even a child who deletes final /s/ would produce flore, which contrasts with flor.

When we developed the BESA, we started with a large item pool in each of the domains. In morphosyntax, we started with over a hundred items in each language. This number of items allowed us to test and then throw out the ones that didn’t work: the ones that showed bias and those that were too easy or too hard. We also tried different ways of scoring. So, for Spanish we accepted “leísmo,” so that substitutions of le/lo in indirect pronouns were entered as possible alternates. We also made rules for accepting deletion of final -s if the child speaks a radical dialect of Spanish. We completed two iterations of testing and throwing out so that we ended up with the best set of items at the end. After the first iteration we had about 60 items per language, and at the end we had about 20-25 per language plus some sentence repetition items (about 10 each, with about another 30 targets).
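To make the idea of a dialect-aware scoring rule concrete, here is a minimal Python sketch. It is not the BESA's actual scoring rule, which is given in the examiner's manual. The function name `score_plural` and the specific rule are assumptions for illustration: for a child who speaks a radical dialect, a plural with the final /s/ dropped is accepted only if what remains still contrasts with the singular, as in the flor/flore example above.

```python
def score_plural(response: str, singular: str, plural: str, radical_dialect: bool) -> bool:
    """Return True if the response is credited as a plural.

    Hypothetical rule for illustration only, not the BESA's scoring rule.
    """
    r = response.strip().lower()
    singular = singular.strip().lower()
    plural = plural.strip().lower()

    # The full plural form is always credited.
    if r == plural:
        return True

    # Assumed rule: for radical-dialect speakers, credit a plural with the
    # final -s deleted, but only when the result still differs from the
    # singular (flores -> flore contrasts with flor; perros -> perro does not).
    if radical_dialect and plural.endswith("s"):
        without_s = plural[:-1]
        if r == without_s and r != singular:
            return True

    return False


# Example: the same response is scored differently depending on dialect.
flore_radical = score_plural("flore", "flor", "flores", radical_dialect=True)
flore_conservative = score_plural("flore", "flor", "flores", radical_dialect=False)
perro_radical = score_plural("perro", "perro", "perros", radical_dialect=True)
```

Under this sketch, "flore" is credited for a radical-dialect speaker, while "perro" is not, because it cannot be told apart from the singular. That is exactly why items like perro – perros are hard to score fairly.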

Our analyses at the first round indicated that, in Spanish, we did a better job of distinguishing children with impairment from those without for 4 and 5 year olds than for 6 year olds. And there were minor differences between children who spoke radical dialects of Spanish and those who spoke conservative dialects (see Gutiérrez-Clellen, Restrepo, & Simón-Cereijido). We tightened up the scoring rules and did more testing. We then selected the items that demonstrated the greatest differences between children with and without impairment when we pooled all groups of Spanish dialects in the sample. This process resulted in only 3 items being flagged as having systematic differences between children who spoke different dialects of Spanish. Overall, this represents very few of the targets (less than 5%) and does not affect correct classification.
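For readers who want a feel for what "flagging" an item can look like, here is a rough Python sketch. The post does not describe the statistical procedure that was actually used, so everything here is an assumption: the function names, the made-up data, and the 0.20 threshold are hypothetical. The sketch compares the proportion of children passing each item in two dialect groups and flags items where the gap is large.

```python
from collections import defaultdict


def pass_rates(responses):
    """Proportion correct for each (item, dialect group) pair.

    `responses` is a list of (item_id, dialect_group, correct) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # (item, group) -> [n_correct, n_total]
    for item, group, correct in responses:
        counts[(item, group)][0] += int(correct)
        counts[(item, group)][1] += 1
    return {key: n_correct / n_total for key, (n_correct, n_total) in counts.items()}


def flag_items(responses, groups=("conservative", "radical"), threshold=0.20):
    """Flag items whose pass rates differ between the two groups by more than
    `threshold` (a hypothetical value chosen only for illustration)."""
    rates = pass_rates(responses)
    items = sorted({item for item, _, _ in responses})
    flagged = []
    for item in items:
        a = rates.get((item, groups[0]))
        b = rates.get((item, groups[1]))
        if a is not None and b is not None and abs(a - b) > threshold:
            flagged.append(item)
    return flagged


# Made-up responses: one item behaves differently across dialects, one does not.
example = (
    [("plural_s", "conservative", True)] * 4
    + [("plural_s", "radical", True)] * 1
    + [("plural_s", "radical", False)] * 3
    + [("article", "conservative", True)] * 3
    + [("article", "conservative", False)] * 1
    + [("article", "radical", True)] * 3
    + [("article", "radical", False)] * 1
)
flagged_items = flag_items(example)
```

A formal analysis of item bias would usually also match children on overall ability before comparing groups, which this sketch does not do.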

For English, however, we didn’t do as well in the first round of assessment. Gutiérrez-Clellen & Simón-Cereijido present results of the first round of item development. In this study, the authors report that for the original item set, sensitivity and specificity were fair to good for Latino speakers of English in the Southwest and West, but poor to fair for Latino speakers of English in the East. To address this issue, we examined different forms of English and tightened up our scoring rules. We also looked for contexts that more strongly obligate grammatical forms, so that children were more likely to produce them regardless of the dialect of English they were learning. Again, we tested more children after throwing out items that demonstrated bias or did not differentiate between children with and without language impairment. The final analysis selected the best items with the least bias. Here we ended up with 10 items with potential bias on the morphosyntax subtest. This is less than 20%, but higher than we would like.

For English, when we ran the classification statistics separately for children from the East coast and those from the West and Central regions, we found that both were generally acceptable (80% correct classification), but that the East coast classification was less accurate by comparison (the West and Central regions were about 90% accurate or better). When we derived a new cutpoint for the East coast region alone, we were able to improve classification to 86% sensitivity and 83% specificity. The cutpoints for all these classifications are included in the examiner’s manual of the BESA.
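Here is a small Python sketch of what deriving a cutpoint involves. It uses made-up scores, not BESA data, and the function names are hypothetical. It assumes a child is classified as impaired when the score falls below the cutpoint, and that the "best" cutpoint is the one that maximizes sensitivity plus specificity. The post does not say which criterion was used for the BESA, so treat that choice as an assumption.

```python
def sensitivity_specificity(scores, impaired, cutpoint):
    """Sensitivity and specificity when scores below `cutpoint` are
    classified as impaired (an assumed decision rule)."""
    tp = sum(1 for s, li in zip(scores, impaired) if li and s < cutpoint)
    fn = sum(1 for s, li in zip(scores, impaired) if li and s >= cutpoint)
    tn = sum(1 for s, li in zip(scores, impaired) if not li and s >= cutpoint)
    fp = sum(1 for s, li in zip(scores, impaired) if not li and s < cutpoint)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutpoint(scores, impaired):
    """Try every observed score as a cutpoint and keep the one with the
    highest sensitivity + specificity (assumed criterion)."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda c: sum(sensitivity_specificity(scores, impaired, c)),
    )


# Made-up standard scores for one region: four children with language
# impairment and five typically developing children.
scores = [60, 65, 70, 78, 75, 85, 90, 95, 100]
impaired = [True, True, True, True, False, False, False, False, False]

cut = best_cutpoint(scores, impaired)
sens, spec = sensitivity_specificity(scores, impaired, cut)
```

Running the same search separately for each region is what it means to derive a region-specific cutpoint: the scores are unchanged, but the line between "impaired" and "not impaired" moves to where it separates that region's children best.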

The phonology and semantics subtests did not show these kinds of differences in children’s performance by region. On the phonology subtest, we provide guidelines for scoring each item consistent with the expected phonological features of several different dialects. On semantics, it doesn’t really matter what words children use to describe, categorize, and so on; we’re just interested in whether they can use the words they have to express semantic relationships and demonstrate semantic concepts. Of course, morphosyntax by itself does a better job of classifying children with language impairment than semantics by itself. But combining the two improves classification overall, and perhaps further reduces bias.

In the end, what will be important as clinicians use the BESA is to use clinical judgement to interpret score patterns on the test. A combination of test data, observational data, and clinical judgement will help make accurate decisions about children’s needs.



