Assessment of Bilinguals: Don’t Go NUTS!

For a long time, many of us have worked on developing better assessment methods for bilingual children. We know that many of the measures normed on monolinguals are not appropriate for bilinguals. We know that translated measures can lose their psychometric properties because item difficulty may shift in translation. But in the last 20 years, more measures and procedures have been validated for Spanish-English bilinguals. Work on other language pairs is emerging as well, but right now the majority of available measures focus on Spanish-English.

There are many, many procedures and measures (standardized and non-standardized) out there that can help you make diagnostic decisions. Some are well validated and some, frankly, are NUTs (novel unsupported tests). NUTs are, well... not what you should be using. What do I mean?

Critical questions to ask when selecting an assessment procedure are whether it accurately identifies children with DLD (or articulation disorders, etc.) and whether it classifies those without the impairment as not impaired. Bottom line: if you are using a procedure, be it a standardized test, a checklist, a language sample, a criterion-referenced test, observation, an interview, an app, cards, or anything else you can think of, IT NEEDS TO DO THE JOB!

The STARD standards (Standards for Reporting of Diagnostic Accuracy Studies; see also here and here) are the go-to list for evaluating procedures used in diagnostic decision making, and they should apply to any procedure you use. If your friend tells you about some procedure or gives you a checklist or card set to use, if you are going to propose spending $500 of your district's money to buy a test, if you're going to download an app: it doesn't MATTER where it came from. You need to ask some questions!! Here are some:

  • what are the sensitivity and specificity?
    • sensitivity and specificity both need to be at least 80% for a diagnostic test; if you don't know this information, STOP and find something else to use
    • (yes, you can use your clinical judgement to SUPPLEMENT what you find, but it shouldn’t be the main thing you base your decision on)
  • what was the gold standard?
    • comparison of the new test is only as good as the old test (or measure). So, ask what the comparison group is and how they were determined to have DLD.
    • caseload vs. non-caseload is a decent start, but you don’t know how kids got onto a caseload to start with. If the measures were poor then you don’t really have anything. The BEST gold standard is state of the art and it’s given to everyone (the DLD and typical group) so that you’re comparing apples to apples.
  • were the testers masked or blinded?
    • remember subjective bias? If you know a child's diagnosis and you test them on your new test, you may “see” disorder because you are looking for it. The measure you pick should have evidence that steps were taken to reduce subjective bias.
    • so, if someone comes up with a procedure and says they gave it to a bunch of kids with DLD to determine patterns on this procedure, that's a good start, but it's only a start. You need to know that a tester who doesn't know which kids have DLD will still score them differently from typical kids. And you need to know that kids without DLD show different patterns.
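The sensitivity and specificity questions above come down to simple arithmetic on a 2x2 table that compares the test's decisions with the gold standard. Sensitivity is the proportion of kids with DLD the test flags; specificity is the proportion of typically developing kids the test correctly calls typical. Here is a minimal Python sketch, not a published tool: the `ValidationCounts` class and helper names are hypothetical, the counts are made up, and the 80% minimum is the rule of thumb from the list above.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """2x2 counts from a hypothetical validation study against a gold standard."""
    true_pos: int   # gold standard says DLD, test says DLD
    false_neg: int  # gold standard says DLD, test says typical
    true_neg: int   # gold standard says typical, test says typical
    false_pos: int  # gold standard says typical, test says DLD


def sensitivity(c: ValidationCounts) -> float:
    """Proportion of children with DLD that the test correctly identifies."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ValidationCounts) -> float:
    """Proportion of typically developing children the test correctly calls typical."""
    return c.true_neg / (c.true_neg + c.false_pos)


def meets_threshold(c: ValidationCounts, minimum: float = 0.80) -> bool:
    """True only if BOTH sensitivity and specificity reach the minimum."""
    return sensitivity(c) >= minimum and specificity(c) >= minimum


if __name__ == "__main__":
    # Hypothetical validation results, for illustration only (not from any study).
    examples = {
        "Test A": ValidationCounts(true_pos=34, false_neg=6, true_neg=52, false_pos=8),
        "Test B": ValidationCounts(true_pos=24, false_neg=16, true_neg=57, false_pos=3),
    }
    for name, counts in examples.items():
        print(
            f"{name}: sensitivity={sensitivity(counts):.2f}, "
            f"specificity={specificity(counts):.2f}, "
            f"meets 80% criterion: {meets_threshold(counts)}"
        )
    # Test A: sensitivity=0.85, specificity=0.87, meets 80% criterion: True
    # Test B: sensitivity=0.60, specificity=0.95, meets 80% criterion: False
```

Test B shows why you need both numbers: a test can look great at calling typical kids typical (95%) while missing 4 in 10 children with DLD.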

I know that there are concerns about representation in the norms. But really, if the items are free of bias, who is in the norms doesn't matter as much, as long as those children are in the typically developing range. Having broad representation of, say, dialect doesn't fix any problems if there is item bias; it makes them worse. The math is the same as including kids with DLD in the norms (we've written about that here).
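To see why item bias in the norms works like including kids with DLD, here is a minimal Python sketch with made-up scores. It assumes, for illustration only, that a test flags children who score more than 1 SD below the norm-sample mean; real tests set their own cutoffs, and the `cutoff` helper is hypothetical. Adding a lower-scoring subgroup (kids with DLD, or typically developing kids penalized by biased items) pulls the mean down and spreads the scores out, so the cutoff drops.

```python
from statistics import mean, pstdev


def cutoff(norm_scores: list[float], sd_below: float = 1.0) -> float:
    """Score sd_below standard deviations under the norm-sample mean.

    The 1 SD default is an illustrative assumption, not any specific test's rule.
    """
    return mean(norm_scores) - sd_below * pstdev(norm_scores)


# Hypothetical scores, for illustration only.
typical = [90, 95, 100, 100, 105, 110]  # typically developing kids, unbiased items
lower_scoring = [75, 80]                # kids with DLD, or kids penalized by biased items
contaminated = typical + lower_scoring  # norm sample that includes both

clean_cut = cutoff(typical)             # about 93.5
contaminated_cut = cutoff(contaminated) # about 83.1

child_score = 85
print(f"clean norms cutoff: {clean_cut:.1f}, flagged: {child_score < clean_cut}")
print(f"contaminated norms cutoff: {contaminated_cut:.1f}, flagged: {child_score < contaminated_cut}")
```

With the clean norms, a child scoring 85 falls below the cutoff; once the lower-scoring group is folded into the norms, the same child looks typical. Broader sampling alone doesn't help if the items themselves are biased.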

But say there isn't ANY test at all, or your kid doesn't fit the age range or whatever: the answer isn't to give a NUT instead!! Find something else that meets the criteria above. Dynamic assessment is one option. There are several DA procedures, and validation studies show that many of them work pretty well: they have good classification accuracy, and most studies have decent gold standards and blinding. But pay attention to the critical elements: modifiability is a better predictor than the posttest alone. So clinical judgement does come into play here, but it is systematic and deliberate.

Language samples are also good indicators. Some language sample measures are better indicators at younger ages (MLU), while others are better for older kids (story grammar). Again, these are non-standardized, but still systematic, and there's evidence in the literature for the procedure.

Several of us have published questionnaires that can be used as part of the diagnostic decision-making process; here's one. But don't use it because I said so. Read up on it. Ask if it does the job. The Inventory to Assess Language Knowledge (ITALK) is part of the BESA. We have a study on this coming out soon. We show that the ITALK is not that great on its own for kindergarten-age kids, but it does a pretty good job for 2nd-grade bilinguals.

So, yes, use a combination of measures, observations, standardized and non-standardized tests to help you make a diagnostic decision. But, please NEVER throw out good diagnostic tools and procedures that are validated and have strong evidence behind them for NUTs.
