I belong to a Facebook group, SLPs for Evidence Based Practice. There is frequent discussion of what works and what doesn’t in intervention and in assessment. My work has often focused on assessment and assessment practices. And I have to say that it is frustrating to find evidence that something does (or doesn’t) work and then watch clinical practices take so long to change. So, I wonder: what is our obligation in the field to be aware of the evidence? And what is our obligation to make changes in our practices?
What started me thinking about this is a post in the group about a paper published in LSHSS in 2013. I really like this paper, and when I was at UT, the MA students read it in my measurement class. Now, at UCI, even though none of the Ph.D. students (yet) are from CSD, those in my measurement class also read it. It’s a good illustration of how practitioners in the field (and I suspect in many health-related and education-related fields) make day-to-day decisions about what measures to use.
The researchers in this paper asked SLPs how often they used standardized tests for identification of language impairment. The top 10 most frequently used standardized tests were the CELF-4; Preschool Language Scale, Fourth Edition, English Edition (PLS-4; Zimmerman, Steiner, & Pond, 2002); PPVT-4; EOWPVT-3; Comprehensive Assessment of Spoken Language (CASL; Carrow-Woolfolk, 1999); CELF-Preschool-2 (CELF-P2; Semel, Wiig, & Secord, 2004); Receptive One-Word Picture Vocabulary Test (ROWPVT-2; Brownell, 2000); TOLD-P:4; Oral and Written Language Scales (OWLS; Carrow-Woolfolk, 1995); and Expressive Vocabulary Test, Second Edition (EVT-2; Williams, 2007).
What’s really interesting is that of these only 2 (CELF-4 and CELF-P2) report sensitivity and specificity above 80%. And another 2 report sensitivity and specificity less than 80%, and THE REST DON’T REPORT THIS INFORMATION. So, in theory anyway, for 6 of these 10 most frequently used tests we don’t know if they actually do the job that we are using them for.
This is a problem. We may like these measures, we may be familiar with the measures, and even think that they provide some good information. They probably do, but we have no idea to what extent they accurately classify children with and without language impairment. Why do we accept this? I don’t think we would do so for other things. Would you agree to a diagnostic medical test if its accuracy for identifying the disease was unknown? Would you buy a house where they didn’t measure the square footage? STOP IT!
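For anyone who wants to see what sensitivity and specificity actually measure, here is a minimal sketch in Python. The counts are made up for illustration and do not come from any test manual or from the paper; the function name and the 80% cutoff (borrowed from the benchmark mentioned above) are my own assumptions.

```python
def classification_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table.

    sensitivity: share of children WITH language impairment the test flags
    specificity: share of children WITHOUT language impairment the test clears
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation sample: 50 children with language impairment,
# 50 typically developing children. These numbers are invented.
sens, spec = classification_accuracy(true_pos=42, false_neg=8,
                                     true_neg=45, false_pos=5)

# Assumed benchmark: both values should reach at least 80%.
meets_benchmark = sens >= 0.80 and spec >= 0.80

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
print("meets 80% benchmark:", meets_benchmark)
```

If a test manual doesn’t give us the numbers that go into those four cells (or the resulting percentages), we can’t do even this simple check.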
And what about the tests that were almost never used that DO have good sensitivity and specificity? These included the SPELT-3, TOLD-I-3, TNL, WBC, TEEM, DIAL, CCC-2, DELV-N, TWF-D, PEST, and TEGI. Why don’t we use these?
One finding was that frequency of use was statistically related to year of publication so that more recent tests were more likely to be used. And sure, I can see that it is important to use more recent versions of tests, but they have to work!
So, my question is: how do we move toward using tests that work? I know that sometimes tests aren’t available or there isn’t a budget for them, etc. So, how do we convince those who are in charge of the $$ that it’s important to have tests that have good classification accuracy? How do we help our peers to maybe move out of what they are comfortable with or what they feel confident in to try something new? I know that learning a new measure is sometimes difficult. It’s complex, and maybe it’s not something we were trained to do, but we do need to do better. Our credibility is at stake.