Do you add up the scores on the questions of your exams to get your students' final scores? Do you use questionnaires with Likert scales? Do you analyze these questionnaires by taking the means of the questionnaire items? Do you use the mean of the questions in evaluation forms? Do you average response times to items in experimental settings?
If you have answered yes to any of these questions, you may (or may not, but should) have wondered whether the items in your questionnaire, exam, test, or experiment are (sort of) measuring the same thing: the construct you intended to measure. If so, you have most likely calculated Cronbach's alpha and, if the value was over .7, happily reported that the items were indeed internally consistent. If that is what you did, you calculated and reported the wrong measure, and you are not alone. Although methodologists have shown numerous times that Cronbach's alpha is not suitable for measuring internal consistency (see Sijtsma, 2009, for instance), handbooks still present Cronbach's alpha as the measure of choice. Because questionnaire and test constructors intend to summarize the test by its overall sum score, Jelle Goeman and I advocate summability, which is defined as the proportion of the total "test" (questionnaire subset, exam, evaluation) variation that is explained by the sum score.
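To make the contrast concrete, here is a minimal sketch, assuming a complete table of numeric scores with participants as rows and items as columns. It computes the classical Cronbach's alpha and a naive, uncorrected reading of the verbal definition of summability: each item is regressed on the sum score and the explained variances are added up and divided by the total item variance. The function names are hypothetical, and `naive_explained_share` is only an illustration, not the formula from our paper; as the example shows, it equals 1/k rather than 0 for k uncorrelated items, so it depends on test length. For values you want to report, use the R code or Shiny app on the summability website.

```python
from statistics import mean, pvariance


def item_columns(data):
    """Turn a participants-by-items table (rows = participants) into item columns."""
    return [list(col) for col in zip(*data)]


def covariance(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def cronbach_alpha(data):
    """Classical alpha: k/(k-1) * (1 - sum of item variances / variance of the sum score)."""
    items = item_columns(data)
    k = len(items)
    sum_scores = [sum(row) for row in data]
    item_variance = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / pvariance(sum_scores))


def naive_explained_share(data):
    """Share of total item variance explained by the sum score S.

    Each item X_i is regressed on S; its explained variance is
    cov(X_i, S)**2 / var(S). Uncorrected illustration only, NOT the
    published summability formula.
    """
    items = item_columns(data)
    sum_scores = [sum(row) for row in data]
    var_s = pvariance(sum_scores)
    explained = sum(covariance(col, sum_scores) ** 2 / var_s for col in items)
    total = sum(pvariance(col) for col in items)
    return explained / total


# Two hypothetical, perfectly correlated items.
same = [[1, 1], [2, 2], [3, 3]]
# Two hypothetical, uncorrelated items (k = 2).
unrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]

print(round(cronbach_alpha(same), 6), round(naive_explained_share(same), 6))            # 1.0 1.0
print(round(cronbach_alpha(unrelated), 6), round(naive_explained_share(unrelated), 6))  # 0.0 0.5
```

Population (rather than sample) variances are used throughout; both quantities are ratios of variances, so the choice cancels out as long as it is applied consistently.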
Our paper, which recently came out in the journal Educational Measurement: Issues and Practice, shows that summability is stable across a number of variables (including test or questionnaire length). From the few examples calculated so far, and from insight into the mathematical formula, we can assume that a summability of .5 can be considered "high". However, more experience with the summabilities of tests in various fields is needed before definite recommendations can be given.
Therefore, I end this blog with a "Call for Calculations": please go to https://sites.google.com/view/summability and calculate summability yourself for an existing test, exam, questionnaire, or experiment. You can download the R code from the website or use the link to the Shiny app. All you need is a table with items as columns and participants as rows, filled with the participants' scores on the items that are supposed to measure your (one) construct. The table can be a plain-text file or an SPSS file. Report your results through the form available on the website. In this way, we can quickly accumulate knowledge of what constitutes "high", "moderate", and "low" summabilities. Thank you!
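Before uploading a plain-text table, it can help to check that it really has one row per participant and one column per item. The sketch below assumes a tab-delimited file with a header row of item names, numeric scores, and no missing values; these are assumptions for illustration, not the documented input requirements of the R code or the Shiny app, and SPSS files are not covered. The function name `read_item_table` is hypothetical.

```python
import csv
import io


def read_item_table(text, delimiter="\t"):
    """Parse a participants-by-items table: a header row of item names,
    then one row of numeric scores per participant.

    Raises ValueError on rows with the wrong number of cells or on
    non-numeric/empty cells (missing values are assumed not to occur).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} item scores, got {len(row)}"
            )
        rows.append([float(value) for value in row])  # float() raises on non-numeric cells
    return header, rows


# Hypothetical three-item table with two participants.
example = "q1\tq2\tq3\n4\t5\t3\n2\t2\t1\n"
items, scores = read_item_table(example)
print(items)   # ['q1', 'q2', 'q3']
print(scores)  # [[4.0, 5.0, 3.0], [2.0, 2.0, 1.0]]
```

For a file on disk, read it with `open(path, newline="")` and pass the file's contents to `read_item_table`; pass `delimiter=","` for comma-separated files.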