Multiple choice (MC) testing is generally viewed as an efficient way of assessing student knowledge. To this day, it remains a very popular assessment format in higher education. Especially in courses with large numbers of students, the MC format is often used because it allows for fast and reliable grading. Not surprisingly, as an educational consultant and teacher trainer, much of my work has revolved around supporting teachers in creating and/or improving their MC assessments. Throughout the years, I have taught numerous training sessions on Testing and Assessment for the University Teacher Qualification at Leiden University. These training sessions are designed to teach best practices, but they are also designed to cater to teachers' needs. As such, a large part of each session is focused on giving teachers instructions and tips on how to create good MC questions. To be sure, I have always managed to squeeze in some discussion of the downsides and many limitations of MC testing as well. But still, it always left me feeling a bit uneasy. In giving the instructions that the program compelled me to give, I might have inadvertently been endorsing this practice more than I would have wanted. Thus, this blog post will be as much repentance as it is a cautionary exposition of some of the negative consequences that MC testing can have on student learning.
There are multiple reasons why MC exams could be considered detrimental to student learning. For instance, one often-heard criticism is that the recognition-based MC exam format will often result in students preparing for exams in a superficial way. Furthermore, one could argue that the ecological validity of MC exams is low and that they are not representative of real-world situations. Also, the MC test format is by design not suitable for assessing higher levels of learning. These kinds of objections are well known and they have also received considerable attention in the University Teacher Qualification courses on testing and assessment taught at Leiden University. I am not going to reiterate them extensively in this blog post. Instead, I will discuss one particularly negative consequence of MC testing that I think is often neglected: the misinformation effect.
The misinformation effect
Before we consider the misinformation effect in the context of MC testing, we will first take a step back and consider some general research on the workings of human memory and how misinformation can result in misremembering. One of the first general demonstrations of the misinformation effect was provided by Loftus & Palmer (1974). In Experiment 2 of their seminal study, participants watched a short video clip of a car accident. After watching the video, participants were asked to estimate the speed of the cars that were involved in the accident. One group of participants was asked to estimate the speed of the cars when they "smashed into" each other, while another group estimated the speed for when the cars "hit" each other. The subtle change of the verb used in the question resulted in a difference in the reported speed: participants in the "smashed" condition gave higher speed estimates. More importantly, one week after giving the speed estimates, participants returned and were asked to indicate whether they remembered seeing broken glass in the video. Interestingly, participants in the "smashed" condition were much more likely to report having seen broken glass, even though there was none to be seen in the video.
The results from the Loftus & Palmer study are often cited in the context of the reliability of eyewitness testimony (and the effects that leading questions can have on misremembering). More importantly, the results are also taken as evidence in support of the idea that human memory is reconstructive in nature. During the retrieval of information from memory, we reconstruct what we have previously experienced. When we have previously been exposed to some form of misinformation, this process of reconstruction can result in substantial misremembering of previous experiences.
The misinformation effect in the context of MC questions
In the Loftus & Palmer (1974) study, the exposure to misinformation was rather subtle (i.e., a small change of verb in the leading question). However, if we now consider the situation of an MC exam, the degree of exposure to misinformation seems much more extreme. A typical MC question will often have four alternatives for students to choose from, of which the majority (usually three) are incorrect. Thus, by using MC exams, we are intentionally exposing our students to misinformation. MC exams are designed to do just that. Of course, you could argue that the negative consequences of MC exams might be less severe because students are aware that they are being exposed to misinformation. They are going into the exam expecting this. However, in preparing the exam, the teacher has also taken care to phrase the erroneous answers in such a way that they are plausible. Teachers are instructed to formulate alternatives that students are likely to mistakenly select as the correct one. By exposing students to misinformation in the context of MC exams, teachers might very well be sacrificing student learning for the sake of fast and reliable grading.
In a later study by Roediger & Marsh (2005), the consequences of MC testing for student learning were investigated. In their experiment, participants studied short prose passages (or not) and were subsequently tested on the materials (or not) using MC questions with the number of alternatives ranging from two to six. One week later, participants returned and received an open-ended short answer test. Going into the test, participants were also given explicit instructions not to guess. First of all, the results on the 1-week test showed that the consequences of MC testing were not all bad: taking an MC test increased the retention of (correct) information. This finding, also referred to as the testing effect, is well established in the literature and has often been replicated across different test formats and settings (e.g., Rowland, 2014). On the other hand, however, being exposed to misinformation in the MC test also increased the production of erroneous answers on the 1-week short answer test. The degree to which participants produced erroneous (MC) answers tended to increase as the number of alternatives on the MC test increased. Note that this was the case even though participants had received explicit instructions not to guess on the short answer test. Clearly, the misinformation effect is not just relevant in the context of eyewitness testimony, but also in the context of assessment in higher education. MC exams can have an adverse effect on student learning in the sense that students can mistakenly recall incorrect answer options at a later point in time. Later research (Butler & Roediger, 2008) has shown that the misinformation effect resulting from MC testing can be reduced by giving students direct feedback (either after each individual question or after taking an entire test). However, in my experience, summative MC exams in higher education usually don’t provide immediate feedback to students.
In the absence of corrective feedback, students might remain under the impression that their erroneous responses on a test were correct.
To end on a positive note, there are promising alternatives to MC exams that teachers are exploring. For instance, at the Leiden University Medical Centre (LUMC) some teachers have started using Very Short Answer Questions (VSAQs) on exams as a substitute for MC questions. Among others, Dr. Alexandra Langers (Leiden University Teaching Fellow) and her PhD student Elise van Wijk have started investigating the consequences of the VSAQ exam format. VSAQs require students to generate short (1 to 4 word) answers to exam questions. Compared to MC questions, VSAQs require retrieval of correct answers rather than simple recognition, and as such these types of questions can be more conducive to student learning. Because answers are short, VSAQs will still allow for some degree of automatic scoring (for some predetermined “correct” responses). This can keep grading time acceptable even for teachers with large classes. Some of the findings of the VSAQ research project have recently been published in an article in PLOS ONE. Replicating previous findings (Sam et al., 2018), van Wijk et al. (2023) demonstrate that VSAQ exams can have added benefits over MC tests in terms of higher reliability and discriminability. In addition, van Wijk et al. found that the average grading time per individual VSAQ was around two minutes. This seems very acceptable considering the cohort in the study consisted of more than 300 students. Hopefully, initiatives like the one at LUMC will pave the way for other teachers to start using assessment types that can be more supportive of student learning.
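For readers curious what automatic scoring against predetermined responses could look like in practice, here is a minimal Python sketch. It is only an illustration under my own assumptions and not the system used at LUMC or in the cited studies: the function names (`normalize`, `score_vsaq`), the example question, and the matching rule (case-insensitive comparison after trimming whitespace) are all hypothetical. Answers that match a predetermined response are marked correct automatically; everything else is set aside for the teacher to grade by hand.

```python
import re


def normalize(answer: str) -> str:
    """Lowercase, trim, and collapse internal whitespace (assumed matching rule)."""
    return re.sub(r"\s+", " ", answer.strip().lower())


def score_vsaq(responses: dict, accepted: list) -> tuple:
    """Split student IDs into auto-scored correct answers and answers needing manual review.

    `responses` maps a (hypothetical) student ID to the typed answer;
    `accepted` is the teacher's list of predetermined correct responses.
    """
    accepted_norm = {normalize(a) for a in accepted}
    auto_correct, needs_review = [], []
    for student_id, answer in responses.items():
        if normalize(answer) in accepted_norm:
            auto_correct.append(student_id)
        else:
            # Not necessarily wrong: could be a synonym or a typo,
            # so a teacher still has to look at it.
            needs_review.append(student_id)
    return auto_correct, needs_review


# Hypothetical example data for a single question.
responses = {
    "s1": "Hippocampus",
    "s2": "  the hippocampus ",
    "s3": "amygdala",
}
accepted = ["hippocampus", "the hippocampus"]

auto_correct, needs_review = score_vsaq(responses, accepted)
print("Scored automatically:", auto_correct)
print("Needs manual review:", needs_review)
```

In a sketch like this, the teacher's grading time is spent only on the answers in the review list, which is one way the short answer format could remain manageable for large cohorts.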
References
Butler, A. C., & Roediger, H. L. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36, 604–616. https://doi.org/10.3758/MC.36.3.604
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning & Verbal Behavior, 13, 585–589. https://doi.org/10.1016/S0022-5371(74)80011-3
Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159. https://doi.org/10.1037/0278-7393.31.5.1155
Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140, 1432–1463. https://doi.org/10.1037/a0037559
Sam, A. H., Field, S. M., Collares, C. F., van der Vleuten, C. P. M., Wass, V. J., Melville, C., Harris, J., & Meeran, K. (2018). Very-short-answer questions: Reliability, discrimination and acceptability. Medical Education, 52, 447–455. https://doi.org/10.1111/medu.13504
van Wijk, E. V., Janse, R. J., Ruijter, B. N., Rohling, J. H. T., van der Kraan, J., Crobach, S., de Jonge, M., de Beaufort, A. J., Dekker, F. W., & Langers, A. M. J. (2023). Use of very short answer questions compared to multiple choice questions in undergraduate medical students: An external validation study. PLOS ONE, 18, e0288558. https://doi.org/10.1371/journal.pone.0288558