Educational programmes across higher education incorporate knowledge from multiple disciplines. This can be multidisciplinary, interdisciplinary, or transdisciplinary education. In multidisciplinary education the disciplines are juxtaposed, whereas in interdisciplinary education knowledge from the different disciplines is integrated to create new solutions to problems that could not be solved by separate disciplines. Conversely, in transdisciplinary education, the boundaries between disciplines are transcended, and often stakeholders from outside academia are involved as well (Ashby & Exter, 2019; Klein, 2017).
According to van Goch (2023), interdisciplinary education should meet three conditions: there is a complex problem, multiple disciplines provide insights into this problem, and the different disciplinary insights are integrated. In educational settings, students usually integrate these insights themselves, but teachers can also provide knowledge integration. Knowledge integration strategies should be an explicit part of instruction during interdisciplinary education, since students will not acquire integrative skills merely through learning by doing (van Goch, 2023; van Lambalgen & van der Tuin, 2024).
Interdisciplinary education can help students develop skills needed for their future careers (van den Beemt et al., 2020). However, knowledge about the design and outcomes of interdisciplinary education is often only shared locally (Lindvig & Ulriksen, 2019). This post presents a short overview of current knowledge about the design, learning outcomes and assessment of interdisciplinary education. The full literature overview it is based on (in Dutch) is available upon request.
One of the best things about being a researcher is the opportunities it provides to learn from and with colleagues from different countries and contexts. This has certainly been my experience of late, as a member of the COST action ‘Content and Language Integrated Learning Network for Languages in Education’ (CLIL NetLE).
Funnily enough, the CLIL NetLE has also sparked new local connections. During the CLIL NetLE’s international Teacher Training School in Leiden, Fred Janssen (Professor of Science Education and ICLON’s Scientific Director) and I explored a question that has been bubbling under the surface for some time: Are his perspective-oriented approach and my main research and teaching interest – disciplinary literacies – actually two sides of the same coin? Or, as our former colleague, Evelyn van Kampen, once put it, are we “digging the same tunnel”?
Subject-specific thinking and knowing
Perspectives and disciplinary literacies have one crucial element in common: the conviction that teaching and learning a subject involves subject-specific thinking and knowing. In other words, learning biology means learning to think like a biologist, and to understand the beliefs and assumptions that influence that discipline.
Perspective-based education is a way to help students get a grip on complex problems by posing questions through different ‘lenses’ (perspectives). The perspectives also provide direction in searching for and testing answers. Take the nitrogen crisis, for example. To grasp this topic, you must approach it from at least biological, chemical, political-administrative and economic perspectives.
The underlying principle of disciplinary literacies is that subject-specific thinking and knowing are linked to subject-specific communication. Because of their different frames of reference, a biologist and a historian will each communicate knowledge in distinct ways. Communication in each field will differ again depending on purpose, audience and context. Thus, subject-specific communication is integral to teaching and learning any subject, in any language.
Missing Cs
So what is it that keeps these two tunnels from meeting? It can help to refer to a well-established model from content and language integrated learning (CLIL). According to the 4Cs (Coyle et al., 2010), subject teaching involves four key elements, which are inextricably linked with each other: Content, Communication, Cognition and Culture. Through the lens of the 4Cs, it becomes clear why the perspective-based and disciplinary literacies ‘tunnels’ have never quite met. While their core is the same, each of them is missing explicit attention to a ‘C’.
While disciplinary literacies focuses heavily on the ways Communication, Cognition and Culture interact while learning or teaching subject content, there is little explicit attention in this field to which subject-specific Content should be learned and taught.
In the perspective-based approach, choice of Content is the central issue. It relates closely to the Cognitive processes, beliefs and types of behaviour (Culture) associated with the subject. What is missing is attention to the impact these have on how we Communicate in that discipline.
Out of the tunnel and into the tree
We believe this exploration is a valuable step in bringing our respective ‘tunnels’ closer together and in opening them up further so that subject specialists can relate to and translate them into practice.
But perhaps we should get out of our tunnels altogether and breathe some fresh air. CLIL NetLE has published an initial operationalisation of bi- and multilingual disciplinary literacies. There, the concept is broken down into functional, critical, technological, multi- and transsemiotic, and bi-, multi- and translingual dimensions, depicted through the metaphor of a tree. Might there also be room in that tree for a ‘disciplinary perspectives’ dimension? This is a question we hope to explore further, both locally and with our international partners.
Sparked your interest?
To find out more about the CLIL NetLE and access its outputs so far, see the CLIL NetLE website. If you are a researcher in CLIL, disciplinary literacies, subject didactics or a related area, and would like to get involved in CLIL NetLE, you can find out how here.
You can find more information about ICLON’s work with the perspective-based approach, including professional development and workshops for schools, here (in Dutch – fill in the contact form if you would like to discuss possibilities in English).
For teachers and schools looking to delve further into CLIL, disciplinary literacies or the perspectives, let us know! ICLON’s professional development department can discuss the possibilities for bespoke programmes or guest lectures.
Another tunnel we may want to access in this discussion is the Pluriliteracies movement, spearheaded by Do Coyle and Oliver Meyer.
Photography: Stefanie Uit Den Boogaard
Tree figure adapted from the original by Talip Gülle
Traditionally, the student evaluation of teaching (SET) is one of the most widely employed instruments in higher education for evaluating the quality of teaching (e.g., Hendry & Dean, 2002; Hounsell, 2003). In a typical SET, after taking a course, students are asked to rate various aspects of the course (e.g., the clarity of the objectives, the usefulness of the materials, the methods of assessment) on a Likert scale. SET data is often the first and foremost source of information that individual teachers can use to evaluate both existing and new, innovative teaching practices. SETs are often integrated into higher education professional development activities. For instance, at some faculties at Leiden University, SET results and their interpretation are an integral part of the University Teacher Qualification (also known as the BKO). Starting teachers are expected to critically reflect in teaching portfolios on the results of SETs for the classes they have taught. Furthermore, the results of SETs can function as a source of information for teachers’ supervisors to guide discussions in yearly Performance and Development Interviews, sometimes leading to recommended or enforced future professional development activities for teachers.
However, for some time now, the SET has been subject to scrutiny for a variety of reasons. First, based on an up-to-date meta-analysis, the validity of SETs seems questionable. That is, there appears to be no correlation between SET scores and student learning performance at the end of a course (Uttl et al., 2017). In fact, when learning performance is operationalized as the added value of a teacher to the later performance of students during subsequent courses (Kornell & Hausman, 2016), the relationship can even be reversed (i.e., teachers with lower SET scores appear to be of more added value). One explanation for this finding is that making a course more difficult and challenging can result in lower SET scores, presenting teachers with a perverse incentive to lower the bar for their students to obtain higher scores on a SET (Stroebe, 2020).
Second, the intensive and frequent use of SETs can lead to a form of “evaluation fatigue” among students (Hounsell, 2003), sometimes resulting in mindless and unreliable evaluations of teaching (e.g., Reynolds, 1977; Uijtdehaage & O’Neal, 2015). As a case in point, a classic article by Reynolds (1977) reported how a vast majority of students in a medical course chose to evaluate a lecture that had been cancelled, as well as a video that was no longer part of the course. In a rather ironic reflection on these results Reynolds concluded that:
“As students become sufficiently skilled in evaluating films and lectures without being there, … then there would be no need to wait until the end of the semester to fill out evaluations. They could be completed during the first week of class while the students are still fresh and alert.”
Third, the results of student evaluations of teaching can be severely biased (e.g., Neath, 1996; Heffernan, 2022). For instance, in a somewhat tongue-in-cheek review of the literature, Neath (1996) listed 20 tips for teachers to improve their evaluations without having to improve their actual teaching. The first tip on the list: Be male. Apparently, research suggests that, in general, male teachers receive higher ratings on SETs compared to female teachers. In a more recent review of the literature, Heffernan (2022) goes on to argue that SETs can be subject to racist, sexist and homophobic prejudices, and biased against discipline and subject area. In addition, SETs that allow for a qualitative response can sometimes elicit abusive comments, most often directed towards women and teachers from marginalized groups. As such, SETs can be a cause of stress and anxiety for teachers rather than being an actual aid to their development.
Fourth, although studies often emphasize the importance of SETs for supporting and improving the quality of education, the underlying mechanism remains elusive (Harrison et al., 2022). It is unclear how SETs contribute to improving the quality of teaching. On the contrary, teachers can often find it challenging to decide what actual changes to make based on aggregated SET data that is largely quantitative in nature (Hendry & Dean, 2002).
In short, the continued use of SETs for evaluating the quality of teaching in higher education is difficult to justify. The findings reported in the literature indicate that the validity and reliability of the SET are questionable, and the value for educational practice appears to be limited. One could argue that sticking with the SET is more a tradition than it is evidence-informed practice. Perhaps universities mostly persist in the routine for lack of an equally (cost-)efficient and scalable alternative. In this blog, we delineate the development and pilot of one possible alternative.
The FET
In late 2023, an Innovation Fund Proposal was awarded a small grant to develop an alternative approach for the evaluation of teaching. At the start of 2024, Mario de Jonge (researcher at ICLON), Boje Moers (project manager at LLInC), Anthea Aerts (educational consultant at LLInC), and Erwin Veenstra and Arian Kiandoost (developers/data analysts at LLInC) collaborated on the development and subsequent small-scale pilot of the FET (Formative Evaluation of Teaching).
The FET is designed to be more conducive to the improvement of teaching practices (formative, qualitative) and less focused on mere assessment of teaching (summative, quantitative). Like the traditional SET, the FET is fast, efficient and relatively inexpensive. However, the FET aims to give teachers clearer directions and qualitative input on how to improve their teaching.
In the first step of the FET survey (Figure 1), students are presented with a list of course aspects on which they can give feedback. Some of the aspects on the list are general (e.g., the methods of assessment), while some of them can be course-specific (e.g., learning objectives). Note that the course aspect pertaining to the teacher specifically asks students to direct their feedback at the teacher’s didactic approach. As noted, students’ evaluations of teaching can sometimes be prone to unconstructive, abusive comments. By explicitly asking students to focus on the didactic approach, we hope to discourage these types of undesirable and unconstructive comments.
From the list of aspects, students are asked to select just one or two key aspects which they appreciated (i.e., tops), and one or two key aspects which they think could be improved upon (i.e., tips). With this design feature, we hope to counter the threat of evaluation fatigue, which is more likely to occur in more comprehensive surveys like the traditional SET that require students to evaluate each and every aspect of a course.
In the second step (Figure 2), after selecting one or two aspects as tips and tops, students are asked to write a short motivation for their respective selections. This set-up allows students to share their insights in a fast, efficient, and meaningful way.
After a given course has been evaluated, the FET output provides teachers with a ranking of aspects that were selected most frequently. Because selected aspects have also been enriched with qualitative textual input from students, teachers can engage in a focused review of those student contributions that are most relevant for improving their course (i.e., comments on aspects that were selected most frequently).
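As a rough illustration of this ranking step, the sketch below counts how often each aspect was selected and orders the aspects from most to least frequently selected. The record layout (`aspect`, `kind`, `comment`), the function name `rank_aspects`, and the sample responses are assumptions made for illustration; they are not the actual FET data format or code.

```python
from collections import Counter

# Hypothetical sample data: one record per aspect a student selected.
responses = [
    {"aspect": "working groups", "kind": "top", "comment": "Lively discussions."},
    {"aspect": "working groups", "kind": "top", "comment": "Good case studies."},
    {"aspect": "working groups", "kind": "tip", "comment": "Groups were too large."},
    {"aspect": "assessment", "kind": "tip", "comment": "Share a practice exam."},
    {"aspect": "assessment", "kind": "top", "comment": "Clear grading rubric."},
    {"aspect": "learning objectives", "kind": "tip", "comment": "State them earlier."},
]


def rank_aspects(records):
    """Return (aspect, total, tops, tips) tuples, most frequently selected first."""
    totals = Counter(r["aspect"] for r in records)
    tops = Counter(r["aspect"] for r in records if r["kind"] == "top")
    tips = Counter(r["aspect"] for r in records if r["kind"] == "tip")
    # Sort by descending total; break ties alphabetically for a stable order.
    ordered = sorted(totals, key=lambda aspect: (-totals[aspect], aspect))
    return [(a, totals[a], tops[a], tips[a]) for a in ordered]


ranking = rank_aspects(responses)
for aspect, total, n_tops, n_tips in ranking:
    print(f"{aspect}: selected {total}x ({n_tops} tops, {n_tips} tips)")
```

A teacher could then read the comments for the first one or two aspects in `ranking` before looking at the rest.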
Going over the FET evaluation results should be a relatively straightforward task for those who teach small classes. However, for teachers with larger classes we anticipated that this could be a considerable burden. This is where AI comes into play. LLInC developer Erwin Veenstra and data analyst Arian Kiandoost worked on a way of complementing the raw data with an AI-generated summary of the main results. Specifically, we wanted to select a Large Language Model (LLM) that was capable of performing the task of summarizing the data in such a way that it is easy to process and interpret. We expected that, with the current level of sophistication of available LLMs, it should be possible to generate a high-quality descriptive summary of the qualitative data. It took a fair amount of experimentation, iteration, and team discussion about different possible LLMs, output formats, and the “right” prompt before we arrived at a model and approach capable of performing the task.
The LLM we ended up using was OpenAI’s GPT-4 API (OpenAI Platform, n.d.). Note that, in contrast to the non-API consumer service ChatGPT, the OpenAI API does not have the same privacy and security issues. That is, data sent to the OpenAI API is not used to train or improve the model. Still, because we ended up using a cloud-based LLM, the data were anonymized before being fed to the LLM. Also, we rearranged the survey data into a JavaScript Object Notation (JSON) format (JSON, n.d.) to make it easier for the LLM to group information per course aspect. The LLM was prompted in such a way that it recognized that comments were grouped per course aspect, and that differences in magnitude should also be expressed in the summary (i.e., one Tip versus 10 Tops should not carry the same weight). Furthermore, we prompted the LLM to generate one integrated summary of the tips and tops per course aspect. We found that this way of reporting helped to make apparent contradictions in the data explicit and to put them in perspective (e.g., half of the students stating one thing, the other half stating the opposite). After the summary was generated, any placeholders that anonymization had left in the output were replaced with the original values in the final report.
In the AI-generated summary, course aspects are presented in descending order, starting with the one that was selected most frequently. For each aspect, a short summary is generated to capture the overall gist of the student comments. Figure 3 shows a screenshot of an AI-generated summary for one aspect of a course, the working groups. Note that the summary only gives a descriptive synthesis of the students’ positive and negative comments; the actual interpretation is left to the teacher. As is common knowledge, LLMs can sometimes be prone to “hallucinations”. We noticed that prompting the model to also provide an interpretation of the data, beyond what was in the text, increased the occurrence of hallucinations and also decreased the reproducibility of the results. However, a simpler, more bare-bones LLM-generated descriptive summary provided what we felt was an accurate and reproducible representation of the data. To be sure, we prompted the LLM to supplement each summary with up to six “representative examples” (i.e., actual instances of student feedback) of tips and tops as a reference to the actual data. Furthermore, in the introductory text of the AI-generated report, we encouraged teachers to cross-check with the raw data that was provided along with the summary, in case doubts arose about its reliability.
In the past couple of months, the FET has been piloted in different contexts at Leiden University, ranging from small-group settings such as an elective master class course (approximately 20 students) to a large-group setting such as a BA course (200+ students). The feedback from the participating teachers has been overwhelmingly positive. All teachers indicated wanting to use the FET again in the future, and in their interactions with us they were able to give multiple concrete examples of changes they intended to make in future iterations of their course. Based on the large BA course, the median time it took students to fill out the survey was around 2 minutes and 40 seconds, a duration we consider not to be too much of a burden for the students. Compared to the regular SET survey from a previous cohort, the FET survey produced much more qualitative student feedback in terms of the total number of student comments. Furthermore, although the average word count per comment did not differ much between the SET and the FET, students filling out the FET clearly put more effort into comments specifically directed at improving the course (i.e., Tips). Most important, after receiving and discussing the report, the participating teacher indicated having a high degree of confidence in the reliability of the AI-generated summary based on cross-checking with the raw data. In short, the preliminary results of our small-scale pilot suggest that the FET can be a valuable tool for efficient collection of high-quality student feedback that is formative and more conducive to the improvement of teaching practices.
Outreach activities (workshops and presentations about the FET project) have now sparked interest in the FET project within the university. In the next phase, we hope to get further support and funding to scale up the project and see if we can replicate our findings in a broader range of contexts and faculties. Also, as a future direction, we aim to use an LLM that can be run on a local server (e.g., Mistral AI, n.d.; Meta-Llama, n.d.). To run the larger versions of these kinds of models, we need a more powerful computer than the one we had access to during the current project. However, such a machine has recently become available at LLInC.
As the project enters the next phase, we aim to investigate how the FET survey can be successfully implemented to improve educational design and how it can support teachers’ professional development activities. Furthermore, in our future endeavors we plan to also take into account the student perspective. This was outside the scope of the current project, but it is vital to consider the student perspective if the project is going to move forward and scale up.
Lastly, in the FET we purposefully chose to collect only qualitative data. As already noted, abusive comments can sometimes enter into qualitative evaluation data and this can cause stress and anxiety among teachers. However, the qualitative evaluation data from our small-scale pilot did not seem to contain any student comments that could be considered abusive. Perhaps this was due to the design of the FET and the phrasing of the aspects in the list from which students could choose. Or perhaps it was simply due to the fact that students were aware that they were participating in a pilot project. However, even if abusive comments were to enter into the FET, we expect that the LLM should be capable of filtering out such unconstructive comments. This is one thing that we would also want to test in the future (e.g., by contaminating evaluation data with a preconstructed set of abusive comments, and training the model to filter the data).
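The data preparation described above can be sketched roughly as follows: names are swapped for placeholders, comments are grouped per course aspect in a JSON structure, and placeholders in the generated summary are mapped back afterwards. This is a minimal sketch under stated assumptions, not the project’s actual code: the function names, the `[PERSON_n]` placeholder scheme, the list of known names, and the sample comments are all hypothetical, and the call to the LLM itself is left out (a hard-coded string stands in for its output).

```python
import json
import re
from collections import defaultdict


def anonymize(text, names, mapping):
    """Replace each known name with a stable placeholder such as [PERSON_1]."""
    for name in names:
        placeholder = mapping.setdefault(name, f"[PERSON_{len(mapping) + 1}]")
        text = re.sub(re.escape(name), placeholder, text)
    return text


def restore(text, mapping):
    """Put the original names back into text returned by the LLM."""
    for name, placeholder in mapping.items():
        text = text.replace(placeholder, name)
    return text


def group_by_aspect(records, names):
    """Group anonymized comments as {aspect: {"tops": [...], "tips": [...]}}."""
    mapping = {}
    grouped = defaultdict(lambda: {"tops": [], "tips": []})
    for record in records:
        key = "tops" if record["kind"] == "top" else "tips"
        comment = anonymize(record["comment"], names, mapping)
        grouped[record["aspect"]][key].append(comment)
    return dict(grouped), mapping


# Hypothetical survey records and name list, for illustration only.
records = [
    {"aspect": "working groups", "kind": "top",
     "comment": "Dr. Jansen explained the cases clearly."},
    {"aspect": "working groups", "kind": "tip",
     "comment": "More time for discussion, please."},
    {"aspect": "assessment", "kind": "tip",
     "comment": "Ask Dr. Jansen to share a practice exam."},
]
known_names = ["Dr. Jansen"]

grouped, mapping = group_by_aspect(records, known_names)
payload = json.dumps(grouped, indent=2)  # what would be sent along with the prompt

# Stand-in for the LLM's summary; the real model call is omitted here.
summary = "Students valued [PERSON_1]'s explanations."
final_report = restore(summary, mapping)
print(payload)
print(final_report)
```

Keeping the name-to-placeholder mapping on the local machine means only the anonymized payload leaves it, while the final report can still refer to people by name.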
In conclusion, we believe the FET allows teachers to collect valuable feedback on the efficacy of their teaching in a fast, efficient, and meaningful way. Furthermore, the FET holds the potential for enhancing and enriching existing teacher professionalization activities as it can facilitate critical reflection on one’s own teaching practice.
References
Harrison, R., Meyer, L., Rawstorne, P., Razee, H., Chitkara, U., Mears, S., & Balasooriya, C. (2022). Evaluating and enhancing quality in higher education teaching practice: A meta-review. Studies in Higher Education, 47, 80-96.
Heffernan, T. (2022). Sexism, racism, prejudice, and bias: A literature review and synthesis of research surrounding student evaluations of courses and teaching. Assessment & Evaluation in Higher Education, 47, 144-154.
Hendry, G. D., & Dean, S. J. (2002). Accountability, evaluation of teaching and expertise in higher education. International Journal for Academic Development, 7, 75-82.
Hounsell, D. (2003). The evaluation of teaching. In A handbook for teaching and learning in higher education (pp. 188-199). Routledge.
Reynolds, D. V. (1977). Students who haven’t seen a film on sexuality and communication prefer it to a lecture on the history of psychology they haven’t heard: Some implications for the university. Teaching of Psychology, 4, 82-83.
Stroebe, W. (2020). Student evaluations of teaching encourages poor teaching and contributes to grade inflation: A theoretical and empirical analysis. Basic and Applied Social Psychology, 42, 276-294.
Uijtdehaage, S., & O’Neal, C. (2015). A curious case of the phantom professor: Mindless teaching evaluations by medical students. Medical Education, 49, 928-932.
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.
Leiden University hosted two one-day conferences on second language learning within just three weeks: the Language Learning Resource Centre’s (LLRC) yearly conference, this time on AI in language education, was held on June 13th. On July 1st there was a one-day conference on rhythm and fluency in second language speaking, as a satellite workshop alongside the 2024 Speech Prosody conference. As co-organizer of these one-day events, I got to experience both to the fullest. It made me realize once again the breadth and wealth of language learning research.
LLRC conference on AI in language education
For the 7th edition of the LLRC conference, presenters from Universities, Universities of Applied Sciences, a secondary school, and a global testing company from The Netherlands and Flanders showed their research and good practices to a crowd of researchers and language teachers from a similarly wide range of institutes, schools, and companies. With 110 participants, it was the busiest edition so far. Apparently, the theme of the day struck a chord. Although AI has been around since the 1950s, it is only in the last two years that LLMs and intelligent tools have been quickly and drastically changing language use, language learning, and language teaching. When the LLRC committee came up with the timely theme in the fall of 2023, we did not yet know it, but it turned out that June ’24 was proclaimed the month of AI in education in the Netherlands, with many events throughout the country, and the LLRC conference was linked to it.
AI is not artificial and it is not intelligent
The keynote speaker, Esther van der Stappen (Avans University of Applied Sciences), introduced the theme by bringing together multiple perspectives and bridging the gap between computer science and education, shedding light on practical and ethical aspects of AI in education more generally. She certainly had our attention when she explained what AI is not: it is not artificial and it is not intelligent, to begin with.
Bastogne cookies and second language learning in both conferences
A little over a fortnight later, the one-day conference on prosodic features of learners’ fluency featured two keynote speakers: Lieke van Maastricht (Radboud University) and Malte Belz (Humboldt University, Berlin). Lieke showed how the use of hand gestures during speaking in a second language should not be ignored, and Malte showed methodological issues that need to be addressed in research on measures of fluency in speaking, such as silent and filled pauses. This conference had some similarities to the LLRC conference, but the differences were more pronounced. To start with the similarities: on both days, technological advancements and tools played a big role; on both days, Bastogne cookies were served during coffee breaks and vegetarian bitterballen at the end of the day. Obviously, the biggest resemblance between the two conferences concerned the broad topic of second language learning. But where the day on AI in the language classroom showcased research and practice on language teaching didactics (using AI) in the broadest sense, the conference on prosody and fluency showed research and research methods on very specific aspects that language learners need to master, namely hesitations and rhythm in speaking.
Making worlds meet
To advance research and practice of language teaching, both types of exchanges among researchers and teachers are helpful, and in the end, we should strive to have both worlds meet. For instance, it is one thing to find out that gestures like arm movements should accompany the word or sentence stress, and that second language learners have trouble timing gestures in this way; it is another thing to teach the timing of gestures to a classroom of 30 students. And indeed, in the closing session on July 1st, three “D”s were recognized as Big Questions or New Challenges to tackle within the subfield of prosody in linguistics: 1) Differences between individuals in speaking and learning to speak, 2) the Dynamic nature of speaking processes, and 3) Didactics. So plans to have both worlds meet (more often) are already in the making!
Many first-year higher education students experience the transition from secondary to higher education as challenging. To facilitate this transition, universities offer mentoring programs. How can such a mentoring program be designed in an effective way? This literature overview outlines effective ingredients of mentoring programs.
How does mentoring help to foster student success?
Mentoring can be defined as “a formalized process based on a developmental relationship between two persons in which one person is more experienced (mentor) than the other (mentee).” (Nuis et al., 2023, p. 7). Based on a synthesis of the literature, I have developed the following conceptual model that relates mentoring to student success.
The relationship between mentoring and academic success can be explained through three mediating factors: academic integration, social integration, and psychosocial well-being (Lane, 2020). Academic integration involves academic knowledge and skills (Crisp & Cruz, 2009), career path development (Crisp & Cruz, 2009), and student identification with the norms of the university and their field of study (Tinto, 1975). Academic integration ensures that the student is committed to the goal of successfully completing their studies, thereby lowering attrition (Tinto, 1975). Social integration involves a sense of belonging, with peers and within the wider university community, and the ability to find one’s way within the university (Lunsford et al., 2017; Tinto, 1975). This type of integration is also hypothesized to reduce attrition (Tinto, 1975). Psychosocial well-being involves issues such as stress, resilience, self-efficacy, and motivation (Law et al., 2020).
Does mentoring work?
Many studies show that mentoring is effective in increasing student success (Andrews & Clark, 2011; Campbell & Campbell, 1997; Crisp & Cruz, 2009; Eby et al., 2008; Gershenfeld, 2014; Jacobi, 1991; Lane, 2020), and studies confirm that the mechanism through which mentoring is effective operates through the mediating factors (Lane, 2020; Lunsford et al., 2017; Webb et al., 2016). However, effects seem to be generally small (average effect size .08; Eby et al., 2008).
What are characteristics of effective mentoring?
Below, I discuss characteristics of effective mentoring programs for which evidence is available. It is important to keep in mind that it is the combination of academic, social, and psychosocial support that makes a mentoring program effective (Lane, 2020). Among the characteristics of effective mentoring, the role and person of the mentor stands out: this appears to be the most important ingredient of any mentoring program.
Type of mentor: peer or teacher
The two main types of mentors are senior peers and faculty members. Peer mentors may be more suitable for providing social and psychosocial support (Leidenfrost et al., 2017), may be more available and approachable, and therefore easier to confide in (Lunsford et al., 2017). However, for academic integration and academic success peer and teacher mentoring appear equally effective (Lunsford et al., 2017).
Mentor characteristics
What are attributes of effective mentors? I discuss a number of them:
Helpfulness: The mentor’s helpful and empowering attitude makes a significant contribution to the psychosocial well-being of a mentee (Lane, 2020). Mentees even prefer a mentor who considers their needs and helps them make choices over an empathetic mentor (Terrion & Leonard, 2007).
Role model and openness: Mentors must be able to act as role models and reflect on their own experiences and challenges (Holt & Fifer, 2018). Furthermore, if a mentor can open up to the mentee in a healthy way, and sees the relationship as a joint learning process, this can lead to a relationship in which there is room for growth (Terrion & Leonard, 2007).
Self-efficacy: A mentor’s self-efficacy seems to be an important predictor of perceived support. Careful selection, training and guidance of mentors help to ensure appropriate self-efficacy of mentors (Holt & Fifer, 2018).
Availability and approachability: Sufficient availability and good approachability of mentors lead to higher satisfaction, both for mentors and mentees (Ehrich et al., 2004; Terrion & Leonard, 2007).
Experience with mentoring: It seems that mentors do not need to have previous experience as mentors (Terrion & Leonard, 2007).
Type of activities
There is some evidence about which activities have proven effective. Social integration is promoted by facilitating contact between fellow students and with the mentor, by encouraging conversation and discussion, by exchanging ideas and experiences, and by supporting mentors and fellow students in problem-solving (Ehrich et al., 2004). Providing constructive feedback, and avoiding judgmental feedback, fosters academic integration and psychosocial well-being (Ehrich et al., 2004; Leidenfrost et al., 2011). Academic integration can be promoted by helping students to interpret and respond to feedback (Law et al., 2020), and by supporting them in self-regulated learning in general, in writing, and in exam preparation (Andrews & Clark, 2011; Holt & Fifer, 2018). This latter type of support requires tacit knowledge, which peer mentors can share from first-hand experience.
Duration & frequency
There is no consistent evidence about the duration and frequency of meetings (Lane, 2020). On the one hand, more contact between mentor and mentee leads to more perceived support (Holt & Fifer, 2018; Andrews & Clark, 2011) and higher student success (Campbell & Campbell, 1997). On the other hand, if the mentee is satisfied with the mentor’s support, the time the mentor spends with the mentee does not lead to more mentee satisfaction (Terrion & Leonard, 2007). So while the quantity of contact is important, the quality of contact appears equally significant.
Conclusion
Based on the literature, the conclusion seems justified that the person of the mentor and the way the mentor provides support are the most important ingredients of any mentoring program: a helpful and open mentor who is approachable and able to empower the mentee can be a powerful source of effective mentoring.
References
Andrews, J., & Clark, R. (2011). Peer mentoring works! Aston University.
Bandura, A. (1997). Self-efficacy: The exercise of control. Freeman.
Campbell, T. A., & Campbell, D. E. (1997). Faculty/student mentor program: Effects on academic performance and retention. Research in Higher Education, 38(6), 727-742. https://doi.org/10.1023/A:1024911904627
Crisp, G., & Cruz, I. (2009). Mentoring college students: A critical review of the literature between 1990 and 2007. Research in Higher Education, 50(6), 525-545. https://doi.org/10.1007/s11162-009-9130-2
Eby, L. T., Allen, T. D., Evans, S. C., Ng, T., & DuBois, D. L. (2008). Does mentoring matter? A multidisciplinary meta-analysis comparing mentored and non-mentored individuals. Journal of Vocational Behavior, 72(2), 254-267. https://doi.org/10.1016/j.jvb.2007.04.005
Ehrich, L. C., Hansford, B., & Tennent, L. (2004). Formal mentoring programs in education and other professions: A review of the literature. Educational Administration Quarterly, 40(4), 518-540. https://doi.org/10.1177/0013161X04267118
Gershenfeld, S. (2014). A review of undergraduate mentoring programs. Review of Educational Research, 84(3), 365-391. https://doi.org/10.3102/0034654313520512
Holt, L. J., & Fifer, J. E. (2018). Peer mentor characteristics that predict supportive relationships with first-year students: Implications for peer mentor programming and first-year student retention. Journal of College Student Retention: Research, Theory & Practice, 20(1), 67-91. https://doi.org/10.1177/1521025116650685
Jacobi, M. (1991). Mentoring and undergraduate academic success: A literature review. Review of Educational Research, 61(4), 505-532. https://doi.org/10.3102/00346543061004505
Lane, S. R. (2020). Addressing the stressful first year in college: Could peer mentoring be a critical strategy? Journal of College Student Retention: Research, Theory & Practice, 22(3), 481-496. https://doi.org/10.1177/1521025118773319
Law, D. D., Hales, K., & Busenbark, D. (2020). Student success: A literature review of faculty to student mentoring. Journal on Empowering Teaching Excellence, 4(1), 22-39.
Leidenfrost, B., Strassnig, B., Schabmann, A., Spiel, C., & Carbon, C.-C. (2011). Peer mentoring styles and their contribution to academic success among mentees: A person-oriented study in higher education. Mentoring & Tutoring: Partnership in Learning, 19(3), 347-364. https://doi.org/10.1080/13611267.2011.597122
Lunsford, L. G., Crisp, G., Dolan, E. L., & Wuetherick, B. (2017). Mentoring in higher education. The SAGE handbook of mentoring, 20, 316-334.
Nuis, W., Segers, M., & Beausaert, S. (2023). Conceptualizing mentoring in higher education: A systematic literature review. Educational Research Review, 41, 100565. https://doi.org/10.1016/j.edurev.2023.100565
Terrion, J. L., & Leonard, D. (2007). A taxonomy of the characteristics of student peer mentors in higher education: findings from a literature review. Mentoring & Tutoring: Partnership in Learning, 15(2), 149-164. https://doi.org/10.1080/13611260601086311
Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1), 89-125. https://doi.org/10.3102/00346543045001089
Webb, N., Cox, D., & Carthy, A. (2016). You’ve got a friend in me: The effects of peer mentoring on the first year experience for undergraduate students. Paper presented at the Higher Education in Transformation Symposium, Oshawa, Ontario, Canada.
If you’re a researcher, you have probably conducted literature reviews or will do so in the future. Depending on your keywords, a search in online databases easily results in several hundred or even thousands of hits. One of the most time-consuming steps is screening all those titles and abstracts to determine which articles may be of interest for your review. Isn’t there a way to speed up this process? Yes, there is!
Recently, we started a literature review about the use of Open Educational Resources (OER) in K-12 education. We came across a great tool for article screening. In this blog, we will introduce this tool and share our experiences.
Training the system
Researchers at Utrecht University have developed a free, open-source screening tool to help researchers go through the enormous digital pile of papers: ASReview LAB (see www.asreview.nl). In a 2-minute introduction video, you can learn how it works. Basically, the program helps you to systematically review your documents faster than you could ever do on your own by, as they put it, “combining machine learning with your expertise, while giving you full control of the actual decisions”. We just had to try that!
First, we made sure that the papers we had found all included a title and an abstract, as that is what we would use to screen them for relevance. It was very easy to import our RIS file (from Zotero in our case, but it can come from any reference management system) with all the hits from our search query. Then it was time to teach ASReview! We provided the system with a selection of relevant and irrelevant articles, which it uses to identify potential matches, thus expediting the screening process. Following the guidelines provided in the ASReview documentation, we used the default settings of the AI model.
The researcher as the oracle
Once the system was trained, the screening phase could start. At each stage, we evaluated whether a document was relevant or not, providing notes to justify the decisions. In cases of uncertainty, where the abstract alone was not sufficient to make a judgment, we referred to the full text of the article. With each decision, ASReview adapts its learning model to ensure that as many relevant papers as possible are shuffled to the top of the stack. That’s why it is important to make the ‘right’ decision. We worked in the ‘Oracle Mode’ (other modes are possible as well, but for reviews this is the best), which makes the researcher ‘the oracle’. ASReview describes the relevance of taking your time to make decisions: “If you are in doubt about your decision, take your time as you are the oracle. Based on your input, a new model will be trained, and you do not want to confuse the prediction mode.” (ASReview, 2023). So make sure that you carefully formulate your research questions and inclusion criteria before beginning to screen the articles. This helps you decide whether an article might be of interest or not.
To avoid endless manual screening (which is kind of the point of using this tool), it was recommended to formulate a stop rule. To formulate our stop rule we made use of the recommendations provided by ASReview and Van de Schoot et al. (2021). According to our rule, screening would cease once at least 33% of the documents were reviewed AND ASReview presented 25 consecutive irrelevant items. This approach helped prevent exhaustive screening while maintaining rigor and reliability. A tip for maintaining focus is to spend a limited amount of time per day screening articles (for example a maximum of two hours a day). Throughout the screening process, ASReview’s dashboard provided a visual overview of progress and decisions made.
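The stop rule described above can be written down as a small check. This is only a minimal sketch of our own rule, not part of ASReview: the function name `should_stop`, its parameters, and the idea of keeping the decisions in a plain list are assumptions made for illustration.

```python
def should_stop(decisions, total_records, min_fraction=0.33, irrelevant_run=25):
    """Return True when the stop rule is met.

    decisions: screening decisions in the order they were made,
               True = relevant, False = irrelevant (hypothetical format).
    total_records: number of records imported for the review.
    """
    screened = len(decisions)
    # Condition 1: at least 33% of all records have been screened.
    if total_records <= 0 or screened < min_fraction * total_records:
        return False
    # Condition 2: the last 25 decisions were all 'irrelevant'.
    if screened < irrelevant_run:
        return False
    return not any(decisions[-irrelevant_run:])
```

Both conditions have to hold at the same time, so a long run of irrelevant items early in the screening does not end the process prematurely.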
In total, 460 items were excluded by the system, while 324 were manually screened, with 173 rejected for various reasons. These reasons ranged from focusing on specific educational technologies to addressing broader educational issues beyond the scope of the study. To ensure the reliability of the screening process, a second researcher independently assessed a random sample of 10% of the documents.
Combining AI and human judgment
After completing the screening process, it is very easy to download a file with an overview of all the decisions made, including both relevant and irrelevant articles. The dashboard and the output files help you in reporting why certain articles were excluded from the review. Notably, the PRISMA model already accommodates articles excluded through AI. So, in conclusion, ASReview offers a powerful solution for streamlining the literature review process, leveraging AI to expedite screening while maintaining the integrity of the review. It combines the efficiency of AI with human judgment, saving you time – something welcomed by all.
Van de Schoot et al. (2021). An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence, 3, 125-133. https://doi.org/10.1038/s42256-020-00287-7
This month, the 10th Diversity & Inclusion Symposium took place, organized by the Diversity & Inclusion Expertise Office and the Faculty of Archaeology. For the small ICLON delegation that attended, the event highlighted how the questions, challenges and opportunities we face are not dissimilar to those experienced by colleagues elsewhere in our organization.
After 10 years of D&I policy, it has become clear that addressing equity, diversity and inclusion in a meaningful and impactful way remains challenging. The symposium’s plenary speakers highlighted how the university is a place not just for research and knowledge sharing, but also where students and staff must learn to navigate complex and conflicting conversations. At ICLON, this is a topic that has been very much on our mind lately, as researchers, but also as teacher educators and trainers, and as an organization more broadly. How can ICLON research keep addressing these challenges? And what aspects of research and education should be emphasized in order to contribute to an inclusive society?
Untold stories and questioning
The theme of the symposium was “Untold Stories.” In her opening keynote, Dr Valika Smeulders from the Rijksmuseum demonstrated how the museum navigates complex conversations effectively using heritage and fragile pasts. She explained how existing frameworks and dominant narratives can be broken through multi-perspectivity and personal stories. In times of polarization, heritage can facilitate an open dialogue but can also be a trigger for heated debate.
This notion underpinned our recent research published in a history education journal. Collaborating with the Rijksmuseum van Oudheden, we developed a training for history teachers on addressing sensitive topics. Using concrete heritage objects and varied questions, teachers created space for students to share their perspectives and tell their stories. Following the intervention, teachers felt better equipped to navigate such conversations in their classrooms, as observed in lessons addressing contentious issues like “Zwarte Piet”. Students and teachers were stimulated to ask questions. Certain questions can ‘heat up’ cooled-down issues, and hot topics can be ‘cooled down’ by questioning and not focusing on ‘the right answer’.
Maintaining such dialogue and continuing to question can be difficult. In a workshop at the same symposium, led by Ruben Treurniet from Civinc, participants engaged with each other anonymously using a tool that connects individuals with differing views. Through an online chat session, we exchanged thoughts on statements like “Debates about academic freedom should also involve the possibility of defending the right to not be inclusive.” A slight majority disagreed with this statement. The app encouraged us to ask each other questions, and provided an intriguing opportunity to converse with someone outside of one’s usual ‘bubble’.
These anonymous discussions can foster some form of connection, and can be a useful tool in developing mutual understanding. In our professional context, however, we do not generally communicate through anonymous chat, but through face-to-face encounters, with their accompanying tone, body language and emotional load. Conversations on controversial topics can become tense and confrontational, and can actually reinforce relationships of power and dominance. Explicitly expressing feelings, doubt and judgments can also be daunting for educators and researchers who are expected to exude authority, or who are anxious about repercussions if they do not maintain a ‘neutral’ standpoint. However, it is important that we, as researchers and educators, demonstrate the art of doubt and model how to deal with uncertainty.
Interdisciplinarity and positionality
Finally, it may be beneficial to revisit certain cooled-down topics to practice interdisciplinary thinking and multi-perspectivity. A historical perspective, as shown by Valika Smeulders, can offer various narratives, demonstrating how history is a construct accommodating diverse viewpoints. An issue that is ‘hot’ in the present could have been ‘normal’ in the past and vice versa. Looking beyond your own time and discipline can be inspiring and helpful. Collaborating across disciplines broadens perspectives while requiring us to clarify our own viewpoint through questioning and being questioned. At the moment, this principle is being applied in ongoing research with history and biology teacher trainees.
Other current projects at ICLON are exploring culturally sensitive teaching, linguistic diversity, approaches to inclusion, and teacher, student teacher and teacher educator perspectives on equality, equity and social justice. These sensitive areas of research can create vulnerable situations for participants and researchers alike. They demand researchers’ critical awareness of their positionality, grappling with their values and giving space to non-dominant perspectives, while also contributing to authoritative knowledge and relevant practical applications.
Perhaps interdisciplinarity and positionality could be a theme for a future symposium, bridging the diverse perspectives, experiences and expertise at ICLON and the university more widely. We could show what ICLON can offer regarding questioning, dealing with discomfort and interdisciplinarity, and open space for further dialogue at our university.
Logtenberg, A., Savenije, G., de Bruijn, P., Epping, T., & Goijens, G. (2024). Teaching sensitive topics: Training history teachers in collaboration with the museum. Historical Encounters, 11(1), 43-59. https://doi.org/10.52289/hej11.104
Recently NWO released a new funding call for educational innovation projects, labelled “Scholarship of Teaching and Learning”. This is an interesting funding opportunity for academics who would like to strengthen their teaching. Academic teachers can apply for funds to put their innovative teaching ideas into practice. Indeed, this is a good opportunity to get funding for those teaching ideas you have been waiting to implement. It is also the time to rethink your teaching and teaching ideas and put them to the test.
Last week I visited Germany for my study around the climate crisis and the issue of hope. One German student said: Can’t we as teachers just tell students to become vegetarians to save the planet?
What do you think, would it be a solution, or wise, to tell students what to eat, drink, vote, do or think in order to bring about change? I mean, shouldn’t we do something as we know that studies about the effects of climate change on young people reveal that pessimism, guilt, hopelessness and fear are common in the new generation?
Bringing about change in times of the many present-day crises, with all the doom stories and anxiety, is an interesting yet challenging research topic. Interestingly, precisely in the midst of complex crises, those who provide education have a crucial role: to make the new generation appear to the world as powerful and innovative (Arendt, 1958). The task is not to reinforce fear or impose what to do or think, but to have the new generation discover from hope that a different future is possible and that even a crisis includes profound problems, though complex and intractable, for which solutions can be found. A focus on hope is key!
Hope as a construct has received attention from many different angles, such as psychology, theology and philosophy, and recently even from the famous primatologist and anthropologist Jane Goodall (2021). Yet, although many authors endorse the need for and importance of hope, to date there has been little innovation in the ways in which hope can have a practical impact and lead to change, let alone in education. In my research project, hope has been incorporated into a pedagogy of hope. It holds several powerful design principles stemming from pilots in teacher education institutes in both the Netherlands and Germany, and is now being tested in the context of the climate crisis. Around this climate crisis, pre-service teachers are known to feel very committed to teaching the topic, but at the same time to be concerned and anxious about the climate themselves and unsure how to provide hopeful and effective teaching about the climate crisis in their secondary school internship classes (Bean, 2016).
The pedagogy of hope was implemented in a Dutch and a German teacher education institute. The preliminary outcomes show that participants were able to formulate specific intentions that are both directed toward hope for the climate and easy to implement in their actual teaching in secondary education. Also, many intentions proved to be action-oriented, and participants often used their creativity to find non-traditional ways of conveying climate hope. We also found hindrances to teaching hopefully, such as a lack of time, the pressure of curriculum coverage, and a lack of attention in textbooks to climate change and climate hope. In addition, the differing opinions that others may hold could make it a controversial issue to teach in school.
On to the next steps!
Michiel Dam, researcher at ICLON, LTA teaching fellow
Multiple choice (MC) testing is generally viewed as an efficient way of assessing student knowledge. Up until today it has been a very popular assessment format in higher education. Especially in courses where a large number of students are enrolled, the MC format is often used as it allows for fast and reliable grading. Not surprisingly, as an educational consultant and teacher trainer much of my work has revolved around supporting teachers in creating and/or improving their MC assessments. Throughout the years, I have taught numerous training sessions on Testing and Assessment for the University Teacher Qualification at Leiden University. On the one hand these training sessions are designed to teach best practices, but at the same time the sessions are also designed to cater to teacher needs. As such, a large part of the sessions is focused on giving teachers instructions and tips on how to create good MC questions. To be sure, I have always managed to squeeze in some discussion on the downsides and many limitations of MC testing as well. But still… It always kept me feeling a bit uneasy. In giving the instructions that the program compelled me to, I might have inadvertently been endorsing this practice more than I would have wanted. Thus, this blogpost will be as much repentance as it is a cautionary exposition about some of the negative consequences that MC testing can have on student learning.
There are multiple reasons why MC exams could be considered detrimental to student learning. For instance, one often-heard criticism is that the recognition-based MC exam format will often result in students preparing for exams in a superficial way. Furthermore, one could argue that the ecological validity of MC exams is low and that they are not representative of real-world situations. Also, the MC test format is by design not suitable for assessing higher levels of learning. These kinds of objections are well-known and they have also received considerable attention in the University Teacher Qualification courses on testing and assessment taught at Leiden University. I am not going to reiterate them extensively in this blogpost. Instead, I will discuss one particularly negative consequence of MC testing that I think is often neglected: the misinformation effect.
The misinformation effect
Before we consider the misinformation effect in the context of MC testing, we will first take a step back and consider some general research on the workings of human memory and how misinformation can result in misremembering. One of the first general demonstrations of the misinformation effect was provided by Loftus & Palmer (1974). In Experiment 2 of their seminal study, participants watched a short video clip of a car accident. After watching the video, participants were asked to give an estimate of the speed of the cars that were involved in the accident. Half of the participants were asked to estimate the speed of the cars when they smashed into each other, while the other half of participants estimated the speed for when the cars hit each other. The subtle change of the verb used in the question resulted in a difference in the reported speed: Participants estimated the speed to have been higher when they were in the smashed condition. More importantly, one week after giving the speed estimates, participants returned and were asked to indicate whether they remembered seeing broken glass in the video. Interestingly, participants in the smashed condition were much more likely to report having seen broken glass even though there was none to be seen in the video.
The results from the Loftus & Palmer study are often cited in the context of the reliability of eyewitness testimonies (and the effects that leading questions can have on misremembering). More importantly, the results are also taken as evidence in support of the idea that human memory is reconstructive in nature. During the retrieval of information from memory we reconstruct what we have previously experienced. When previously exposed to some form of misinformation, the process of reconstruction can result in substantial misremembering of previous experiences.
The misinformation effect in the context of MC questions
In the Loftus & Palmer (1974) study, the degree to which participants were exposed to misinformation was rather subtle (i.e., a small change of verb in the leading question). However, if we now consider the situation of an MC exam, the degree of exposure to misinformation seems much more extreme. A typical MC question will often have four alternatives for students to choose from, of which the majority (usually three) are incorrect. Thus, by using MC exams, we are intentionally exposing our students to misinformation. MC exams are designed to do just that. Surely, you could argue that the negative consequences of MC exams might be less severe, because students are aware that they are being exposed to misinformation. They are going into the exam expecting this. However, in preparing the exam, the teacher has also taken care to phrase erroneous answers in such a way that they are plausible. Teachers are instructed to formulate alternatives that students are likely to mistakenly select as the correct one. By exposing students to misinformation in the context of MC exams, teachers might very well be sacrificing student learning for the sake of fast and reliable grading.
In a later study by Roediger & Marsh (2005), the consequences of MC testing for student learning were investigated. In their experiment, participants studied short prose passages (or not) and were subsequently tested on the materials (or not) using MC questions with a number of alternatives ranging from 2 to 6. One week later participants returned and received an open-ended short answer test. Going into the test, participants were also given explicit instructions not to guess. First of all, the results on the 1-week test showed that the consequences of MC testing were not all bad: taking an MC test increased the retention of (correct) information. This finding, also referred to as the testing effect, is well-established in the literature and has often been replicated across different test formats and settings (e.g., Rowland, 2014). On the other hand, however, being exposed to misinformation in the MC test also increased the production of erroneous answers on the 1-week short answer test. The degree to which participants produced erroneous (MC) answers tended to increase as the number of alternatives of the MC test increased. Note that this was the case even though participants had received explicit instructions not to guess on the short answer test. Clearly, the misinformation effect is not just relevant in the context of eyewitness testimonies, but also in the context of assessment in higher education. MC exams can have an adverse effect on student learning in the sense that students can mistakenly recall incorrect answer options at a later point in time. Later research (Butler & Roediger, 2008) has shown that the misinformation effect as a result of MC testing can be reduced by giving students direct feedback (either after each individual question or after taking an entire test). However, in my experience, summative MC exams in higher education usually don’t provide immediate feedback to students. In the absence of corrective feedback, students might stay under the impression that their erroneous responses on a test were correct.
To end on a positive note, there are promising alternatives to MC exams that teachers are exploring. For instance, at the Leiden University Medical Centre (LUMC) some teachers have started using Very Short Answer Questions (VSAQs) on exams as a substitute for MC questions. Among others, Dr. Alexandra Langers (Leiden University Teaching Fellow) and her PhD student Elise van Wijk have started investigating the consequences of the VSAQ exam format. VSAQs require students to generate short (1 to 4 word) answers to exam questions. Compared to MC questions, VSAQs require retrieval of correct answers rather than simple recognition, and as such this type of question can be more conducive to student learning. Because answers are short, VSAQs will still allow for some degree of automatic scoring (for some predetermined “correct” responses). This can keep grading time acceptable even for teachers with large classes. Some of the findings of the VSAQ research project have recently been published in an article in PLOS ONE. Replicating previous findings (Sam et al., 2018), van Wijk et al. (2023) demonstrate that VSAQ exams can have added benefits over MC tests in terms of higher reliability and discriminability. In addition, van Wijk et al. found that the average grading time per individual VSAQ was around two minutes. This seems very acceptable considering the cohort in the study consisted of more than 300 students. Hopefully, initiatives like the one at LUMC will pave the way for other teachers to start using assessment types that can be more supportive of student learning.
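To give an idea of what automatic scoring against predetermined “correct” responses could look like, here is a minimal sketch. It is not the system used at LUMC: the function names, the normalisation steps and the example answers are all hypothetical and only illustrate the principle that exact matches are marked automatically, while everything else is left for the teacher to grade.

```python
def normalize(answer):
    """Lower-case the answer and collapse extra whitespace (assumed normalisation)."""
    return " ".join(answer.strip().lower().split())


def auto_score(response, accepted_answers):
    """Mark a response 'correct' if it matches a predetermined answer.

    Anything that does not match is flagged for manual grading rather than
    marked wrong, because a short answer may be correct but phrased differently.
    """
    accepted = {normalize(a) for a in accepted_answers}
    return "correct" if normalize(response) in accepted else "needs_review"


# Hypothetical example question with two predetermined correct responses.
accepted = ["myocardial infarction", "heart attack"]
print(auto_score("  Myocardial  Infarction ", accepted))
print(auto_score("stroke", accepted))
```

In a setup like this, only the responses flagged as `needs_review` would require a teacher’s attention.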
References
Butler, A. C., & Roediger, H. L. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36, 604–616. https://doi.org/10.3758/MC.36.3.604
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning & Verbal Behavior, 13, 585–589. https://doi.org/10.1016/S0022-5371(74)80011-3
Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159. https://doi.org/10.1037/0278-7393.31.5.1155
Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140, 1432–1463. https://doi.org/10.1037/a0037559
Sam, A. H., Field, S. M., Collares, C. F., van der Vleuten, C. P. M., Wass, V. J., Melville, C., Harris, J., & Meeran, K. (2018). Very-short-answer questions: Reliability, discrimination and acceptability. Medical Education, 52, 447–455. https://doi.org/10.1111/medu.13504
van Wijk, E. V., Janse, R. J., Ruijter, B. N., Rohling, J. H. T., van der Kraan, J., Crobach, S., de Jonge, M., de Beaufort, A. J., Dekker, F. W., & Langers, A. M. J. (2023). Use of very short answer questions compared to multiple choice questions in undergraduate medical students: An external validation study. PLOS ONE, 18, e0288558. https://doi.org/10.1371/journal.pone.0288558