Standard setting for core biomedical science assessments: Oxford’s experience with a new method

Physiology 2012 (Edinburgh) (2012) Proc Physiol Soc 27, SA75

Research Symposium

J. Morris1, D. Young2, R. Perera3

1. Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, United Kingdom. 2. Medical Sciences Division Learning Technologies, University of Oxford, Oxford, United Kingdom. 3. Primary Health Care Sciences, University of Oxford, Oxford, United Kingdom.



In any examination, standard setting should seek to separate variance due to the difficulty of the examination (bearing in mind that year's educational experience) from variance due to the performance of the students, which is what the assessment aims to measure. For many years the Oxford preclinical school has separated the assessment of 'core' information from that of wider reading and understanding. The latter has always been assessed by essays in which students have a choice of question. Core knowledge was first assessed by short-answer questions, but it proved very difficult to achieve consistency in examiners' marking and standard setting. From 2004 onwards, core assessment in each of the three major subjects each year has used twenty 5-part questions (single best answer or extended matching format), blueprinted across the range of the syllabus, to determine pass/fail rather than rank order. The questions are delivered online by the Questionmark Perception system, which not only relieves examiners of tedious marking but also returns a numerical mark almost instantaneously, together with a detailed analysis of the performance of both the cohort and the questions. The question then was how to standard-set these papers. At first the practice inherited from the short answers was adopted, namely that students had to answer correctly 4 of the 5 parts in 12 of 15 questions, but this produced considerable variance in outcome, both within subjects from year to year and between subjects. In the early years the large number of new questions contributed to this, but with judicious re-use the question bank has been improved by scrutinising questions that even the top-performing students found difficult and removing ambiguities. Following our practice with the short-answer core questions, an absolute pass mark of 70% was first adopted, but examiners often found it necessary to modify this in the light of performance on particular questions and of differences between subjects.
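The inherited pass rule can be sketched as below. The two-stage aggregation (a per-question threshold, then a per-paper threshold) is our reading of the rule as stated; the function names and the treatment of exactly 15 questions are illustrative, not part of the original specification:

```python
def question_passed(parts_correct, parts_needed=4):
    """A 5-part question counts as 'passed' when at least 4 parts are correct."""
    return parts_correct >= parts_needed

def candidate_passed(parts_per_question, questions_needed=12):
    """Candidate passes when at least 12 of the 15 questions are passed.

    parts_per_question: number of parts answered correctly on each question.
    """
    questions_ok = sum(question_passed(p) for p in parts_per_question)
    return questions_ok >= questions_needed
```

A rule of this shape makes the variance described in the text easy to see: a candidate scoring 3/5 on every question passes no questions at all, while one scoring 4/5 on twelve questions passes outright, so small shifts in question difficulty move many candidates across the boundary.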
Many assessments in the UK are standard-set by one of a variety of 'relative' methods (Angoff, Ebel, or Hofstee), but it can be difficult, and very costly, to assemble sufficient staff with the appropriate expertise, and in our experience the variance among those participating can be large. We have therefore turned to a method (Cohen-Schotanus & van der Vleuten, 2010, Medical Teacher 32:154-160) whose fundamental premise is that the standard of the top students in a cohort is consistent from year to year and from examination to examination. The absolute standard (70%) is then adjusted to allow for the performance of the top 5% of the cohort (who usually score between 94% and 100% on our core examination). Analysis of the scores of all students on questions re-used from year to year strongly suggests that this fundamental premise is sound. Importantly, we show that the distribution of marks is not normal (although examination data are frequently summarised by a cohort mean and standard error) but deviates significantly from normality at both ends of the distribution. A Q-Q analysis shows, however, that the two-parameter Weibull distribution linearises the Q-Q plot throughout the distribution, and particularly at both tails. Using this distribution, one could therefore use any part of the cohort to determine the difficulty of the test; for the moment, we have continued to use the performance of the top 5%. Crucially, understanding the distribution gives us greater confidence in the cut-off at the critical pass/fail level. The particular practical advantages of this method are its ease and speed of use, especially when combined with online delivery and analysis of the test. New questions can be trialled in summative examinations in the knowledge that, if they prove unexpectedly difficult, this is automatically compensated for.
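A Cohen-style adjustment of the kind described can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes one common variant: the cut-off is taken as the fixed standard (70%) applied to the score attained by the top 5% of the cohort, here represented by the 95th-percentile score under the nearest-rank definition. The function name and percentile convention are illustrative assumptions:

```python
import math

def cohen_cutoff(scores, top_fraction=0.05, standard=0.70):
    """Pass mark as `standard` times the score of the top `top_fraction`
    of the cohort (nearest-rank 95th-percentile score by default)."""
    ranked = sorted(scores)
    # Nearest-rank index of the (1 - top_fraction) percentile.
    idx = max(0, math.ceil((1 - top_fraction) * len(ranked)) - 1)
    reference = ranked[idx]
    return standard * reference
```

For example, if the 95th-percentile score in a given year is 95/100, the cut-off becomes 66.5, whereas a harder paper on which the same percentile falls to 90 yields a cut-off of 63: the difficulty of the paper, gauged from the top of the cohort, moves the pass mark rather than the judgement of the examiners.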
Most importantly, it has given us a rational and defensible way to determine pass/fail cut-offs.
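The Weibull Q-Q check described above can be sketched in standard-library Python. Under a two-parameter Weibull distribution, F(x) = 1 − exp(−(x/λ)^k), so ln(−ln(1−F)) is linear in ln(x); plotting the transformed empirical quantiles against log marks and measuring linearity (here by a Pearson correlation) reproduces the idea of the analysis. The plotting-position convention (i − 0.5)/n and the use of a correlation coefficient as the linearity measure are our assumptions, not details taken from the abstract:

```python
import math

def weibull_qq_points(data):
    """(ln x, ln(-ln(1 - p))) pairs; linear iff data are two-parameter Weibull.

    Requires strictly positive data values.
    """
    xs = sorted(data)
    n = len(xs)
    pts = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # median-style plotting position
        pts.append((math.log(x), math.log(-math.log(1 - p))))
    return pts

def pearson_r(pts):
    """Pearson correlation of the Q-Q points, as a simple linearity measure."""
    xs = [a for a, _ in pts]
    ys = [b for _, b in pts]
    n = len(pts)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)
```

On data actually drawn from a Weibull distribution the correlation is very close to 1 across the whole range, including both tails, which is the property the abstract exploits; normal data, by contrast, curve away from the line at the extremes.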



Where applicable, experiments conform with Society ethical requirements.
