Sunday, 13 April 2014
Do league tables really improve test scores?
There is still an argument about whether school league tables, despite their well-known side effects, actually improve pupils’ performance by ‘holding schools to account’. This is despite careful analysis of the extensive data collected by the Government on all pupils in England via the National Pupil Dataset (NPD); details can be found at
This research, the latest in a sequence, shows that the uncertainty surrounding value added scores for schools is not only so large that most schools cannot reliably be separated, it is also of hardly any use for parents choosing schools for their children in terms of predicting school performance.
Yet there are still reports claiming to demonstrate that public league tables do in fact raise pupil performance. So, despite appearing to be unreliable measures of real school performance, they nevertheless, somehow, manage to improve overall results. We need, therefore, to take such claims seriously, especially when they are published in peer-refereed journals. One of the more recent, by Professor Simon Burgess and colleagues,
compares public examination score trends over time in Wales, where league tables were dropped in 2001, and England, which has continued to publish them. The authors conclude that “If uniform national test results exist, publishing these in a locally comparative format appears to be an extremely cost-effective policy for raising attainment and reducing inequalities in attainment.” Of course, schools are about much more than test and exam scores, and this does need to be borne in mind.
Essentially, the authors compare GCSE results in Wales and England over the period 2002–2008 and show that the gap between England and Wales widens steadily over time, whereas in the period before 2002 there were no differential trends. The authors are careful to try to rule out causes other than the abolition of league tables in Wales for this trend, testing a series of assumptions and using a series of carefully constructed and complex statistical models, but are left concluding that it is indeed the abolition of the league tables that has placed Wales at an increasing disadvantage compared to England.
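The basic logic of this kind of comparison, stripped of the paper's more complex modelling, is a difference-in-differences calculation. A minimal sketch, using entirely hypothetical pass-rate figures (not the paper's data), might look like this:

```python
# Hypothetical difference-in-differences sketch. The percentages below are
# invented for illustration only; they are NOT the paper's actual figures.

# Mean outcome (e.g. % of pupils achieving a GCSE benchmark) before and
# after the 2001 policy change in each country.
england = {"before": 50.0, "after": 58.0}
wales = {"before": 50.0, "after": 54.0}

# Change over time within each country.
england_trend = england["after"] - england["before"]  # +8.0 points
wales_trend = wales["after"] - wales["before"]        # +4.0 points

# The difference-in-differences estimate: how much more England improved
# than Wales. The causal reading attributes this gap to the policy change,
# which holds only if no other differential influence was at work.
did_estimate = england_trend - wales_trend

print(f"England trend: {england_trend:+.1f} points")
print(f"Wales trend:   {wales_trend:+.1f} points")
print(f"DiD estimate:  {did_estimate:+.1f} points")
```

The point of the sketch is the last comment: the subtraction is simple, but the causal interpretation rests entirely on the assumption that nothing else changed differentially between the two countries, which is precisely what is at issue below.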
As the authors recognise, a major problem with studies of this kind, which attempt to infer causation from correlated trends across time and are sometimes rather misleadingly (as here) referred to as ‘natural experiments’, is that so many other things are also changing over the period that one can never be sure all important alternative explanations have been ruled out. In this note I want to suggest that there are indeed alternative explanations, other than the abolition of league tables, for this increasing difference, and that the authors’ conclusion as quoted above simply is not justified by the evidence. First let me deal with a few ‘technical’ issues that have been overlooked by the authors.
A basic assumption when public examination scores are used for comparisons is that there is an equivalence of marking and grading standards across different examination boards, or at least, in this case, that any differences are constant over time. Yet this is problematic. When the league tables and the associated key stage testing were abolished in Wales, there was no longer any satisfactory way that such common tests could be used to establish and monitor exam standards in Wales, where most pupils sat the Welsh Joint Education Committee (WJEC) exams, compared to England, where only a small minority took them. There is therefore some concern that comparability may have changed over time. The authors of the paper, unfortunately, do not take this problem very seriously and merely state that
“The National Qualifications Framework ensured that qualifications attained by pupils across the countries were comparable during this period”
One way in which the authors might have tested their ‘causal’ hypothesis would have been to divide England into regions and study comparative trends in each, to see whether Wales really was different; but this does not seem to have occurred to them. The major omission in the paper, however, is that the authors fail to mention that at the same time as the league tables were stopped, so was the testing, and because the pupils became less exposed to tests, they were arguably less well equipped for the examinations too. This is, admittedly, somewhat speculative, but we do know that the ability to do well on tests is in part strongly related to the amount of practice that pupils have been given, and it would be somewhat surprising if this did not also extend to the public exams. Interestingly, when piloting for the reintroduction of regular testing in Wales took place in 2012, there was evidence that the performance of pupils had deteriorated as a result of not being tested intensively during their schooling. It has also been suggested that this lack of exposure to testing is associated with a relative decline in PISA test scores.
So here we have a very plausible mechanism, ignored by the authors, that, if you believe it to be real, explains the relative Welsh decline in exam results. The decline may have nothing to do with publishing league tables, but rather with the lack of test practice. Of course, if this is in fact the case, it may have useful implications for schools in terms of how they prepare pupils for exams.
I would argue, therefore, that we should not take these claims for the efficacy of league tables seriously. I believe that there is no strong evidence that their publication in any way enhances pupil performance, and furthermore that their well-understood drawbacks remain a powerful argument against their use.