Do league tables really improve test scores?
There is still an argument about whether school league
tables, despite their well-known side effects, actually improve the performance
of pupils by ‘holding schools to account’. This is despite careful analysis of
the extensive data collected by Government on all pupils in England via the
National Pupil Database (NPD). That research, the latest in a sequence, shows
that the uncertainty surrounding value added scores for schools is not only so
large that most schools cannot reliably be separated; the scores are also of
hardly any use to parents choosing schools for their children, since they
barely predict future school performance.
Yet there are still reports that claim to demonstrate that
public league tables do in fact raise pupil performance. So, despite appearing
to be unreliable measures of real school performance, they nevertheless,
somehow, manage to improve overall results. We need, therefore, to take the
claims seriously, especially when they are published in peer-refereed journals.
One of the more recent, by Professor Simon Burgess and colleagues, compares
trends in public examination scores over time in Wales, where league tables
were dropped in 2001, and England, which has continued to publish them. The
authors conclude that “If uniform national test results exist, publishing
these in a locally comparative format appears to be an extremely
cost-effective policy for raising attainment and reducing inequalities in
attainment.” Of course, schools are about much more than test and exam scores,
and this does need to be borne in mind.
Essentially, what the authors do is compare GCSE results in
Wales and England over the period 2002–2008 and show that the gap between
England and Wales increases steadily over time, whereas in the period before
2002 there were no differential trends. The authors are careful to try to rule
out causes other than the abolition of league tables in Wales for this trend,
testing a series of assumptions and using a series of carefully constructed and
complex statistical models, and they are left concluding that it is indeed the
abolition of the league tables that has placed Wales at an increasing
disadvantage compared to England.
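Although the authors’ models are considerably more elaborate, the core logic of such a comparison can be written as a standard difference-in-differences specification; what follows is a sketch of that logic, not the authors’ exact model:

    y_{ist} = \alpha + \beta \mathrm{Wales}_s + \sum_t \gamma_t \mathrm{Year}_t + \sum_{t \ge 2002} \delta_t (\mathrm{Wales}_s \times \mathrm{Year}_t) + \varepsilon_{ist}

Here y_{ist} is the GCSE outcome of pupil i in school s in year t, the \gamma_t absorb trends common to both countries, and the \delta_t measure the Wales–England gap in each post-abolition year relative to the pre-2002 baseline. The authors’ argument amounts to the claim that the \delta_t drift steadily downwards after 2001 and that nothing other than the abolition of league tables can account for that drift.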
As the authors recognise, a major problem with studies of
this kind, sometimes rather misleadingly (as is done here) referred to as
‘natural experiments’, which attempt to infer causation from correlated trends
over time, is that so many other things are also changing over the period that
one can never be sure that all the important alternative explanations have
been ruled out. In this note I want to suggest that there are indeed
explanations other than the abolition of league tables for this increasing
difference, and that the authors’ conclusion as quoted above is simply not
justified by the evidence. First let me deal with a few ‘technical’ issues
that the authors have overlooked.
A basic assumption when public examination scores are used
for comparisons is that there is an equivalence of marking and grading
standards across different examination boards, or at least, in this case, that
any differences are constant over time. Yet this is problematic. When the
league tables and the associated key stage testing were abolished in Wales,
there was no longer any satisfactory way in which such common tests could be
used to establish and monitor exam standards in Wales, where most pupils sat
the Welsh Joint Education Committee (WJEC) exams, as opposed to England, where
only a small minority took them. There is therefore some concern that
comparability may have changed over time. The authors of the paper,
unfortunately, do not take this problem very seriously and merely state that
“The National Qualifications Framework ensured that qualifications attained by
pupils across the countries were comparable during this period”.
One of the ways in which the authors might have tested their
‘causal’ hypothesis is by dividing England into regions and studying
comparative trends in each of these, in order to see whether Wales really was
different, but this does not seem to have occurred to them; a minimal sketch
of such a regional placebo check follows.
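Purely by way of illustration, and making no claim about the authors’ actual data or methods, such a placebo check might look like the following Python sketch. It assumes a hypothetical pupil-level table for England with columns region, year and score; none of these names come from the paper.

    import statsmodels.formula.api as smf

    def placebo_gap(df, treated_region):
        """Treat `treated_region` as if it, rather than Wales, had dropped
        league tables in 2001, and estimate its post-2001 divergence from
        the rest of England (illustrative only)."""
        d = df.copy()  # df: pandas DataFrame with 'region', 'year', 'score'
        d["treated"] = (d["region"] == treated_region).astype(int)
        d["post"] = (d["year"] >= 2002).astype(int)
        # Year dummies absorb trends common to all regions; the
        # treated:post coefficient is the region's estimated divergence.
        model = smf.ols("score ~ treated + C(year) + treated:post", data=d).fit()
        return model.params["treated:post"]

    # gaps = {r: placebo_gap(df_england, r) for r in df_england["region"].unique()}

If Wales were genuinely exceptional, its post-2001 divergence from England should stand well outside the spread of such placebo estimates for the English regions.

The major omission in the paper, however, is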
that the authors fail to mention that, at the same time as the league tables
were stopped, so was the testing, and because pupils became less exposed to
the tests they were arguably less well equipped for the examinations too. This
is, admittedly, somewhat speculative, but we do know that the ability to do
well on tests is strongly related, in part, to the amount of practice pupils
have been given, and it would be somewhat surprising if this did not also
extend to the public exams. Interestingly, when piloting for the reintroduction
of regular testing in Wales took place in 2012, there was evidence that the
performance of pupils had deteriorated as a result of not being tested
intensively during their schooling. It has also been suggested that this lack
of exposure to testing is associated with a relative decline in PISA test
scores.
So here we have a very plausible mechanism, ignored by the
authors, that, if you believe it to be real, explains the relative Welsh
decline in exam results. That decline may have nothing to do with the
publishing of league tables, but rather with the lack of test practice. Of
course, if this is in fact the case it may have useful implications for
schools in terms of how they prepare pupils for exams.
I would argue, therefore, that we should not take these
claims for the efficacy of league tables seriously. I believe that there is no
strong evidence that their publication in any way enhances pupil performance,
and furthermore that their well-understood drawbacks remain a powerful argument
against their use.