Interview

Grade expectations

Interview: Guy Sheppard. Photographs: Jim Varney

Is there an answer to the perennial question of whether exams are getting easier? No, says Jo-Anne Baird, a leading authority on exam comparability, but far greater effort should go into explaining what they do show.

Jo-Anne Baird feels uneasy watching the TV reports of teenagers jumping for joy and hugging each other after receiving their GCSE and A-level results. Almost inevitably, she says, their celebrations will be tempered with suggestions that more students are gaining these qualifications because they are easier to pass. "I feel really sorry for those kids who aren't celebrating; not only have they not done well but they're being told that the exams they've taken were too easy anyway."

Concern for what she sees as casualties of the exam system helps to explain Baird's dedication to the research field that she and a small group of academics are focused on.

"There is no question that the assessment system is really important to the country," she says. "Children's experiences of education are really being shaped by it. Researchers can often be working away and their research has no real effect on the world. In assessment, you get to do research that is closely linked to practice and policy."

Baird, whose background is in psychology, switched to a career in assessment in 1995 when she became senior research officer with the Associated Examinations Board (AEB), since merged into the Assessment and Qualifications Alliance (AQA). "The whole thing brought together my interests in education, psychology, research and policy." Last May, she was appointed reader in educational assessment at the University of Bristol, where she directs the programme for doctorates in education. The position gives her far more scope to conduct the research she feels to be so important following a five-year stint as head of research with the AQA. Her latest research, commissioned by the Qualifications and Curriculum Authority (QCA), explores how exam standards can best be compared across subjects and over time as well as between awarding bodies. She concludes that in England there are no easy, straightforward answers to these questions that, she says, create "a quagmire" owing to the competing demands on the exam system. "Standards are currently being used to give feedback to students and schools and also to monitor the education system and how it is performing. There is no experimental design that you could construct that could give an unequivocal answer to the question about whether standards are being maintained over time.

"The focus is therefore on the yearon- year comparability of standards;that's being done in a rigorous manner but what's needed is a more coherent programme so that we can stand back and ask whether the assessments we have meet their various purposes."

Baird feels one area that is not tackled well is the comparability between subjects. "Certainly, more work could be done there," she says. "People are saying that languages are too difficult; we need to have a more careful programme of research looking at what the issues are. Typically, people do quantitative or qualitative studies but don't bring the two together. We need to know whether children's performances in the examinations have changed, as well as whether their grades are different, in order to properly judge changes in standards." Despite the intense media focus on whether standards are in decline, Baird argues that much more could be done to communicate what exams actually do show.

"Some people have argued that the awarding bodies and regulators have not done enough to engage the media. They

"If you talk to anybody from a taxi driver to a hairdresser, although they might have some mistaken views about what goes on in the exam system, they do understand that it's sometimes like trying to compare apples with oranges.

Her experience of introducing the Curriculum 2000 reform of A-level exams at the AQA demonstrates what can be achieved through careful explanation of the results to the media. "My department was at the centre of explaining why students had done better in the new exams; we spotted the issue and looked for the explanation. When the first results were released in 2002, they were largely accepted."

In AQA French, for example, more than 28 per cent of candidates achieved Grade A in 2002 compared with less than 24 per cent in 2000; the proportion achieving a Grade E and above jumped by more than 8 per cent. What became apparent was the way in which the introduction of AS certificates in the first year had weeded out many weaker students because they were more likely to drop the subject in which they achieved their worst grade.
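The arithmetic behind that explanation can be illustrated with a minimal sketch. The grade counts below are invented for illustration and are not AQA figures; the function name `share` is likewise hypothetical. The sketch only shows that if some of the weakest candidates leave the cohort, the percentage at Grade A and the pass rate both rise even though no individual grade has changed.

```python
def share(counts, grades):
    """Percentage of candidates whose grade is one of `grades`."""
    return 100 * sum(counts[g] for g in grades) / sum(counts.values())


PASS_GRADES = ["A", "B", "C", "D", "E"]

# Hypothetical cohort of 100 candidates before any weaker students drop out.
before = {"A": 20, "B": 20, "C": 20, "D": 15, "E": 15, "U": 10}

# Same cohort after 5 E-grade and 7 U-grade candidates drop the subject
# (assumed numbers); everyone who remains keeps the same grade.
after = dict(before, E=10, U=3)

for label, counts in (("before", before), ("after", after)):
    print(label,
          f"Grade A: {share(counts, ['A']):.1f}%",
          f"E and above: {share(counts, PASS_GRADES):.1f}%")
```

The number of A grades is identical in both cohorts; only the size of the group they are measured against has shrunk.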

Exam standards

Baird argues in her research for the QCA that no exam system can fulfil all of the different roles expected of it, exploring different ways of defining exam standards and highlighting the potential problems with each one. Cohort referencing, where the same percentage of candidates is allocated each grade in every exam, benefits from being clear-cut and easily understood.

"But you don't really know what students have had to do to get the grade," she explains. "If you have a hard subject compared with an easy one, you still end up with the same grade profile." With the catch-all definition of comparability, grades in different subjects are calculated by making adjustments for known student characteristics that relate to exam performance. Baird argues that the characteristics to include will always be contested. Why should hours of study be included, for example, if it becomes clear that for each hour of studying, candidates in maths do better than in physical education?

With the criterion referencing method, Baird says examiners are expected to judge the quality of students' work against a set of written standards. "But psychology research shows that human judges are fallible across all sorts of settings." To underline the potential difficulties they face, she points to the criteria used to gauge exceptional performance in English. These include "writing has shape and impact and shows control of a range of styles maintaining the interest of the reader throughout" and "a variety of grammatical constructions and punctuation is used accurately and appropriately and with sensitivity". She says most people are surprised to learn that such criteria are applied to Key Stage 1 for seven-year-olds. "The written criteria alone do not tell judges what the standards should be - they have to interpret them."

Baird's preferred option is weak criterion referencing, where senior examiners reach a collective decision about how grades are awarded based on a combination of criterion referencing and their assessment of how difficult each exam is.

"What I am saying is that we should be actively monitoring the comparability in exams and trying to measure the differences that exist between them. It's not simply a matter of using statistics or getting experts to make judgments. There are lots of competing sources of information when trying to draw a conclusion about whether an exam standard is appropriate."

She commends the diligence that is displayed during the standard-setting process. "Quite often, it will be observed by Ofsted, a teacher body or a representative from the press. I don't think I've ever seen a negative article about it."

Although weak criterion referencing is inevitably complicated, she argues that more straightforward approaches fail to meet all the requirements that society places on exam systems. When AEB merged with another exam board, for example, more emphasis had to be placed on comparability between boards than on comparability with the previous year's exam results. "If you're introducing a new type of qualification, you need to keep a close eye on the comparability of that qualification and others," she explains. "My argument is that you have to make the best judgment you can about these things."

Testing reaction

Baird believes that research can have a significant impact on students in other areas apart from comparability. She is particularly concerned about how students will react to the introduction of single-level tests later this year, suggesting it is one of several examples where the student's view is being ignored when it comes to assessment.

"I would have thought that most students would have to resit these exams at least once as they progress through school. It all rests on teacher judgments about when to put them into these exams and there is a wealth of research to show that teacher's judgments about ability are not always consistent with test results.

"I think people can be too systems-focused rather than looking at the system from the perspective of the individual going through it. Students may be disenfranchised as a result and we're already hearing this complaint from students going through the education system at the moment. Teaching to the test could make them feel they are going through an examination sausage factory rather than learning broadly about a subject."

Baird warns that because the single-level tests are based on ability rather than age, they are completely different from existing exams and so need to be treated differently. "I think an awful lot more effort needs to go into the design of these tests and the setting of the standards."

Despite continuing government attempts to foster Assessment for Learning principles, she says the evidence from Ofsted and other bodies is that teachers lack the skills to adopt them effectively. "It is the wider principles they need to know about, not just how to mark exams in a given year."

Baird points out that assessment expertise is not linked to progress in the profession. "It is difficult to see what the incentive is to improve, other than professional development and a genuine interest in the subject. One way round this is to have assessment as an important part of the requirements for teachers to complete during their training."

At last month's CIEA conference, Baird argued that it is in the interests of awarding bodies, when they design new exams they hope will be popular, to demand few adjustments to teaching methods. "Teachers have got textbooks and working materials already prepared and there is enough change in their working lives as it is." But she believes this tendency is unhealthy for the education system. "It's stifling because you could end up with the same teaching to the test year after year. You get the best teaching when people are thinking creatively about the subject and getting students to think creatively about the subject as well."

Will diplomas woo employers?

This year's launch of specialised diplomas in England highlights a concern about student disenfranchisement that is close to Baird's heart. Too often, she says, the exam system simply serves to demoralise those students who do not do well. In March 2007, she completed research commissioned by the QCA into the grading system for the diplomas in 14 employment sectors, which combine theoretical and skills learning. The principles that she recommended are mostly being adopted but she remains concerned that some students will end up with nothing meaningful to show potential employers. This is because they need to meet certain standards in ICT, numeracy and literacy to fully qualify. Although there is a standalone qualification covering the principal learning element that is specific to each diploma, Baird questions whether this will gain currency in the labour market. "To be honest, just passing the principal learning qualification and not gaining the full diploma is going to be the most likely result for a lot of students." She accepts that diplomas are not intended to be vocational courses for the less academically minded but says they will, nonetheless, attract such students. Baird warns that studying for two years and failing to come away with an overarching qualification could deter students from