Automated assessment

28 Jun 2012

In a number of countries, efforts are being made to provide a common school curriculum, writes Gerry White in DERN.

For example, in the USA, a partnership between states has developed the Common Core Standards, which have been adopted by a large number of states. In Australia, a nationally mandated school curriculum is under development, with a literacy and numeracy assessment program already in place. However, curriculum needs to be coupled with assessment, because both are integral to education programs.

Excellent coverage of the issues in assessment can be found on the Assessment and Teaching of 21st Century Skills program’s website, where a number of seminal white papers that ‘explore the meaning of 21st century skills and the importance of classroom assessment’ are available. As standardisation of curriculum gains momentum, the potential for automated scoring of assessments may be expanded by education authorities and education vendors.

A comparison of the essay scoring capability of nine automated essay scoring (AES) engines has recently been made available. The report, Contrasting State-of-the-Art Automated Scoring of Essays: Analysis, was presented to the US National Council on Measurement in Education (NCME) in April 2012. It provides a very useful comparison of the performance of each of the nine commercial AES engines, based on a large sample of essays that were scored as part of the study.

Essay scoring engines, or measurement technology, ‘can produce reliable and valid scores [when compared with human raters]’ (p. 4). The study reports on automated scoring of samples of essays drawn from over 20,000 students who completed eight essay writing exercises in years 7, 8 and 10 across six states. The samples, which ranged in size from 1,527 to 3,006 essays, were each partitioned into three sets: one for training, one for testing and one for validation. The AES engines were trained on the first set, and human markers validated the score results.
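The three-way partition described above is a standard machine-learning workflow. A minimal sketch, using hypothetical split fractions (the report does not prescribe a single ratio) and a sample size from the study's reported range:

```python
import random

def three_way_split(essays, train_frac=0.6, test_frac=0.2, seed=42):
    """Partition a sample of essays into training, test and validation sets.

    The split fractions are illustrative assumptions, not figures from
    the NCME report.
    """
    rng = random.Random(seed)
    shuffled = essays[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    return (shuffled[:n_train],                  # used to train the engine
            shuffled[n_train:n_train + n_test],  # used to test/tune it
            shuffled[n_train + n_test:])         # held out for validation

# Hypothetical sample at the lower end of the study's 1,527-3,006 range
sample = list(range(1527))
train, test, validate = three_way_split(sample)
print(len(train), len(test), len(validate))  # 916 305 306
```

Holding the validation set out until the end is what allows human markers to check the trained engine's scores on essays it has never seen.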

The AES engines were able to score writing features such as spelling, grammar, usage, mechanics, style, organisation, lexical complexity, content relevance, ideas, structure, semantics, vocabulary, fluency, word choice, conventions and presentation. Although human markers are trained by experts to use a variety of rubrics for scoring essays, there is no single agreed marking standard. The essays in the study were either source-based (prompted by source material) or traditional writing genres such as narrative, descriptive and persuasive. Just as assessments employ a variety of rubrics for marking, so too do AES engines for scoring: not every engine scored every writing feature, because each scoring engine has a special focus.
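To make the idea of scored writing features concrete, here is a toy sketch that computes a few surface features of the kind listed above. This is purely illustrative; commercial AES engines use far richer linguistic and statistical models than simple counts like these.

```python
def surface_features(essay: str) -> dict:
    """Compute a few illustrative surface features of an essay.

    A toy stand-in for the feature extraction an AES engine performs;
    the feature names and formulas here are the author's assumptions.
    """
    words = essay.split()
    # Crude sentence segmentation on terminal punctuation
    sentences = [s for s in essay.replace('!', '.').replace('?', '.').split('.')
                 if s.strip()]
    vocab = {w.strip('.,;:!?').lower() for w in words}
    return {
        'word_count': len(words),
        'sentence_count': len(sentences),
        'avg_word_length': sum(len(w) for w in words) / max(len(words), 1),
        'lexical_diversity': len(vocab) / max(len(words), 1),  # type-token ratio
    }

features = surface_features("The cat sat. The cat ran fast!")
print(features['word_count'], features['sentence_count'])  # 7 2
```

An engine focused on mechanics would weight features like these differently from one focused on content relevance, which is why the report found that engines with different special focuses scored different subsets of features.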

The results overall demonstrated that ‘automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre’ (p. 2). The report concluded that, ‘As a general scoring approach, automated essay scoring appears to have developed to the point where it can be reliably applied in both low-stake assessment (eg instructional evaluation of essays) and perhaps as a second scorer for high-stakes testing’ (p. 27). However, some major issues need further consideration before automated scoring can become operational: assessment construct validity, test manipulation, teaching to the test to maximise scores, and equitable treatment of subgroups under the scoring methodology have yet to be researched and resolved.
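Agreement between automated and human scores in studies of this kind is commonly summarised with a chance-corrected statistic such as quadratic weighted kappa, where 1.0 is perfect agreement and 0.0 is agreement no better than chance. A minimal sketch of that standard statistic (not code from the report):

```python
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two raters' integer scores.

    Disagreements are penalised by the squared distance between the two
    scores, so being off by two points costs four times as much as
    being off by one.
    """
    scores = range(min_score, max_score + 1)
    n = len(human)
    observed = Counter(zip(human, machine))  # joint score distribution
    h_marg = Counter(human)                  # human marginal counts
    m_marg = Counter(machine)                # machine marginal counts
    num = 0.0  # observed weighted disagreement
    den = 0.0  # disagreement expected by chance
    for i in scores:
        for j in scores:
            w = (i - j) ** 2 / (max_score - min_score) ** 2
            num += w * observed[(i, j)] / n
            den += w * (h_marg[i] / n) * (m_marg[j] / n)
    return 1.0 - num / den

# Identical scores give perfect agreement
print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # 1.0
```

A statistic like this lets an engine's agreement with a human rater be compared directly against the agreement between two human raters, which is the practical benchmark behind the report's ‘second scorer’ conclusion.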

Automated essay scoring is a complex new area of educational endeavour. Contrasting State-of-the-Art Automated Scoring of Essays: Analysis makes a very useful contribution, especially as it compares nine different scoring engines, with comments on and highlights of their different features. Curriculum leaders and teacher educators will find this report particularly helpful.



Gerry White is Principal Research Fellow: Teaching & Learning using Digital Technologies, Australian Council for Educational Research

This article was first published on the Digital Education Research Network (DERN)

Read the full article on DERN (free registration required)

Photo: Flickr / cindiann
