Mass. ponders hiring a computer to grade MCAS essays. What could go wrong?

A growing body of research has suggested there is little variation between scores issued by computer programs and those issued by human readers. In some cases, that research has been conducted in conjunction with Pearson and other companies that have automated scoring products.

“When used responsibly and carefully, automated essay scoring is faster and cheaper and can even exceed the validity of human scores,” said Jon Cohen, executive vice president at American Institutes for Research and president of AIR Assessment, which provides automated essay scoring in standardized testing. “One of the big challenges in human essay scoring is to get [written responses] scored reliably.”

That’s because there is an element of subjectivity in judging writing — what appeals to one person might not appeal as strongly to another. Cohen said testing companies remedy this by training people who score written responses to adhere strongly to a set of detailed standards for each scoring category.

But automated scoring has drawn many vocal detractors. Les Perelman, a researcher and retired Massachusetts Institute of Technology writing professor, argues that automated scoring systems are not only inaccurate but also detrimental to writing instruction. In an era of teaching to the test, he said, teachers will drill strategies into their students to game the computer programs into awarding higher scores.

To test just how unreliable automated scoring systems can be, Perelman and a small group of MIT and Harvard graduate students four years ago generated an essay of gibberish that included obscure words and complex sentences and ran it through a computerized scoring system used for a graduate school admission exam. The nonsensical essay achieved a high score — on the first try.

Earlier this year, Ohio education officials faced a public backlash after they revealed they had quietly implemented automated scoring for student writing on standardized tests. The issue came to light after several districts spotted irregularities in the results, according to media reports.

Fully aware of the skepticism surrounding the technology, Utah trod carefully when it adopted automated scoring as part of a new standardized testing system during the 2014-15 school year, using hand scorers to verify the results for the first two years.

“Overall, I’ve been impressed with how closely our automatic scoring matches up with teacher scoring,” said Cydnee Carter, Utah’s assessment development coordinator, noting that about 20 percent of the essays continue to be reviewed by people.

There have been some glitches, however, she said. Officials had to readjust the program to spot gibberish as well as essays that quoted too heavily from the reading passages students were writing about, both of which were artificially inflating scores.

Massachusetts officials have been toying with the idea of automated scoring of essays since at least 2016. A work group of state administrators and local educators examining changes to the MCAS exams was split on the idea and recommended sharing information about it with local school systems, according to a summary of its discussion.