Machine Learning Model Translates Akkadian Cuneiform Into English

Category Machine Learning

tldr #

A new machine learning model automatically translates Akkadian cuneiform into English, achieving a BLEU4 score of 37.47. It works best on shorter sentences, but inaccuracies sometimes occur. The authors therefore recommend using it as part of a collaboration between human and machine translation.

content #

An AI model has been developed to automatically translate Akkadian text written in cuneiform into English.

Hundreds of thousands of clay tablets from ancient Mesopotamia, written in cuneiform and dating back as far as 3,400 BCE, have been found by archaeologists, far more than could easily be translated by the limited number of experts who can read them.

Shai Gordin and colleagues present a new machine learning model that can automatically translate Akkadian cuneiform into English.

The clay tablets found in Mesopotamia were written more than 4,000 years ago.

The research is published in the journal PNAS Nexus. Two versions of the model were trained.

One version translates Akkadian from transliterations, which are representations of the cuneiform signs in Latin script. The other version translates from Unicode representations of the cuneiform signs.
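To make the two input forms concrete, here is a minimal sketch that tells Unicode cuneiform apart from Latin-script transliteration. The function names `is_cuneiform` and `input_kind` are hypothetical and are not taken from the authors' code; the block ranges are those assigned to cuneiform in the Unicode Standard.

```python
# Unicode blocks that hold cuneiform signs (ranges from the Unicode Standard).
CUNEIFORM_BLOCKS = (
    (0x12000, 0x123FF),  # Cuneiform
    (0x12400, 0x1247F),  # Cuneiform Numbers and Punctuation
    (0x12480, 0x1254F),  # Early Dynastic Cuneiform
)


def is_cuneiform(ch):
    """Return True if the single character ch lies in a cuneiform block."""
    return any(lo <= ord(ch) <= hi for lo, hi in CUNEIFORM_BLOCKS)


def input_kind(text):
    """Label text by input form (hypothetical helper, for illustration only)."""
    if any(is_cuneiform(c) for c in text):
        return "unicode-cuneiform"
    return "transliteration"


if __name__ == "__main__":
    print(input_kind("a-na"))        # Latin-script transliteration
    print(input_kind("\U00012000"))  # a single sign from the Cuneiform block
```

A transliterated string consists of ordinary Latin letters and hyphens, while the Unicode form uses code points above U+12000, so the two inputs look entirely different to a model even when they encode the same signs.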

The first version, using Latin transliteration, gave more satisfactory results in this study, achieving a score of 37.47 in the Bilingual Evaluation Understudy 4 (BLEU4), a measure of how closely a machine translation corresponds to a human translation of the same text.
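For readers unfamiliar with the metric, the following is a simplified sketch of how a BLEU4-style score is computed: clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, and no smoothing. The article does not describe the study's exact scoring setup, so this is not the authors' evaluation code, and the function name `bleu4` is hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Simplified single-reference, unsmoothed BLEU-4 on a 0-100 scale."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    # Illustrative English sentences, not data from the study.
    human = "the king built the great temple in babylon"
    machine = "the king built the temple in babylon"
    print(round(bleu4(machine, human), 2))
```

An exact match scores 100 and a translation sharing no four-word sequence with the reference scores 0 in this unsmoothed form, which is why published scores are usually computed over a whole test corpus rather than sentence by sentence.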

The Akkadian language was spoken and written in Mesopotamia for 2,500 years.

The program is most effective when translating sentences of 118 or fewer characters. In some sentences, the program produced "hallucinations": output that was syntactically correct English but did not accurately reflect the Akkadian meaning.

In the majority of cases, however, the translation would be usable as a first pass at the text. The authors propose that machine translation can be used as part of a "human-machine collaboration," in which human scholars correct and refine the models' output.

The Akkadian script is called cuneiform, which was made up of different scripts.
