How much better have translation programs become with the recent leaps in AI? The changes are truly revolutionary, but only a few major languages benefit from them, while thousands of others are threatened with extinction. In this respect, Hungarian counts among the major languages.
“The bear came down the mountain. There was a road and he passed it.” Or: “The bear came down the mountain. There was a roller on the road and he passed it.” Older machine learning language models could do little with sentences like these, which barely differ from each other (exactly who passed whom or what?). Newer versions, on the other hand, no longer have trouble working out what the third-person singular pronoun “inherited” from the previous sentence refers to, as mathematician and linguist Gábor Prószéky, director general of the Center for Linguistics Research, pointed out in his presentation at the general meeting of the Hungarian Academy of Sciences a few weeks ago.
This is not the only new capability of translation programs based on large language models (LLMs). According to experts, revolutionary changes have taken place in machine language understanding and translation over the last two or three years. The difference becomes especially clear when set against the roughly 70 years of attempts that came before.
At first, and for decades, linguists tried to encode grammar rules in a form that a computer could process, but it turned out that this becomes unmanageable beyond a certain level of complexity and, moreover, produces less lifelike results. In particular, rule-based translators cannot handle ambiguity. From the 1990s onward, researchers turned to the methods of statistical linguistics: they gave computers large amounts of human-translated text and had them choose the translation that was most likely on the basis of frequency. This made it easier to recognize associations between words or parts of sentences, but such systems could not take the context of the text into account either. Larger text databases gave inconsistent results, and the systems could not cope with very rare word combinations or very long sentences, explains Gábor Prószéky.