
From the Magnetic Drum to the Babylonian Fish: How Machine Translation Began
In November 1966, the Automatic Language Processing Advisory Committee (ALPAC), convened by the U.S. government, published a report concluding that research into machine translation was not economically justified. The document could have ended an AI era that had not yet begun, were it not for the enthusiasm of individual scientists in the field, who ultimately laid the groundwork for rule-based machine translation (RBMT) and, later, for natural language processing (NLP). Read how this happened in an excerpt from Jana Khlyustova’s book Catch the Babylonian Fish: The Human Brain, Neural Networks and the Study of Foreign Languages.
The ALPAC report marked the nadir of funding and interest in machine translation since the start of active research some fifteen years earlier. This pessimism affected mainly the United States, although, as noted in the previous chapter, several projects continued there. In Canada, France, Germany and other countries, research did not cease. Nevertheless the main advances of the 1960s–1970s were tied to the United States, namely to the SYSTRAN and Logos projects. They signalled a qualitative leap in machine translation: the creators of the new systems developed an approach to it based on rules. Let us examine what this is.
SYSTRAN: translation for the world
The company SYSTRAN (its name is a contraction of System Translation) was founded in 1968 by the linguist and polyglot Peter Toma. The roots of the company, however, go much deeper, both chronologically and ideologically. Probably no one can tell the story of SYSTRAN better than Peter Toma himself. In 1986 he wrote:
«Today, in the nuclear age, we hear more and more about how important peace is, and that humanity needs saving. No doubt, mutual understanding contributes to peace, and overcoming language barriers helps achieve that mutual understanding. During World War II I witnessed how language barriers hindered the achievement of peace. At the end of the war it was clear that we had entered an era of possessing sophisticated weapons of mass destruction, and I felt more than ever that I must devote all my efforts to eliminating the factors that provoke conflict».
Toma, who at the time spoke English, German and Hungarian, suspected that with rising East–West tensions the role of Russian would grow, and he regarded learning it as important. And such a chance presented itself.
«In Munich I met a refugee from Russia, Professor Wilpert (his ancestors came to Russia during Catherine the Great’s reign)», he recalls. «He had a record player and Russian-language records. I borrowed his player for many weekends, took it to the mountains and studied Russian all day on Saturdays and Sundays. <…> I listened to those records so often that they wore out, and Professor Wilpert could no longer use them for his Russian lessons. Our friendship suffered from this, and we were not in touch again until 1956, when I happened to meet Professor Wilpert on the street in Los Angeles. Then, in 1961–1963, he worked for me as I launched the automated translation systems Autotran and Technotran, the predecessors of SYSTRAN».
After 1945 Peter Toma studied international relations and social sciences and eventually became interested in economics, so much so that he even began to work in that field. After a few years he realised that this did not bring him any closer to his overarching goal of preventing conflicts and wars, and he joined the California Institute of Technology. It was at roughly that time that the institute's first computer, the Datatron 205, appeared there.
«After getting acquainted with the logical operations that this machine could perform, I was fascinated by the obvious potential for applying them to automatic translation. I had a normal daytime job, but I understood that for preparing and testing algorithms I would need a lot of time to work with the computer», — recalls Peter Toma.
One unexpected factor helped him: the memory device of the Datatron 205, like that of other early computers, was a magnetic drum, a large, fast-rotating metal cylinder coated with a thin ferromagnetic layer. The drum was a delicate device: it had to be switched off at night and restarted in the morning, and operators frequently had trouble starting it. Toma proposed an arrangement to management: he would watch over the drum all night and fix any faults, and in return he would be allowed to use the computer at night to test and debug his translation programs. Management agreed, and in 1956 the groundwork for SYSTRAN was laid.
«Of course, such arrangements demanded non-standard working hours», Toma recalls. «A typical day looked like this: I did my main job from 8 a.m. to 4:30 p.m. with a short lunch break. At 5 p.m. I had dinner, then slept between 6 and 10:30 p.m. From 11 p.m. to 7 a.m. I worked with the computer, then had breakfast, showered, and returned to my job by 8 a.m. This pattern persisted for many months. <…> Many of the algorithms I created and tested during those long nights still run in SYSTRAN today, although before I dedicated myself exclusively to this system I created several other working MT systems: SERNA at Georgetown, then Autotran and Technotran. SYSTRAN was effectively born on an IBM 360 computer in 1963–1964».
Peter Toma believed the ALPAC committee had chosen its moment carefully: «The dates were carefully chosen, and the hearing was scheduled for days when I was in Europe. The ALPAC report was a devastating blow to machine translation, especially in the United States».
Even so, Toma’s work managed to attract the attention of specialists. As early as 1965 the German Research Society invited him to a meeting with Germany’s leading linguists. The experts agreed that SYSTRAN took the right approach, one distinct from earlier attempts at piece-by-piece translation of texts. The result was a contract for further development of the system.
In creating SYSTRAN, Peter Toma adhered to the principles that underpin today’s rule-based machine translation, an approach now usually described as classic. Systems built this way draw their linguistic information about the source and target languages from a range of dictionaries and grammars, which cover the semantic, morphological and syntactic regularities of each language.
The first stage of translating a text in SYSTRAN was the morphological analysis of words and their lookup in dictionaries of various types. The second stage was the analysis of sentences: syntax, lexicon, semantics. Only at the third stage was the actual translation performed: the system synthesized the information obtained earlier and constructed a sentence according to the grammar of the target language.
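The three stages described above can be illustrated with a toy sketch. This is not SYSTRAN's code: the five-word Russian lexicon, the `Token` type and the function names are all hypothetical, and the single synthesis rule assumes a simple sentence with a singular subject, a present-tense verb and a direct object.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    surface: str          # word form as it appears in the source text
    gloss: str            # English base form found in the dictionary
    pos: str              # part of speech
    case: Optional[str]   # grammatical case for nouns, None otherwise


# Hypothetical miniature dictionary: word form -> (gloss, part of speech, case).
LEXICON = {
    "кошка": ("cat", "noun", "nom"),
    "кошку": ("cat", "noun", "acc"),
    "собака": ("dog", "noun", "nom"),
    "собаку": ("dog", "noun", "acc"),
    "видит": ("see", "verb", None),
}


def analyze_morphology(sentence: str) -> list[Token]:
    """Stage 1: look up every word form and record its grammatical features."""
    tokens = []
    for word in sentence.lower().split():
        gloss, pos, case = LEXICON[word]  # raises KeyError for unknown words
        tokens.append(Token(word, gloss, pos, case))
    return tokens


def analyze_syntax(tokens: list[Token]) -> dict[str, Token]:
    """Stage 2: assign sentence roles from case marking, not from word order."""
    roles = {}
    for token in tokens:
        if token.pos == "verb":
            roles["verb"] = token
        elif token.case == "nom":
            roles["subject"] = token
        elif token.case == "acc":
            roles["object"] = token
    return roles


def synthesize(roles: dict[str, Token]) -> str:
    """Stage 3: build an English sentence in subject-verb-object order."""
    verb = roles["verb"].gloss + "s"  # toy third-person singular rule
    return f"the {roles['subject'].gloss} {verb} the {roles['object'].gloss}"


def translate(sentence: str) -> str:
    return synthesize(analyze_syntax(analyze_morphology(sentence)))


print(translate("кошка видит собаку"))  # the cat sees the dog
print(translate("собаку видит кошка"))  # the cat sees the dog
```

Both word orders yield the same English sentence because the roles come from the case endings found in stage 1. A piece-by-piece substitution would instead render the second sentence as "dog sees cat", reversing its meaning.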
In 1968 Toma founded the company of the same name, and in 1969 SYSTRAN bid in a U.S. Air Force tender for a system to translate Russian texts automatically. IBM and Bunker Ramo Corporation also submitted proposals. Peter Toma’s young company won and signed its first contract.
After that SYSTRAN developed rapidly. In 1974 NASA used the system in the Apollo–Soyuz program to translate technical documentation from English into Russian. In 1975 a contract was signed with the European Commission to work with several pairs of European languages (the European Commission still uses SYSTRAN today). Later the system was adopted by Xerox and Seiko. In 1995 a version of the MT software was released that enterprises could run themselves on the Windows operating system. In 1997 SYSTRAN, together with Digital Equipment Corporation, launched BabelFish, the world’s first online service for translating web pages. It was named in honour of the Babel fish, the creature from Douglas Adams’s The Hitchhiker’s Guide to the Galaxy that could be inserted into the ear, where it translated speech in any language directly into the brain of its owner.
Published in: Jana Khlyustova. Catch the Babylonian Fish: The Human Brain, Neural Networks and the Study of Foreign Languages. Moscow: Alpina Non-Fiction, 2024.