The Problem
The primary difference between collaborative translation and traditional translation is that the process is much less controlled. Below is a list of constraints and assumptions that are made in the traditional workflow but need to be relaxed in a collaborative context.
- Master language: In a traditional context, original content is typically created in a master language, usually English. This is not realistic in a collaborative context, because authors may not all be fluent enough in English to write high quality content in that language.
- Limit original changes during translation: In a traditional context, once translation of content in the master language has started, there is a strong urge to limit changes to the master language version until it has been translated to all other languages. This is not realistic for a collaborative context since content often never reaches a final stage.
- Can ensure timely translation: In a traditional context, one can assume that timely translation of content can be ensured through contractual obligations with the translator. In a collaborative context, this is not possible since translators are often unpaid volunteers working in their own free time.
- Focus on small list of languages: In a traditional context, there is a tendency to focus on a small list of "core" languages, in order to minimize the number of language pairs for translation. In a collaborative context, members of the community are usually allowed to create content in any language, including minority languages.
- Strong coordination of authors and translators: In a traditional context, the community of authors and translators is a "closed" world, where everyone knows each other, and there is some central authority that coordinates everything. In a collaborative context, authors and translators contributing to the documents are not coordinated and often do not know each other.
TODO: Eliminate or merge sentences which are redundant in the argumentation below.
Thus the main challenge of collaborative translation is to remove as many as possible of the constraints imposed by traditional tools and processes, while still allowing the community of authors and translators to produce and translate content effectively. In other words, change must be embraced rather than constrained. At the same time, the system must have sufficient structure and technological support to allow different linguistic communities to collaborate synergistically without having to rewrite content from scratch in every language (as is typically done in Wikipedia, where the current inability to synchronize between languages creates linguistic silos). The tools must allow synergy to occur between the communities and help them share knowledge. They must do this in a way that minimizes the cognitive load on users and the chance of human error (e.g., content in one language not being translated into another language). Finally, support for collaborative translation must not complicate or restrain the process of original content creation, which has been key to the success of many wiki communities.
The LizzyWiki system described in Désilets et al. (2006) removed many of these constraints while allowing synergy in a bilingual context. But it still fell short of allowing users to author or translate any page at any point in time. For example, if an author wanted to make an original two-word change to a French page that was out of date with its English counterpart, he first had to bring the French page up to date with the English version. This often turned out to be a problem and stopped the author dead in his tracks. For instance, if the French author could not read English, he could not make his original modification until someone else had updated the French page from the English one. In other cases, the author could translate from English to French, but would have to translate several sentences before being allowed to make his two-word original contribution to the French page.
In the CLWE project, we move even further in the direction of unconstrained collaborative workflow. Essentially, any page can be modified or translated from any language, at any time. The only constraint left is that users must not perform original edits in the course of carrying out a translation transaction.
TODO: I (AD) don't understand what LPH is saying here. LPH, can you clarify?
With limited translation resources for some language pairs, it would be nearly impossible to replicate changes between some languages. To have any chance of success, it is necessary to work around these limited language pairs.
TODO: I (AD) don't understand what LPH is saying here. LPH, can you clarify?
Breaking the language pair barrier means that content should be available from as many sources as possible, to provide translators with decent alternatives and reduce the effort required to understand the changes made. To machines, all linguistic versions are distinct pieces of data, and they would not know how to map a change into a different context. To humans, however, the idea behind a change is present in the content of any page it has been translated into. Any page that has incorporated the idea can be used as a source for that change.
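The idea that any page which has incorporated a change can serve as its source can be sketched as follows. This is a minimal illustration, not the CLWE implementation. It assumes that each current linguistic version of a page can be summarized by the set of original contributions it contains. The contribution identifiers, the language codes and the function `candidate_sources` are all hypothetical. For every change missing from a target version, the function lists every other language that could be used to translate it.

```python
from typing import Dict, Set


def candidate_sources(current: Dict[str, Set[str]], target: str) -> Dict[str, Set[str]]:
    """Hypothetical helper: for each contribution missing from `target`,
    return the set of other languages whose current version contains it."""
    # Every contribution known in any language, minus what the target already has.
    missing = set().union(*current.values()) - current[target]
    return {
        change: {lang for lang, ideas in current.items()
                 if lang != target and change in ideas}
        for change in sorted(missing)
    }


# Hypothetical state: contribution c3 was made in English and already translated
# into German, so a French translator who does not read English can use German.
current = {"en": {"c1", "c2", "c3"}, "fr": {"c1", "c2"}, "de": {"c1", "c3"}}
print(candidate_sources(current, "fr"))
```

With this representation, adding more languages increases the number of possible sources for a change rather than the number of mandatory language pairs.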
AD: I think this is mentioned above, albeit in less detail.
Without tool support, this kind of effort would only be possible for a small number of languages, and it would require a lot of discipline from translators and appropriate use of change comments. Even then, error rates would be very high and some changes would be lost along the way. Tool support embedded in the wiki engine could help translators in their efforts and reduce error rates.
A key technical challenge that must be addressed to support such an unconstrained translation and editing workflow is how to track the original contributions and translations made on the different pages, so that they can easily be reproduced in other linguistic versions of those same pages. We now describe how we addressed this challenge in the CLWE project.
Tracking and Translating Edits
TODO: I find this part very hard to follow. LPH, maybe you need to verbally explain what you mean to either Seb or me. I often find this helps me formulate my thoughts in a way that is easier for other people to understand.
In adding the features required to support change tracking, we followed a few guidelines:
- Any contribution is worth translating until a translator says otherwise.
- Only the final result matters. Intermediate steps can be ignored.
- Content contributors should not have to worry about translation.
These three guidelines are tightly connected. As mentioned before, attempting to translate every change individually into every single linguistic version is impractical because translation resources are limited. By focusing on the final result, a translator can catch up on the content of a source page in a single step, abstracting away the intermediate steps that led to the final content. In most cases, the translator can simply look at the changes made to the source page since the last synchronization.
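Focusing on the final result can be illustrated with Python's standard `difflib` module. In this sketch the translator sees a single diff between the source text at the last synchronization point and its current text. A typo that was introduced and then fixed between those two points therefore never appears. Several things here are assumptions for illustration: storing page text as plain strings, the helper `changes_since_sync`, the `last-sync`/`current` labels, and the sample revisions.

```python
import difflib
from typing import List


def changes_since_sync(synced: str, current: str) -> List[str]:
    """Hypothetical helper: unified diff between the source text as it was at
    the last synchronization and as it is now. Intermediate revisions are never
    consulted, so only the net change is shown to the translator."""
    return list(difflib.unified_diff(
        synced.splitlines(), current.splitlines(),
        fromfile="last-sync", tofile="current", lineterm=""))


# Successive revisions of an English source page (hypothetical content).
revisions = [
    "Welcome to this wiki.",                              # last synchronized with French
    "Welcome to this wikki.\nEvents are listed below.",  # content added, typo introduced
    "Welcome to this wiki.\nEvents are listed below.",   # typo fixed
]
for line in changes_since_sync(revisions[0], revisions[-1]):
    print(line)
```

Only the added sentence shows up. The typo and its correction cancel out because the two intermediate steps are never compared.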
In a wiki, a page goes through many edits, and not all of them are worth translating. Wiki pages rarely have a single author; they are the product of collaboration between many people. Someone initially writes the base content, and others add to it. In the process, many people make minor contributions that improve the quality of the content but may not be relevant to translators, such as grammatical corrections and syntactic improvements. Some changes may only affect the formatting of the page. Deciding which changes need to be translated is therefore difficult, and the line is very hard to draw.
A simple change to the syntax of a phrase may seem trivial, but if the previous formulation was ambiguous, a translator may already have misinterpreted it, in which case the translation would need to be updated. One potential solution would be to ask content contributors to say whether or not a change should be translated. However, contributors may not know enough about the translation process to make the right decision. By not requesting any such information from content contributors, the translation process can be kept as invisible to them as possible.
Because translators translate an aggregate of changes rather than individual changes, it does not matter if a few trivial changes slip in. Very active translation communities that propagate changes frequently may end up translating very small changes that do not affect their version of the content. However, it is unlikely that a translation will be updated several times a day to match every change on a page.
The result of these guidelines is a very simple model for change tracking. In this model, each change represents an idea by its author. As far as tracking is concerned, all changes are equal, since they all have to be propagated to the other linguistic versions. When a change is incorporated into a given linguistic version, all pages later updated from that version also inherit the change.
In simplified terms, change propagation can be represented as a directed graph in which nodes are page versions and arcs are page evolutions. Arcs represent both the evolution between successive versions of the same page and the translation of content from a source to a target language. In the graph, each original content contribution is attached to the page version where it was made. By following arcs transitively from that node, all page versions containing the change can be found.
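Finding every page version that contains a change amounts to a reachability query on this graph. The sketch below does this with a breadth-first search. It assumes an adjacency-list encoding that does not distinguish evolution arcs from translation arcs, since both propagate changes in the same way. The `Graph` alias, the function `versions_containing` and the small example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical encoding: each page version maps to the versions derived from it,
# either by a later edit of the same page or by a translation into another language.
Graph = Dict[str, List[str]]


def versions_containing(graph: Graph, origin: str) -> Set[str]:
    """Breadth-first search from the version where a change was originally
    made; every reachable version has inherited that change."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


graph: Graph = {
    "en_v1": ["en_v2"],
    "en_v2": ["en_v3", "fr_v1"],  # en_v2 -> fr_v1 is a translation arc
    "fr_v1": ["fr_v2"],
    "fr_v2": ["en_v3"],           # the French addition is translated back into English
}
print(sorted(versions_containing(graph, "fr_v2")))  # ['en_v3', 'fr_v2']
```

A version with several incoming arcs, such as `en_v3` above, is visited only once. This matches the rule that a change is inherited once, whichever path brought it.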
Consider the following sample case using three languages. Any number of languages is supported, but the resulting scenarios and representations would be too large to be practical for demonstration purposes.
- A page gets created in English {en_v1}.
- In a second edit, some content is added {en_v2}. A third edit is then made.
- After this point, the page gets an initial translation into both French {fr_v1} and Spanish {es_v1}.
- Afterwards, a French contributor decides to add the list of participants {fr_v2}. After this modification, both English and Spanish versions indicate that they are not fully up to date {en_v3_2}. They provide links to view the other versions and update the content.
- An English translator responds to the request and includes the participant list in the English version {fr_v2_to_en_v4}. The English page correctly indicates that the page is now up to date, but the Spanish version is still behind {en_v4}.
- A Spanish contributor adds the exact dates of the event along with the location {es_v2}. The Spanish page now indicates that it contains additional content, but that more content can still be obtained from the French and English versions.
- A Spanish translator decides to update the Spanish version from the English source. In doing so, the Spanish version becomes fully up to date and includes the changes first made to the French version {es_v3}. Both the French and English versions now indicate that content can be obtained from the Spanish version.
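The scenario above can be replayed with a small in-memory model. In this model an original edit adds one contribution to a language's current version. A translation makes the new target version inherit every contribution in the source. The status of a page lists the other languages holding contributions it lacks. This is a sketch under several assumptions, not the CLWE data model. The class `PageTracker` and its methods are hypothetical. Contributions are named after the version where they were made. The unlabeled third English edit is called `en_v3` here.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class PageTracker:
    """Hypothetical tracker for one page: for each language, the list of its
    versions, each stored as the frozenset of original contributions it contains."""
    versions: Dict[str, List[FrozenSet[str]]] = field(default_factory=dict)

    def current(self, lang: str) -> FrozenSet[str]:
        history = self.versions.get(lang)
        return history[-1] if history else frozenset()

    def edit(self, lang: str, contribution: str) -> None:
        # Original edit: a new version containing one more contribution.
        self.versions.setdefault(lang, []).append(self.current(lang) | {contribution})

    def translate(self, source: str, target: str) -> None:
        # Translation: the new target version inherits everything in the source.
        self.versions.setdefault(target, []).append(
            self.current(target) | self.current(source))

    def status(self, lang: str) -> Dict[str, FrozenSet[str]]:
        """Map each other language to the contributions it holds that `lang` lacks."""
        mine = self.current(lang)
        gaps = {other: self.current(other) - mine for other in self.versions if other != lang}
        return {other: missing for other, missing in gaps.items() if missing}


wiki = PageTracker()
wiki.edit("en", "en_v1")    # page created in English
wiki.edit("en", "en_v2")    # content added
wiki.edit("en", "en_v3")    # third edit (hypothetical label)
wiki.translate("en", "fr")  # fr_v1
wiki.translate("en", "es")  # es_v1
wiki.edit("fr", "fr_v2")    # French contributor adds content
wiki.translate("fr", "en")  # fr_v2_to_en_v4
wiki.edit("es", "es_v2")    # dates and location added in Spanish
wiki.translate("en", "es")  # es_v3: inherits fr_v2 through English

for lang in ("en", "fr", "es"):
    print(lang, {other: sorted(missing) for other, missing in wiki.status(lang).items()})
# en {'es': ['es_v2']}
# fr {'es': ['es_v2']}
# es {}
```

The final status reproduces the last step of the scenario. Spanish is fully up to date, including the French addition it received through English, while both French and English point to Spanish for the missing content.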
As the scenario shows, the translation process is invisible to content contributors. Like any visitor of the website, a contributor sees the "Page Translation" box presenting the different alternatives and status information, but is free to ignore it. When a content contributor makes a change, a new original content contribution is recorded and the other linguistic versions of the page are updated with this information.
The "Page Translation" box provides links for translators to view the relevant changes made to the page. When using those links, the translator is brought to a slightly different version of the edit page. The page displays the changes to be translated along with the text area. When the translator indicates that the translation of the changes is completed, the translation target gets marked as containing the changes provided by the translation source. Again, other linguistic versions of the page get updated with the new information.
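The translation transaction described above can be sketched as two operations. The first computes the pending changes shown next to the text area, which are the contributions in the source but not in the target. The second, on completion, produces a new target version marked as containing everything the source contains. No new original contribution is recorded, consistent with the CLWE rule that translation transactions carry no original edits. The classes `PageVersion` and `TranslationTask`, their fields and the sample texts are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PageVersion:
    """Hypothetical page version: language, text, and contributions it contains."""
    lang: str
    text: str
    contributions: FrozenSet[str]


@dataclass(frozen=True)
class TranslationTask:
    source: PageVersion
    target: PageVersion

    @property
    def pending(self) -> FrozenSet[str]:
        # Contributions the translator is asked to carry over into the target.
        return self.source.contributions - self.target.contributions

    def complete(self, translated_text: str) -> PageVersion:
        # The new target version is marked as containing everything in the source;
        # no original contribution is added during a translation transaction.
        return PageVersion(self.target.lang, translated_text,
                           self.target.contributions | self.source.contributions)


en = PageVersion("en", "Welcome to this wiki. Events are listed below.", frozenset({"c1", "c2"}))
fr = PageVersion("fr", "Bienvenue à ce wiki.", frozenset({"c1"}))
task = TranslationTask(source=en, target=fr)
print(sorted(task.pending))  # ['c2']
fr2 = task.complete("Bienvenue à ce wiki. Les événements sont listés ci-dessous.")
print(sorted(fr2.contributions))  # ['c1', 'c2']
```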
The directed graph representation of the described scenario is illustrated in figure {architecture_graph.dot.png}. In the graph, white nodes are original content contributions and gray nodes are versions resulting from a translation effort. Solid arcs are page evolutions from version to version, and dashed arcs are translations from source to target. On each node, the second row lists the original content contributions included in the version.
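A graph in the style just described can be emitted in Graphviz DOT syntax. White nodes are original contributions and gray nodes are translations. Solid edges mark evolutions and dashed edges mark translations. A second label row lists the included contributions, using the `\n` escape inside a DOT label. This sketch does not reproduce the authors' figure. The function `to_dot`, its input encoding and the three-version example are hypothetical. The output could then be rendered with the Graphviz command `dot -Tpng`.

```python
from typing import Dict, List, Tuple


def to_dot(versions: Dict[str, Tuple[bool, List[str]]],
           evolutions: List[Tuple[str, str]],
           translations: List[Tuple[str, str]]) -> str:
    """Hypothetical renderer: `versions` maps a version label to
    (is_translation, contributions included in that version)."""
    lines = ["digraph propagation {", "  node [shape=box, style=filled];"]
    for name, (is_translation, contributions) in versions.items():
        colour = "gray" if is_translation else "white"
        # "\\n" emits a DOT line break, putting contributions on a second row.
        label = f"{name}\\n{', '.join(contributions)}"
        lines.append(f'  "{name}" [fillcolor={colour}, label="{label}"];')
    for src, dst in evolutions:
        lines.append(f'  "{src}" -> "{dst}" [style=solid];')
    for src, dst in translations:
        lines.append(f'  "{src}" -> "{dst}" [style=dashed];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    {"en_v1": (False, ["en_v1"]),
     "fr_v1": (True, ["en_v1"]),
     "fr_v2": (False, ["en_v1", "fr_v2"])},
    evolutions=[("fr_v1", "fr_v2")],
    translations=[("en_v1", "fr_v1")],
)
print(dot)
```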
The graph representation is in fact very close to the internal representation used. Beyond providing useful information to site visitors and supporting translators, it preserves the entire translation history. Figure {en_history} presents the page history of the English page in the scenario. This translation history will make it possible to analyze translation patterns and the evolution of the communities around the different linguistic versions of a page.
Maybe not really relevant
Maybe all we need is to reformat LPH's scenario a bit to use the format below, which communicates better than the impersonal approach. Also, add a bit about how prior art (LizzyWiki) succeeded in removing some of the constraints, but not all.
John creates an English page: "Welcome to this wiki."
Pierre then translates it into a French page: "Bienvenue à ce wiki."
Later on, John adds ten sentences to the English page "Welcome to this wiki."
Now Josée, who does not speak English, also wants to add two words to the French page "Bienvenue à ce wiki."
With a standard wiki, the two pages would be distinct, and no one would be aware of how the content evolved in the other linguistic version. With the LizzyWiki approach, Josée is not allowed to add her two words to "Bienvenue à ce wiki." before she has translated the ten sentences that John added to the English version, "Welcome to this wiki." But Josée cannot do this because she does not read English. Even if she could, she might not be in the mood to translate ten English sentences just to be allowed to add two words to the French version.
In (Désilets et al., 2006), the authors also postulated that, in order to support collaborative authoring and translation in more than two languages at a time, it might be necessary to impose the use of pivot languages as intermediaries between other languages, so as to provide stable points of reference in an otherwise chaotic environment.
With the CLWE project, we simply ignore these constraints and allow authors to create original content in any linguistic version of any page, at any time.