WS08Paper:Tracking Changes

The primary difference between collaborative translation and traditional translation is that the changes are not controlled. Authors contributing to the documents are not coordinated and likely do not know each other. Documents are in constant evolution and will never reach a final state. Moreover, in a wiki community, contributions must be allowed in all available languages.

These key differences cause the structures used by the traditional translation industry to collapse. The top-down approach, with staged document approval and translation requests, cannot hold. Maintaining a master version is completely out of the question.

To bring collaboration to translation processes, the tools supporting them must respect the openness that makes wikis such powerful collaboration tools. The freedom to contribute to any page must remain. Change should be embraced rather than constrained. While translation effort is desirable, the availability of multiple languages on the website should not restrict original content creation.

The communities contributing to the various linguistic versions must be seen as a whole whose members can benefit from each other. The current inability to synchronize efforts creates silos. The tools must allow synergy between the communities and help them share knowledge.

To do this, the contributions made to the different pages must be tracked accurately. With limited resources for some language pairs, it would be nearly impossible to replicate changes directly between every pair of languages. To have any chance of success, the process must work around the limited language pairs.

Breaking the language pair barrier means that content should be available from as many sources as possible, giving translators decent alternatives and reducing the effort they must spend to understand the changes made. To a machine, each linguistic version is a distinct piece of data, and it cannot map a change into a different linguistic context. To a human, however, the idea behind a change is present in every page it has been translated into. Any page that has incorporated the idea can be used as a source for that change.

Without tool support, this kind of effort would be possible for a small number of languages, given considerable discipline from translators and appropriate use of change comments. However, error rates would be very high and some changes would be lost along the way. Tool support embedded into the wiki engine can assist translators and reduce error rates.

When adding the features required to support change tracking, a few guidelines were followed:
  • Any contribution is worth translating until a translator says otherwise.
  • Only the final result matters. Intermediate steps can be ignored.
  • Content contributors should not have to worry about translation.

These three guidelines are tightly connected. As mentioned before, translating every change individually into every linguistic version is impractical given the limited translation resources. By focusing on the final result, a translator can catch up with the content of a source page in a single step and abstract away the intermediate steps that led to it. In most cases, the translator can simply review the changes made to the source page since the last synchronization.
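As a minimal sketch of the "only the final result matters" guideline, assuming the wiki keeps the text of every revision and records which source revision a translation was last synchronized with, the work for the translator reduces to a single diff between that revision and the current one. The `history` list, its sample content and the `changes_since_sync` function are hypothetical; the comparison itself uses Python's standard `difflib.unified_diff`.

```python
import difflib

def changes_since_sync(history, last_synced, current=None):
    """Diff the source revision a translation was last synchronized with
    against the current revision. Revisions in between are never looked
    at: only the final result matters."""
    if current is None:
        current = len(history) - 1
    old = history[last_synced].splitlines()
    new = history[current].splitlines()
    return list(difflib.unified_diff(old, new,
                                     fromfile=f"v{last_synced + 1}",
                                     tofile=f"v{current + 1}",
                                     lineterm=""))

# Hypothetical revisions of a source page, oldest first.
history = [
    "Workshop 2008",
    "Workshop 2008\nTopics: multilingual wikis",
    "Workshop 2008\nTopics: multilingual wikis and translation",
    "Workshop 2008\nTopics: multilingual wikis and translation\nParticipants: TBD",
]

# The translation was synchronized with v2; v3 is skipped entirely.
for line in changes_since_sync(history, last_synced=1):
    print(line)
```

Three edits made after the last synchronization collapse into one reviewable hunk, which is the single catch-up step described above.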

In a wiki content creation process, a page receives many edits, and not all of them are worth translating. Rather than having a single author, a wiki page is the collaborative work of many people. Someone initially writes the base content and others add to it. Along the way, many people make minor contributions that improve the quality of the content but may not be relevant to translators, such as grammatical corrections and syntactic improvements. Some changes may only affect the formatting of the page. Determining which changes need to be translated is a complex task, and the line is very hard to draw.

A simple change to the syntax of a phrase may seem trivial, but if the previous formulation was ambiguous, a translator may already have misinterpreted it, in which case the translation would need to be updated. One potential solution would be to ask content contributors to state whether or not a change should be translated. However, a content contributor may not know enough about the translation process to make the right decision. By requesting no information from content contributors, the translation process can remain as invisible as possible.

Because translators translate an aggregate of changes rather than each change individually, it does not matter if a few trivial changes slip in. Very active translation communities that propagate changes frequently may end up translating very small changes that do not affect their version of the content. However, it is unlikely that a translation will be updated multiple times a day to match every change on a page.

The result of these guidelines is a very simple model for change tracking. In this model, each change represents an idea contributed by its author. As far as tracking is concerned, all changes are equal, since they all have to be propagated to the other linguistic versions. When a change is incorporated into a linguistic version, every page later updated from that version also inherits the change.
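The model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the wiki engine's actual implementation: `PageVersion`, `edit` and `translate` are hypothetical names, and original contributions are identified by plain integers.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Tuple

_next_id = count(1)  # hypothetical source of contribution identifiers

@dataclass(frozen=True)
class PageVersion:
    lang: str
    # ids of the original content contributions this version contains
    contributions: frozenset = field(default_factory=frozenset)

def edit(page: PageVersion) -> Tuple[PageVersion, int]:
    """Every edit is an original contribution; all are treated as equal."""
    cid = next(_next_id)
    return PageVersion(page.lang, page.contributions | {cid}), cid

def translate(source: PageVersion, target: PageVersion) -> PageVersion:
    """Completing a translation makes the target inherit every contribution
    of the source, including those the source itself inherited."""
    return PageVersion(target.lang, target.contributions | source.contributions)

en, first = edit(PageVersion("en"))
fr = translate(en, PageVersion("fr"))
fr, second = edit(fr)
es = translate(fr, PageVersion("es"))  # Spanish inherits both ideas via French
assert es.contributions == {first, second}
```

Because a translation is a set union, it does not matter which linguistic version a translator picks as a source, as long as that version already contains the change.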

In a simplified form, change propagation can be represented as a directed graph in which nodes are page versions and arcs are page evolutions. Arcs represent both the evolution between versions of the same page and the translation of content from a source to a target language. In the graph, original content contributions are attached to page versions. By following all arcs from the page version where a change originated, all pages containing the change can be found.
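A minimal sketch of this lookup, assuming the graph is stored as an adjacency list keyed by version labels. The labels follow the sample scenario that follows, with `en_v3` as a hypothetical name for the third English edit; evolution and translation arcs are stored alike because both carry content forward. A breadth-first search from the version where a change originated returns every version containing it.

```python
from collections import deque

# Hypothetical labelling of the sample scenario.
GRAPH = {
    "en_v1": ["en_v2"],
    "en_v2": ["en_v3"],
    "en_v3": ["en_v4", "fr_v1", "es_v1"],  # evolution + two translations
    "fr_v1": ["fr_v2"],
    "fr_v2": ["en_v4"],                     # French addition translated to English
    "es_v1": ["es_v2"],
    "es_v2": ["es_v3"],
    "en_v4": ["es_v3"],                     # Spanish updated from English
}

def versions_containing(graph, origin):
    """Breadth-first search over all arcs starting at the version where a
    change was originally made; every reachable version contains it."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(versions_containing(GRAPH, "fr_v2")))  # ['en_v4', 'es_v3', 'fr_v2']
```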

Consider the following sample case using three languages. Any number of languages is supported; however, the resulting scenarios and representations would become too large to be practical for demonstration purposes.

  1. A page gets created in English {en_v1}.
  2. In a second edit, some content is added {en_v2}. A third edit is then made.
  3. The page is then translated into both French {fr_v1} and Spanish {es_v1}.
  4. Afterwards, a French contributor decides to add the list of required sections {fr_v2}. After this modification, both English and Spanish versions indicate that they are not fully up to date {en_v3_2}. They provide links to view the other versions and update the content.
  5. An English translator responds to the request and includes the participant list in the English version {fr_v2_to_en_v4}. The English page now correctly indicates that it is up to date, but that the Spanish version is still behind {en_v4}.
  6. A Spanish contributor adds the exact dates of the event along with the location {es_v2}. The Spanish page indicates that it contains additional content, but also that more content can be obtained from the French and English versions.
  7. A Spanish translator decides to update the Spanish version from the English source. In doing so, the Spanish version becomes fully up to date and includes the changes first made to the French version {es_v3}. Both the French and English versions now indicate that content can be obtained from the Spanish version.
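The status messages in this scenario can be derived by comparing the contribution sets of the latest version in each language. In this sketch, `translation_status` is a hypothetical function, and ids 1 to 5 are an assumed numbering of the original contributions: the three English edits, the French addition and the Spanish addition. Translations add no new ids.

```python
def translation_status(pages):
    """pages maps a language code to the set of contribution ids held by
    its latest version. Returns, per language, which versions it could
    update from and which versions are missing some of its content."""
    status = {}
    for lang, mine in pages.items():
        others = {o: c for o, c in pages.items() if o != lang}
        status[lang] = {
            # other versions holding contributions this one lacks
            "update_from": sorted(o for o, c in others.items() if c - mine),
            # other versions lacking contributions this one holds
            "ahead_of": sorted(o for o, c in others.items() if mine - c),
        }
    return status

after_step_6 = {"en": {1, 2, 3, 4}, "fr": {1, 2, 3, 4}, "es": {1, 2, 3, 5}}
after_step_7 = {"en": {1, 2, 3, 4}, "fr": {1, 2, 3, 4}, "es": {1, 2, 3, 4, 5}}

print(translation_status(after_step_6)["es"])
# {'update_from': ['en', 'fr'], 'ahead_of': ['en', 'fr']}
print(translation_status(after_step_7)["en"])
# {'update_from': ['es'], 'ahead_of': []}
```

After step 6, Spanish both holds additional content and can obtain more from French and English; after step 7, it is up to date and the other two versions point to it.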

As can be seen in the scenario, the translation process is invisible to content contributors. Like any visitor of the website, a contributor sees the "Page Translation" box presenting the alternatives and status information, but is free to ignore it. When a content contributor makes a change, a new original content contribution is recorded and the other linguistic versions of the page are updated with this information.

The "Page Translation" box provides links for translators to view the relevant changes made to the page. Following these links brings the translator to a slightly different version of the edit page, which displays the changes to be translated alongside the text area. When the translator indicates that the translation of the changes is complete, the translation target is marked as containing the changes provided by the translation source. Again, the other linguistic versions of the page are updated with the new information.

The directed graph representation of the described scenario is illustrated in figure {architecture_graph.dot.png}. In the graph, white nodes are original content contributions and gray nodes are versions resulting from a translation effort. Solid arcs are page evolutions from version to version, and dashed arcs are translations from source to target. On each node, the second row lists the original content contributions included in that version.
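Judging by its file name, the figure was presumably rendered from a Graphviz DOT file. The function below is a hypothetical sketch of how such a file could be generated from the tracked data, following the conventions just described; it uses only standard DOT attributes (`style=filled`, `fillcolor`, `style=dashed`) and covers a fragment of the scenario. `to_dot` and the input layout are assumptions.

```python
def to_dot(versions, arcs):
    """versions: name -> (is_translation, set of contribution names)
    arcs: (source, target, kind) with kind 'evolution' or 'translation'."""
    lines = ["digraph history {", "  node [shape=box, style=filled];"]
    for name, (is_translation, contribs) in versions.items():
        color = "gray" if is_translation else "white"
        # "\n" inside a DOT label starts the second row
        label = f"{name}\\n{', '.join(sorted(contribs))}"
        lines.append(f'  "{name}" [fillcolor={color}, label="{label}"];')
    for src, dst, kind in arcs:
        style = "dashed" if kind == "translation" else "solid"
        lines.append(f'  "{src}" -> "{dst}" [style={style}];')
    lines.append("}")
    return "\n".join(lines)

base = {"en_v1", "en_v2", "en_v3"}
versions = {
    "en_v3": (False, base),
    "fr_v1": (True, base),
    "fr_v2": (False, base | {"fr_v2"}),
    "en_v4": (True, base | {"fr_v2"}),
}
arcs = [
    ("en_v3", "fr_v1", "translation"),
    ("fr_v1", "fr_v2", "evolution"),
    ("fr_v2", "en_v4", "translation"),
    ("en_v3", "en_v4", "evolution"),
]
print(to_dot(versions, arcs))
```

The output can be passed to Graphviz (for example `dot -Tpng`) to obtain an image in the style of the figure.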

The graph representation is in fact very close to the internal representation used. Beyond providing useful information for site visitors and supporting translators, it preserves the entire translation history. Figure {en_history} presents the page history of the English page in the scenario. The translation history will make it possible to analyze translation patterns and the evolution of the communities around the different linguistic versions of a page.

Figure {en_history}: page history of the English page in the scenario.


We suspect communities will structure themselves to reduce the translation effort required and adapt to changes over time. For example, a community may adopt a pivot language structure, in which all changes are first translated into one language before being propagated to the others. In theory, every change needs a translation to reach each other language. However, if changes made to many different linguistic versions are first all incorporated into a pivot language, the required translation effort is much lower, because a single translation from the pivot language into each other linguistic version is enough to fully propagate the changes. Over long periods of time, the pivot language may change depending on which community is the most active. Websites maintaining a very large number of linguistic versions may see multiple pivots appear, to better serve groups of similar languages and for various other social reasons. Such a model also reduces the average length of the translation paths. Shorter paths mean less drift from the source and more accurate translations.
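A rough count makes the saving concrete. Assume n linguistic versions, changes made in k >= 2 distinct languages, and that without coordination each change must be translated separately into every other language. Direct propagation then costs k(n - 1) translations, while a pivot costs at most k translations into the pivot (k - 1 if one change was made in the pivot language itself) plus n - 1 aggregated updates out of it. The function names are hypothetical, and the model ignores partial chaining between non-pivot languages.

```python
def pairwise_effort(n_languages: int, changed_languages: int) -> int:
    """Each change is translated separately into every other language."""
    return changed_languages * (n_languages - 1)

def pivot_effort(n_languages: int, changed_languages: int,
                 pivot_changed: bool = False) -> int:
    """Changes are first gathered in the pivot, then sent out in one
    aggregated update per non-pivot language."""
    if changed_languages < 2:
        # with a single change the pivot detour saves nothing
        raise ValueError("the count assumes changes in at least two languages")
    into_pivot = changed_languages - (1 if pivot_changed else 0)
    out_of_pivot = n_languages - 1
    return into_pivot + out_of_pivot

print(pairwise_effort(10, 4), pivot_effort(10, 4))  # 36 13
```

With ten languages and changes arriving in four of them, the pivot brings the count from 36 translations down to 13, and the gap widens as more languages are involved.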

The data collected will make it possible to verify whether those patterns exist. By correlating it with other sources of information, it will be possible to study the patterns and identify which ones are the most efficient and best serve content quality. With sufficient evidence, the tools could be adapted, or guidelines provided, to encourage the positive patterns and help less structured communities reduce the effort required.

In the short term, the data can be used to assess the health of the translation community. It can easily identify missing language pairs and show where the bottlenecks are in the translation process. With this information, it may be possible in some cases to deploy additional resources to resolve the issue or to find a way to motivate the community.
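As a sketch of such a health check, assuming the translation history can be exported as (source language, target language) records, missing pairs and languages whose content never leaves them fall out of simple counts. `pair_report` is a hypothetical name; the sample log holds the four translations of the scenario above (English to French and Spanish, French to English, English to Spanish).

```python
from collections import Counter
from itertools import permutations

def pair_report(languages, translations):
    """Return the ordered language pairs never used for a translation and
    the languages never used as a translation source."""
    translations = list(translations)
    used = Counter(translations)
    # ordered (source, target) pairs with no recorded translation at all
    missing = [p for p in permutations(sorted(languages), 2) if p not in used]
    # content written in a language never used as a source cannot leave it
    as_source = Counter(src for src, _ in translations)
    never_source = [lang for lang in sorted(languages) if as_source[lang] == 0]
    return missing, never_source

log = [("en", "fr"), ("en", "es"), ("fr", "en"), ("en", "es")]
missing, never_source = pair_report({"en", "fr", "es"}, log)
print(missing)       # [('es', 'en'), ('es', 'fr'), ('fr', 'es')]
print(never_source)  # ['es']
```

In the scenario, Spanish has never served as a source, so the dates and location added in Spanish are still waiting to reach the other versions.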
