History: WS08Paper:Introduction

Preview of version: 16

Massively collaborative sites like Wikipedia are revolutionizing the way we think about and deal with content creation. This is bound to have a profound impact on the way we think about and deal with content translation as well (Desilets 2007... this would be my Aslib keynote).

In particular, it raises the question of how to collaboratively translate content that is collaboratively created and ever changing. Many community-built sites find themselves in this situation. For example, SUMO, the Mozilla Support community, had until recently been using traditional techniques for documentation. When they decided to move to wiki documentation, they faced an important problem with content localization.

In order to provide quality documentation to a very large user base, the SUMO community requires the content to be translated into at least 8 major languages. To the Mozilla Foundation, widespread adoption of the web browser is a critical business objective. Without quality documentation available to a large public, that objective is impossible to reach.

TODO: Look at what Olaf said (transcribed on the "Message" page) and see how it can be threaded in here.

TODO: LPH wrote some excellent introductory French material in his project report. Use some of it here.

Translating content in this sort of environment presents a number of unique challenges, compared to translation in more traditional environments (Desilets et al., 2005):

  • Impracticality of imposing a master language, and forcing all original contributions to be written first in that language.
  • Ever changing nature of the content which may never reach a "final" stage.
  • Difficulty of enforcing timely translation into all target languages.
  • Non-professional nature of many of the volunteers doing the translations.

The primary difference between collaborative translation and traditional translation is that the former is much less controlled. Below is a list of constraints and assumptions that hold in the traditional workflow, but need to be relaxed in a collaborative context.

  • Master language: In a traditional context, original content is typically created in a master language, usually English. This is not realistic in a collaborative context, because authors may not all be fluent enough in English to write high quality content in that language.
  • Limit original changes during translation: In a traditional context, once translation of content in the master language has started, there is a strong urge to limit changes to the master language version until it has been translated to all other languages. This is not realistic for a collaborative context since content often never reaches a final stage.
  • Can ensure timely translation: In a traditional context, one can assume that timely translation of content can be ensured through contractual obligations with the translator. In a collaborative context, this is not possible since translators are often unpaid volunteers working in their own free time.
  • Focus on small list of languages: In a traditional context, there is a tendency to focus on a small list of "core" languages, in order to minimize the number of language pairs for translation. In a collaborative context, members of the community are usually allowed to create content in any language, including minority languages.
  • Strong coordination of authors and translators: In a traditional context, the community of authors and translators is a "closed" world, where everyone knows each other, and there is some central authority that coordinates everything. In a collaborative context, authors and translators contributing to the documents are not coordinated and often do not know each other.

TODO: Eliminate or merge sentences which are redundant in the argumentation below.

Thus the main challenge of collaborative translation is to remove as many as possible of the constraints imposed by traditional tools and processes. In other words, the uncontrolled nature of the collaborative process must be embraced rather than constrained. At the same time, the tools and processes must have sufficient structure and technological support to allow the community of authors and translators to produce and translate content effectively. For example, the tools and processes must allow the various linguistic communities to work synergistically without having to rewrite content from scratch in every language (as is typically done in Wikipedia, where the current inability to synchronize between languages causes linguistic silos to be created). The tools must also offer technological support to minimize the cognitive load on users and the chance of human error (e.g., content in one language not being translated into another language). Finally, support for collaborative translation must not complicate or restrain the process of original content creation that has made many wiki communities successful.

The LizzyWiki system described in Désilets et al. (2006) removed many of these constraints, while allowing synergy in a bilingual context. But it still fell short of allowing users to author or translate any page at any point in time. For example, if an author wanted to make a two-word original change to a French page which was out of date with its English counterpart, he first had to bring the French page up to date with the English version. This often turned out to be a problem and stopped the author dead in his tracks. For example, if the French author could not read English, he would not be able to make his original modification until someone else had updated the French page based on the English. In other cases, the author could translate from English to French, but would have to translate several sentences before he would be allowed to make his two-word original contribution to the French page.

In the CLWE project, we move even further in the direction of unconstrained collaborative workflow. Essentially, any page can be modified or translated from any language, at any time. The only constraint left is that users must not perform original edits in the course of carrying out a translation transaction.

TODO: I (AD) don't understand what LPH is saying here. LPH, can you clarify?
Because few translators are available for some language pairs, it would be nearly impossible to replicate changes directly between some languages. To have any chance of success, the system must make it possible to work around these under-resourced language pairs.

TODO: I (AD) don't understand what LPH is saying here. LPH, can you clarify?
Breaking the language pair barrier means that content should be available from as many sources as possible, to provide translators with decent alternatives and reduce the effort they need to understand the changes made. To a machine, all linguistic versions are distinct pieces of data, and it would not know how to map a change into a different context. To humans, however, the idea behind a change is present in the content of any page it has been translated to. Any page that has incorporated the idea can be used as a source for the given change.
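
The idea that any page which has incorporated a change can serve as a source for it can be illustrated with a small Python sketch. This is only one possible reading of the paragraph above: the function name, the integer change identifiers, and the mapping from language codes to sets of incorporated changes are all hypothetical and are not taken from the system itself.

```python
def candidate_sources(incorporated, change_id, target, readable_langs):
    """Return the languages that could serve as a source for one change.

    incorporated   -- hypothetical mapping: language code -> set of change ids
                      already present in that linguistic version
    change_id      -- the change the translator wants to bring into `target`
    target         -- language code of the page being updated
    readable_langs -- languages the translator is able to read
    """
    return sorted(
        lang
        for lang, changes in incorporated.items()
        if lang != target and change_id in changes and lang in readable_langs
    )


# Illustrative data only: change 3 was made in French and already
# translated to Spanish, but not yet to English or German.
incorporated = {
    "en": {1, 2},
    "fr": {1, 2, 3},
    "es": {1, 3},
    "de": {1},
}

# A German translator who reads French and Spanish, but not English.
sources_for_3 = candidate_sources(incorporated, 3, "de", {"fr", "es"})
sources_for_2 = candidate_sources(incorporated, 2, "de", {"fr", "es"})
```

In this sketch the translator can pick either the French or the Spanish page as a source for change 3, while change 2 is only reachable through the French page, since the English page is not readable by this translator.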

AD: I think this is mentioned above, albeit in less detail.
Without tool support, this kind of effort would be possible for a small number of languages, given a lot of discipline from the translators and appropriate use of change comments. However, the error rate would be very high and some changes would be lost along the way. Tool support embedded in the wiki engine could assist translators in their efforts and reduce the error rate.


TODO: (AD) Find a way to reference LPH's blog entries of 2004-06 somewhere in here

In this paper, we describe features that were implemented in the TikiWiki engine, in order to facilitate this sort of collaborative translation of wiki content. These features can be compared to the LizzyWiki system (Desilets et al., 2005), but they innovate in several ways:

  • Support truly multilingual sites as opposed to bilingual only.
  • Allow original modifications to a page at any point in time, rather than require pages to be synchronized.
  • Run in a full-featured wiki engine, rather than in a proof-of-concept prototype.
  • Provide insight to readers regarding the up-to-dateness of a particular linguistic version of a page.
  • Record complete translation history to allow translation behavior to be studied.

The primary innovation in the way CLWE handles content synchronization is that it does not need to analyze any of the content. Instead, it keeps track of the original atomic changes made to each page and of how they propagate to the other linguistic versions. From those translation relations between the pages, the correct text difference and other values can be obtained afterwards.
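
A minimal sketch of this kind of bookkeeping is given here for illustration only, in Python rather than in the language of the wiki engine. It assumes that each atomic change receives an integer identifier and that each linguistic version stores the set of identifiers it incorporates. The class and method names are hypothetical, and the up-to-dateness measure (fraction of known changes incorporated) is an assumption of the sketch, not a description of the actual implementation.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TranslationSet:
    """All linguistic versions of one page (hypothetical model).

    Each language maps to the ids of the atomic changes it incorporates.
    Page text is never inspected; only change ids are tracked.
    """
    incorporated: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def add_language(self, lang):
        self.incorporated.setdefault(lang, set())

    def original_edit(self, lang):
        """Record a new atomic change made directly in `lang`."""
        change_id = next(self._ids)
        self.incorporated[lang].add(change_id)
        return change_id

    def translate(self, source, target):
        """Record that `target` was updated from `source`.

        Returns the ids of the changes newly propagated to `target`.
        """
        propagated = self.incorporated[source] - self.incorporated[target]
        self.incorporated[target] |= propagated
        return propagated

    def missing(self, lang):
        """Ids of changes known somewhere but not yet present in `lang`."""
        all_changes = set().union(*self.incorporated.values())
        return all_changes - self.incorporated[lang]

    def up_to_dateness(self, lang):
        """Fraction of all known changes that `lang` incorporates."""
        all_changes = set().union(*self.incorporated.values())
        if not all_changes:
            return 1.0
        return len(self.incorporated[lang]) / len(all_changes)


pages = TranslationSet()
for lang in ("en", "fr", "es"):
    pages.add_language(lang)

pages.original_edit("en")       # change 1, made in English
pages.original_edit("fr")       # change 2, made in French while out of date
pages.translate("en", "fr")     # French now holds changes 1 and 2
pages.translate("fr", "es")     # Spanish receives both changes from French
```

Note that the French edit is accepted even though the French page has not yet been synchronized with the English one, and that Spanish obtains the English change through French. After these steps, the English page is the only one missing a change.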

The remainder of this paper is organized as follows. In Section ??, we discuss related work. In Section ??, we describe the context in which this work is being done. In Section ??, we discuss the main innovation of the work, namely how we support concurrent, unconstrained editing and translation across more than two languages simultaneously. In Section ??, we discuss a number of additional, more minor innovations of the system. In Section ??, we evaluate the technology. In Section ??, we discuss future work. Finally, in Section ??, we offer conclusions.
