Massively collaborative sites like Wikipedia are revolutionizing the way we think about and deal with content creation. This is bound to have a profound impact on the way we think about and deal with content translation as well (Desilets 2007... this would be my Aslib keynote).
In particular, it raises the question of how to collaboratively translate content that is itself collaboratively created and ever-changing. Many community-built sites find themselves in this situation. For example, SUMO, the Mozilla Support community, had until recently been using traditional techniques for documentation. When they decided to move to wiki documentation, they faced an important problem with content localization.
In order to provide quality documentation to a very large user base, the SUMO community requires the content to be translated into at least 8 major languages. To the Mozilla Foundation, widespread adoption of the web browser is a critical business objective. Without quality documentation available to a broad public, that objective is impossible to reach.
TODO: Look at what Olaf said (transcribed on the "Message" page) and see how it can be threaded in here.
Translating content in this sort of environment presents a number of unique challenges, compared to translation in more traditional environments (Desilets et al., 2005):
- Impracticality of imposing a master language, and forcing all original contributions to be written first in that language.
- Ever-changing nature of the content, which may never reach a "final" stage.
- Difficulty of enforcing timely translation into all target languages.
- Non-professional nature of many of the volunteers doing the translations.
In this paper, we describe features that were implemented in the TikiWiki engine, in order to facilitate this sort of collaborative translation of wiki content. These features can be compared to the LizzyWiki system (Desilets et al., 2005), but they innovate in several ways:
- Support truly multilingual sites as opposed to bilingual only.
- Allow original modifications to a page at any point in time, rather than require pages to be synchronized.
- Implement the features in a full-featured wiki engine, instead of a proof-of-concept prototype.
- Provide insight to readers regarding the up-to-dateness of a particular linguistic version of a page.
- Record complete translation history to allow translation behavior to be studied.
The primary innovation in the way CLWE handles content synchronization is that it does not need to analyze any of the content. Instead, it keeps track of the original atomic changes made to a page and of how they propagate to the other linguistic versions. From these translation relations between pages, the correct text differences and other values can be obtained afterwards.
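As an illustration only, the idea of tracking changes rather than analyzing content can be sketched as follows. The sketch assumes that every original edit receives a unique identifier, and that completing a translation marks the target version as reflecting every change known to the source version at that time. The class and method names (`TranslationSet`, `original_edit`, `translate`, `missing`, `is_up_to_date`) are hypothetical and are not taken from the TikiWiki code base; the actual implementation may differ.

```python
from itertools import count


class TranslationSet:
    """All linguistic versions of one page (hypothetical model, not TikiWiki code)."""

    def __init__(self, langs):
        # For each language, the set of atomic change ids its text reflects.
        self.changes = {lang: set() for lang in langs}
        self._ids = count(1)

    def original_edit(self, lang):
        """Record a new atomic change made directly in `lang`; return its id."""
        change_id = next(self._ids)
        self.changes[lang].add(change_id)
        return change_id

    def translate(self, source, target):
        """Record that `target` now reflects every change present in `source`."""
        self.changes[target] |= self.changes[source]

    def missing(self, lang, relative_to):
        """Change ids present in `relative_to` that `lang` has not received yet."""
        return self.changes[relative_to] - self.changes[lang]

    def is_up_to_date(self, lang):
        """True if no other version contains a change that `lang` lacks."""
        return all(
            not self.missing(lang, other)
            for other in self.changes
            if other != lang
        )


# Two concurrent original edits in different languages, then translations.
site = TranslationSet(["en", "fr", "es"])
site.original_edit("en")            # change 1, written in English
site.original_edit("fr")            # change 2, written in French
site.translate("en", "es")          # Spanish now reflects change 1
print(sorted(site.missing("es", "fr")))   # [2]
print(site.is_up_to_date("es"))           # False
site.translate("fr", "es")          # Spanish now reflects changes 1 and 2
print(site.is_up_to_date("es"))           # True
```

Note that nothing in this sketch inspects the page text: no language acts as a master, original edits can happen in any version at any time, and the set difference returned by `missing` is enough to tell a reader how far a given version lags behind another.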
The remainder of this paper is organized as follows. In Section ??, we discuss related work. In Section ??, we describe the context in which this work is being done. In Section ??, we discuss the main innovation of the work, namely how we support concurrent, unconstrained editing and translation across more than two languages simultaneously. In Section ??, we discuss a number of additional, more minor innovations of the system. In Section ??, we evaluate the technology. In Section ??, we discuss future work. Finally, in Section ??, we offer conclusions.