Massively collaborative sites like Wikipedia are revolutionizing the way we think about and deal with content creation. This is bound to have a profound impact on the way we think about and deal with content translation as well (Desilets 2007a... this would be my Aslib keynote).
In particular, it raises the question of how to collaboratively author and translate content in several languages at the same time, and to do so in an organic, continuous fashion. Many community-built sites find themselves in this situation. For example, SUMO, the Mozilla Support community, had been using traditional techniques for documentation until recently. When they decided to move to wiki documentation, they faced an important problem with content localization.
In order to provide quality documentation to a very large user base, the SUMO community requires the content to be translated into at least 8 major languages. To the Mozilla Foundation, widespread adoption of the web browser is a critical business objective. Without quality documentation available to a large public, that objective is impossible to reach.
This sort of collaborative community-based translation could also have applications in commercial translation.
TODO: Look at what Olaf said about crowdsourcing, etc... and see if we can thread it in here. Also, AD should dig out the various articles he found in Multilingual Computing magazine that talk about use of collaborative models in commercial translation.
TODO: Look at what Olaf said (transcribed on the "Message" page) and see how it can be threaded in here.
TODO: LPH wrote some excellent introductory French material in his project report. Use some of it here
Translating content in this sort of collaborative environment presents a number of unique challenges compared to more traditional environments (Desilets et al., 2005). The primary difference is that in a collaborative environment, the process is much less controlled and more "chaotic". Traditional translation processes and tools operate under a number of assumptions which simply do not hold in a collaborative environment. These assumptions are listed below.
NOTE: We should choose concise yet highly descriptive names for these assumptions so we can refer to them later in the paper.
- Assumption 1 - Master language: In a traditional context, original content is typically created in a master language, usually English. This is not realistic in a collaborative context, because authors may not all be fluent enough in English to write high quality content in that language.
- Assumption 2 - Limit original changes during translation: In a traditional context, once translation of content in the master language has started, there is a strong tendency to limit changes to the master language version until it has been translated to all other languages. This is not realistic for a collaborative context, since content often never reaches a final stage.
- Assumption 3 - Can ensure timely translation: In a traditional context, one can assume that timely translation of content can be ensured through contractual obligations with the translator. In a collaborative context, this is not possible, since translators are often unpaid volunteers working in their own free time.
- Assumption 4 - Focus on small list of languages: In a traditional context, there is a tendency to focus on a small list of "core" languages, in order to minimize the number of language pairs for translation. In a collaborative context, members of the community are usually allowed to create content in whatever language they choose, including minority languages.
- Assumption 5 - Strong coordination of authors and translators: In a traditional context, the community of authors and translators is a "closed" world, where everyone knows each other, and there is some central authority that coordinates everything. In a collaborative context, authors and translators contributing to the documents are not coordinated and often do not know each other.
- Assumption 6 - Trained translators: In a traditional context, translators are usually professionally trained, and can be "enculturated" into the organisation's tools and processes. In a collaborative context, translators are often amateurs, and the amount of tool and process training that one can impose on them is limited.
- Assumption 7 - Separation of Authoring and Translation: In a traditional environment, authoring and translation are clearly segregated, and there is very little chance of the two interfering with each other. Authors do not have to know about translation processes and translators do not have to know about the authoring processes. In a collaborative environment, it is difficult to separate those two processes, and the people doing the two are often the same. As a consequence, there is a risk that introducing a translation process into a wiki community will complicate the basic authoring operations that have made many wiki communities successful.
TODO: Eliminate or merge sentences which are redundant in the argumentation below.
Thus the main technological challenge of collaborative translation is to come up with tools and processes that do not depend on those assumptions. In other words, change must be embraced rather than constrained.
At the same time, the tools and processes must have sufficient structure to allow community members to author and translate content effectively, without having to do everything manually. For example, they must allow the various linguistic communities to work synergistically without having to rewrite content from scratch in every language (as is typically done in Wikipedia, where the current inability to synchronize between languages causes linguistic silos to be created). They must also be easy to use, in order to minimize cognitive load on users and the chance of human error (e.g., content in one language not being translated into another language).
In this paper, we describe a wiki-based system that supports collaborative authoring and translation, and lifts all but the last of the above assumptions (Separation of Authoring and Translation). Although our system does provide tight integration between authoring and translation, users must still take care not to make original edits in the course of carrying out a translation transaction. The system is based on TikiWiki, a popular and full-featured wiki engine.
To our knowledge, our system is the first one to go this far in supporting collaborative authoring and translation of content, and to be usable in actual production settings.
TODO: I (AD) don't understand what LPH is saying here. LPH, can you clarify?
With limited resources on some translation pairs, it would be nearly impossible to replicate changes between some languages. To have any chance of success, working around the limited language pairs is necessary.
TODO: I (AD) don't understand what LPH is saying here. LPH, can you clarify?
Breaking the language pair barrier means that content should be available from as many sources as possible, to provide translators with decent alternatives and reduce the effort they need to understand the changes made. To machines, all linguistic versions are distinct pieces of data, and they would not know how to map a change into a different context. To humans, however, the idea behind a change is present in the content of any page it has been translated to. Any page that has incorporated the idea can be used as a source for the given change.
AD: I think this is mentioned above, albeit in less detail.
Without tool support, this kind of effort would be possible for a small number of languages, with a lot of discipline from the translators and appropriate use of change comments. However, the error rates would be very high and some changes would be lost along the way. Tool support embedded into the wiki engine could assist translators in their efforts and reduce the error rates.
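As a rough illustration of the bookkeeping such tool support implies, the sketch below records which changes each language version has already incorporated and, for a given target language, lists the versions a translator could use as a source for each missing change. This is only a minimal sketch under stated assumptions: the `Page` class, the change identifiers and the `candidate_sources` function are hypothetical names introduced here, and they do not describe how TikiWiki or our system actually stores this information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Page:
    """One language version of a wiki page (hypothetical model)."""
    lang: str
    # Identifiers of the changes whose idea is already reflected in this version.
    incorporated: Set[str] = field(default_factory=set)


def candidate_sources(pages: List[Page], target_lang: str,
                      translator_langs: Set[str]) -> Dict[str, List[str]]:
    """Map each change missing from the target page to the languages,
    among those the translator can read, that already incorporate it."""
    by_lang = {p.lang: p for p in pages}
    target = by_lang[target_lang]
    result: Dict[str, List[str]] = {}
    for page in pages:
        # Skip the target itself and any version the translator cannot read.
        if page.lang == target_lang or page.lang not in translator_langs:
            continue
        for change in page.incorporated - target.incorporated:
            result.setdefault(change, []).append(page.lang)
    return result


pages = [
    Page("en", {"c1", "c2"}),
    Page("fr", {"c1", "c2", "c3"}),
    Page("es", {"c1"}),
]
# A translator who reads French and Spanish, updating the Spanish page.
sources = candidate_sources(pages, "es", {"fr", "es"})
```

In this example the translator does not read English, yet both changes missing from the Spanish page can still be picked up from the French version, which is the point of not relying on a single language pair.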
TODO: (AD) Find a way to reference LPH's blog entries of 2004-06 somewhere in here
The remainder of this paper is organized as follows. In Section ??, we provide information on the context in which this work is being done. In Section ??, we discuss related work in the field. In Section ??, we describe a system which removes most of the constraints and assumptions discussed above. In Section ??, we evaluate the technology. In Section ??, we discuss future work. Finally, in Section ??, we offer conclusions.