History: WS08Paper:Supporting concurrent editing and translation

Preview of version: 27

NOTE from AD: Need a better title for this section. Something about removing constraints.


NOTE from AD: On 2008-04-09, I reworked this section in a major way in order to make the argumentation flow better. In the process, I left out a lot of content which had originally been written by LPH or AD. I moved this content to the end of this page, so we won't lose it. We should see if we need/can re-thread some of it in the argumentation.


In this section, we describe how the CLWE system lifts all but one of the assumptions described in the Introduction. We illustrate this with a detailed usage scenario, and describe some of the implementation details of the change tracking backend, which is the key element in supporting this scenario. But first, we need to define some simple notation which we will use to describe the scenario and the backend.



Notations


In the rest of this paper, we will use a simple set-based notation to describe the state of a page and all its linguistic alternatives. For example, given a particular page available in English, French and Spanish, the following set of formulas could be used to capture their state:

en_v3 = {e1, e2, e3}
fr_v1 = {e1, e2, e3}
es_v2 = {e1, e2, e3, e4}
ALL = {e1, e2, e3, e4}

These formulas simply state that the English page is currently at version 3, and includes 3 edits: e1, e2 and e3. The French version is at version 1, and includes the "same" three edits. The Spanish version is at version 2, and includes the "same" three edits, plus a fourth one, e4. The ALL set is the union of the edits present in all linguistic versions of the page.
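As a minimal sketch, the notation maps directly onto ordinary set operations. The Python below is illustrative only and is not taken from the CLWE implementation; the names `pages` and `missing_edits` are hypothetical.

```python
# Hypothetical model of the set-based notation; not the CLWE code.
pages = {
    "en": {"e1", "e2", "e3"},        # en_v3
    "fr": {"e1", "e2", "e3"},        # fr_v1
    "es": {"e1", "e2", "e3", "e4"},  # es_v2
}

# ALL is the union of the edits present in every linguistic version.
ALL = set().union(*pages.values())


def missing_edits(lang):
    """Return the edits that the `lang` page has not yet incorporated."""
    return ALL - pages[lang]


for lang in sorted(pages):
    print(lang, sorted(missing_edits(lang)))
```

Under this reading, a page is up to date exactly when its set of missing edits is empty; here the English and French pages are each missing e4.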

It is important to note that this notation makes a few interesting assumptions about how best to represent the status of a page and its linguistic alternatives.

Firstly, the state of a particular linguistic alternative is described as a set of edits, where edits are cross-linguistic. In other words, when a particular edit that was made in one linguistic variant of a page is replicated in another linguistic variant of that page, the two are deemed to contain the "same" edit.

Secondly, the order in which edits were carried out is irrelevant. Because the state of the different linguistic variants is represented as sets of edits, it does not matter whether e1 was done before or after e2.

As we will see, these two assumptions turn out to be important in turning an apparently complex problem of tracking edits in multiple languages at once, into a much simpler problem.



Usage Scenario


In order to illustrate how CLWE supports unconstrained collaborative authoring and editing, we now describe a detailed usage scenario. The scenario involves three people (John, Pierre and Carlo) collaboratively writing a technical report in 3 languages at once (English, French and Spanish). John only speaks English, Pierre speaks English and French, while Carlo speaks English, French and Spanish.

NOTES FROM AD: Need to figure out a scenario that illustrates how each of the assumptions in the Intro is lifted. Haven't looked at the scenario below in detail, so I don't know if it meets that requirement. Also, we should narrate each step of the scenario using the personas above (or other personas if they are better suited). After each screen shot, we should call the reader's attention to interesting UI elements that help the users carry out their tasks. Also, we should use the notation described in the Notations section to describe the state of the 3 pages at the bottom of each screen shot.

  1. A page gets created in English {en_v1}.
  2. In a second edit, some content is added {en_v2}. A third edit is then made.
  3. After this point, the page gets an original translation to both French {fr_v1} and Spanish {es_v1}.
  4. Afterwards, a French contributor decides to add the list of required sections {fr_v2}. After this modification, both English and Spanish versions indicate that they are not fully up to date {en_v3_2}. They provide links to view the other versions and update the content.
  5. An English translator responds to the request and includes the participant list in the English version {fr_v2_to_en_v4}. The English page correctly indicates that the page is now up to date, but the Spanish version is still behind {en_v4}.
  6. A Spanish contributor adds the exact dates of the event along with the location {es_v2}. The page indicates that the page does contain additional content. However, more content can be obtained from the French and English versions.
  7. A Spanish translator decides to update the Spanish version from the English source. In doing so, the Spanish version becomes fully up to date and includes the changes first made to the French version {es_v3}. Both the French and English versions now indicate that content can be obtained from the Spanish version.
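The seven steps above can be traced with the set notation. The sketch below is only an illustration under stated assumptions: the `Wiki` class, its method names and the edit identifiers e1 to e5 are hypothetical, and a translation is modelled as the target inheriting every edit of the source.

```python
# Hypothetical trace of the scenario. The Wiki class and the identifiers
# e1..e5 are assumptions made for illustration, not the CLWE code.
class Wiki:
    def __init__(self):
        self.pages = {}    # language -> set of edit identifiers
        self.next_id = 1

    def edit(self, lang):
        """Record an original contribution on the `lang` page."""
        edit_id = f"e{self.next_id}"
        self.next_id += 1
        self.pages.setdefault(lang, set()).add(edit_id)
        return edit_id

    def translate(self, source, target):
        """Complete translation: the target inherits all source edits."""
        self.pages.setdefault(target, set()).update(self.pages[source])

    def behind(self, lang):
        """Edits present in some other version but not in `lang`."""
        everything = set().union(*self.pages.values())
        return everything - self.pages[lang]


wiki = Wiki()
wiki.edit("en")                # step 1: en_v1 = {e1}
wiki.edit("en")                # step 2: en_v2 = {e1, e2}
wiki.edit("en")                #         en_v3 = {e1, e2, e3}
wiki.translate("en", "fr")     # step 3: fr_v1
wiki.translate("en", "es")     #         es_v1
wiki.edit("fr")                # step 4: fr_v2 adds e4
after_step_4 = {lang: wiki.behind(lang) for lang in wiki.pages}
wiki.translate("fr", "en")     # step 5: en_v4 now includes e4
wiki.edit("es")                # step 6: es_v2 adds e5
wiki.translate("en", "es")     # step 7: es_v3 is fully up to date
after_step_7 = {lang: wiki.behind(lang) for lang in wiki.pages}

for lang in sorted(after_step_7):
    print(lang, sorted(after_step_7[lang]))
```

In this trace, the English and Spanish pages are each missing e4 after step 4, and after step 7 the Spanish page is missing nothing while the English and French pages are each missing e5.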




NOTES FROM AD: The discussion below should be reformatted to follow a storyboarding format. ex: "First, John does X (fig 1). We see that the UI helps John by displaying A. Then, Pierre does X (fig 2) and the system helps him by displaying Y. We see how this lifts assumption N. etc..."

As can be seen in the scenario, the translation process is invisible to a content contributor. Like any visitor to the website, the contributor will see the "Page Translation" box presenting the different alternatives and status information. However, he is free to ignore it. When a change is made by a content contributor, a new original content contribution is recorded and other linguistic versions of the page get updated with the information.

The "Page Translation" box provides links for translators to view the relevant changes made to the page. When using those links, the translator is brought to a slightly different version of the edit page. The page displays the changes to be translated along with the text area. When the translator indicates that the translation of the changes is completed, the translation target gets marked as containing the changes provided by the translation source. Again, other linguistic versions of the page get updated with the new information.

The directed graph representation of the described scenario is illustrated in figure {architecture_graph.dot.png}. In the graph, white nodes are original content contributions and gray nodes are versions resulting from translation efforts. Solid arcs are page evolutions from version to version and dashed arcs are translations from source to target. On each node, the original content contributions included in the version are listed on the second row.

The graph representation is in fact very close to the internal representation used. Beyond providing useful information for site visitors and supporting translators, the entire translation history is preserved. Figure {en_history} presents the page history of the English page in the scenario. The information from the translation history will make it possible to analyze the translation patterns and the evolution of the communities around the different linguistic versions of a page.

en_history



Change tracking backend


As the above scenario illustrates, CLWE supports a very open-ended workflow that lifts most of the assumptions of traditional authoring and translation environments.

A key element for supporting this open workflow is a backend capable of tracking edits made in the different pages, in such a way that it can help users to reproduce these changes in other linguistic variants of those same pages. We now describe how we addressed this challenge in the CLWE project.

NOTES FROM AD: Not sure that there is a whole lot to say here. In a sense, the Notations section kind of provided the main insight that forms the basis of the backend. LPH, are there additional meaningful details that we could give about how this is done?

This tracking approach has three limitations. Firstly, when more than one edit is translated in the same translation transaction, the system is not able to identify which parts of the translated text correspond to which edit. TODO: Use part of the usage scenario to illustrate that. For example, if Pierre translates edits e1, e2, and e3 from English to French, then when Pedro later wants to update the Spanish page, he will only be able to translate these three edits at once. In practice, we haven't found this to be a severe limitation.

Secondly, the approach assumes that users will never make original edits while in the midst of translating edits from one language to another. If a user does that, then the system will simply consider the original edits to be part of the translation task. In other words, no new edit identifier will be generated. As a result, the system will not be able to notify users that other pages need to include that edit as well.
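The first two limitations follow from tracking edits purely as sets of identifiers. The sketch below is an assumption-laden illustration, not the CLWE code; `complete_translation` is a hypothetical name.

```python
# Hypothetical sketch of one translation transaction; not the CLWE code.
def complete_translation(source, target):
    """Return the target's edit set after one translation transaction.

    Only edit identifiers are carried over. No mapping is kept between
    individual edits and passages of the translated text, and original
    text typed during the transaction receives no identifier of its own.
    """
    return target | source


en = {"e1", "e2", "e3"}
fr = complete_translation(en, set())   # all three edits translated at once

# Suppose the translator also slipped an original sentence into the French
# text. No identifier was generated for it, so the two sets are identical
# and nothing is reported as missing from the English page.
all_edits = en | fr
print(sorted(all_edits - en))
```

Because the French page records only that it contains e1, e2 and e3, a later translator working from it sees those three edits as a single batch, and the untracked sentence never shows up as pending anywhere.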

Thirdly, the system relies heavily on the end user to tell it when pages are synchronized. TODO: Talk about what bad things can happen when the user clicks on the Complete Translation button instead of the Partial Translation button, and vice-versa. There is a short discussion of how this could be fixed in the Related work section, which describes the problem. Maybe we should move that problem description here.

End of current argumentation


NOTES FROM AD: The stuff in the remaining sections is stuff that had been written before I reorganized this page on 2008-04-09, and for which I didn't know if it should/could be threaded into the new reorganized argumentation. Need to evaluate this.

Maybe all we need is to reformat LPH's scenario a bit to use the format below, which communicates better than the impersonal approach. Also, add a bit about how prior art (LizzyWiki) succeeded in removing some of the constraints, but not all.

John creates an English page Welcome to this wiki. Pierre then translates it to the French page Bienvenue à ce wiki. Later on, John adds ten sentences to the English page Welcome to this wiki. Now, Josée, who does not speak English, also wants to add two words to the French page Bienvenue à ce wiki.

With a standard wiki, the two pages would be distinct and no one would be aware of the content evolution in the other linguistic versions. With the LizzyWiki approach, Josée is not allowed to add her two words to Bienvenue à ce wiki before she has translated the ten sentences added by John to the English version Welcome to this wiki. But Josée cannot do this because she does not read English. Even if she could, she might not be in the mood to translate ten English sentences just to be allowed to add two words to the French version.

In (Désilets et al., 2006), the authors also postulated that in order to support collaborative authoring and translation in more than two languages at a time, it might be necessary to impose the use of pivot languages as intermediaries between other languages, in order to provide stable points of reference in an otherwise chaotic environment.

With the CLWE project, we blindly ignored these constraints and allowed authors to create original content on any linguistic version of any page, at any time.

NOTES FROM AD: Not sure what to do with this content either


AD: Maybe some of this needs to be threaded back into the above argumentation, but I am not sure


In adding the required features to support change tracking, a few guidelines had to be followed:
  • Any contribution is worth translating until a translator says otherwise.
  • Only the final result matters. Intermediate steps can be ignored.
  • Content contributors should not have to worry about translation.

NOTES FROM AD: In a way, this breaks the argumentation a bit. In the Intro, we present the assumptions and constraints we are trying to relax. Those make very compelling "guiding principles". Then we get to this section and we present more "guiding principles". Are these new principles subsumed by the ones presented in the intro? If not, should we thread them into the intro, or are they better presented here? How can we present those new principles in a way that makes it clear to the reader that they are consistent with the previous principles? I have to admit, I don't see how the above three principles tie to a "simpler model of tracking". LPH said he will re-read this whole section (which he wrote as a sort of "stream of consciousness") and see if he can rework it and make it clearer, etc...

These three guidelines are tightly connected and together point towards a solution. As mentioned before, an attempt to translate every change individually to every single linguistic version is impractical due to the limited translation resources. By focussing on the final results, it is possible for a translator to catch up on the content of a source page in a single step and abstract away the different steps that led to the final content. In most cases, the translator can simply observe the changes that were made on the source page since the last time a synchronization occurred.

In a wiki content creation process, multiple edits are made, and not all of them are worth translating. Rather than having a single author, wiki pages are a collaborative work between many people. Someone initially writes the base content. Others add to it. In the process, many people make minor contributions that improve the quality of the content, but may not be relevant for translators. Such changes include grammatical corrections and syntactic improvements. Some changes may only affect the formatting of the page. Determining which changes have to be translated is a complex task, and the line is very hard to draw.

A simple change to the syntax of a phrase may seem trivial, but if the previous formulation was ambiguous, a translator may already have made a wrong interpretation of it. In that case, the translation would need to be updated. A potential solution would be to ask the content contributor to say whether or not a change should be translated. However, the content contributor may not have sufficient knowledge about the translation process to make the right decision. By requesting no information from the content contributors, it is possible to make the translation process as invisible as possible.

Because translators will translate an aggregate of changes rather than changes individually, it does not matter if a few trivial changes slip in. Very active translation communities propagating the changes often may get to translate very small changes that do not impact their version of the content. However, it is unlikely that a translation gets updated multiple times a day to match each change on a page.

The result of these guidelines is a very simplified model for change tracking. In the model, each change represents an idea by its author. As far as tracking is concerned, all changes are equal, as they all have to be propagated to the other linguistic versions. When a change is incorporated in a given linguistic version, all pages updating from that page will also inherit the change.

In a simplified manner, change propagation can be represented as a directed graph in which nodes are page versions and arcs are page evolutions. Arcs are used both for the evolution between versions of the same page and for the translation of content from source to target language. In the graph, original content creation is attached to page versions. By following all arcs from the page version node where a change originated, all page versions containing the change can be found.
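Finding every page version that contains a change is then a plain reachability search. The sketch below uses a breadth-first traversal over a small hypothetical graph; the node names, the `arcs` table and `versions_containing` are assumptions for illustration and do not reflect the actual internal representation.

```python
from collections import deque

# Hypothetical graph: nodes are page versions, arcs are either evolutions
# of the same page or translations from a source to a target version.
arcs = {
    "en_v1": ["en_v2"],            # evolution
    "en_v2": ["en_v3", "fr_v1"],   # evolution, translation
    "fr_v1": ["fr_v2"],            # evolution (original French content)
    "fr_v2": ["en_v3"],            # translation back to English
    "en_v3": [],
}


def versions_containing(origin):
    """Return all versions reachable from the version where a change originated."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for successor in arcs.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


print(sorted(versions_containing("fr_v2")))
```

In this toy graph, a change that originated in fr_v2 is contained in fr_v2 and en_v3 only, while a change that originated in en_v1 reaches every version.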

Consider this sample case using three languages. An unlimited number of languages is supported. However, the resulting scenarios and representations would be too large and impractical for demonstration purposes.
