WS08Paper: Supporting concurrent editing and translation


NOTE from AD: Need a better title for this section. Something about removing constraints.


NOTE from AD: On 2008-04-09, I reworked this section in a major way in order to make the argumentation flow better. In the process, I left out a lot of content which had originally been written by LPH or AD. I moved this content to the end of this page so we won't lose it. We should see whether we need to (and can) re-thread some of it into the argumentation.


NOTE FROM AD: We need a good vocabulary to describe things. I use "linguistic variant" to refer to something like "the English version of page X". This is better than "version", which is ambiguous (it could refer to something like v1.2 of the page versus v1.3 of the page). Also, I use the term "edit" instead of "content element", which I think describes it more accurately. We need to make sure we use this vocabulary consistently throughout the paper.


In this section, we describe how the CLWE system lifts all but one of the assumptions described in the Introduction. We illustrate this with a detailed usage scenario, and describe some implementation details of the change-tracking backend, which is the key element in supporting this scenario. But first, we define a simple notation which we will use to describe both the scenario and the backend.

Notations


In the rest of this paper, we use a simple set-based notation to describe the state of a page and all its linguistic variants. For example, given a page available in English, French and Spanish, the following set of formulas could be used to capture their state:

en_v3 = {e1, e2, e3}
fr_v1 = {e1, e2, e3}
es_v2 = {e1, e2, e3, e4}
ALL = {e1, e2, e3, e4}

These formulas state that the English page is currently at version 3 and includes 3 edits: e1, e2 and e3. The French page is at version 1 and includes the "same" three edits. The Spanish page is at version 2 and includes the "same" three edits, plus a fourth one, e4. The ALL set is the union of all edits present in the linguistic variants of the page.
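The notation maps directly onto ordinary set operations. The following is a minimal Python sketch, assuming that edits are identified by string labels such as "e1"; the names `variants` and `missing_edits` are hypothetical and are not taken from the CLWE code base.

```python
# Hypothetical model: each linguistic variant is the set of edit IDs it contains.
variants = {
    "en": {"e1", "e2", "e3"},
    "fr": {"e1", "e2", "e3"},
    "es": {"e1", "e2", "e3", "e4"},
}

# ALL is the union of the edits present in every linguistic variant.
ALL = set().union(*variants.values())


def missing_edits(lang):
    """Edits that exist in some variant but not yet in `lang`."""
    return ALL - variants[lang]


for lang in sorted(variants):
    print(lang, sorted(missing_edits(lang)))
# en ['e4']
# es []
# fr ['e4']
```

A variant whose `missing_edits` set is empty contains every edit known for the page.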

It is important to note that this notation makes a few interesting assumptions about how best to represent the status of a page and its linguistic variants.

Firstly, the state of a particular linguistic variant is described as a set of edits, where edits are cross-linguistic. In other words, when an edit made in one linguistic variant of a page is replicated in another linguistic variant of that page, the two variants are deemed to contain the "same" edit.

Secondly, the order in which edits were carried out is irrelevant. Because the state of each linguistic variant is represented as a set of edits, it does not matter whether e1 was done before or after e2.

As we will see, these two assumptions turn out to be important in turning an apparently complex problem (tracking edits in multiple languages at once) into a much simpler one.

Usage Storyboard


NOTES FROM AD: We need to figure out a scenario that illustrates how each of the assumptions in the Intro is lifted. It should also illustrate all the features LPH has implemented, or at least the interesting ones. I haven't looked at the scenario below in detail, so I don't know if it meets that requirement. We should also narrate each step of the scenario using the personas above (or other personas if they are better suited). After each screen shot, we should call the reader's attention to interesting UI elements that help users carry out their tasks. Finally, we should use the notation described in the Notations section to describe the state of the 3 pages at the bottom of each screen shot.

NOTES FROM AD: Below, we have two storyboards: one based on what LPH had produced, and a new one that AD created. Neither of them is right. LPH's storyboard uses the most recent version of CLWE, but some key screens are missing if we want to tell the full story. AD's storyboard has pretty much all the screen shots we might want, but it uses an old version of CLWE that doesn't have the X% up to date messages. We should produce a brand new storyboard that follows the story put together by AD, but using the latest version of CLWE. Maybe we could take keyframes from Rick Sapir's video?

Old Scenario by LPH


The screenshots in this scenario use the most recent version of CLWE, but some key screens are missing if we want to tell the full story.


In order to illustrate how CLWE supports unconstrained collaborative authoring and editing, we now describe a detailed usage storyboard. This storyboard involves three people (John Doe, Marie Quidam and Juan Del Pueblo) collaboratively writing a technical report for a project in 3 languages at once (English, French and Spanish). For simplicity's sake, we assume that all three of them are trilingual, but that each of them only changes the page in their native language. Note, however, that the system does not require this to be the case.

First, John Doe creates an English page in two consecutive edits.

Image

Image


At this point, we have:

en_v2 = {e1, e2}
fr = None
es = None

At this point, Marie Quidam and Juan Del Pueblo translate the English page into French and Spanish respectively. NOTE from AD: It's too bad that we don't have a screen shot of what the translation UI looks like, the Under translation notices, etc.

Image

Image


At this point, we have:

en_v2 = {e1, e2}
fr_v1 = {e1, e2}
es_v1 = {e1, e2}

All pages are now displayed as being up to date. Next, Marie adds a list of bullet points to the French version. NOTE from AD: Too bad we don't have a screen shot of the French page with the bullet points. At this point, we have:

en_v2 = {e1, e2}
fr_v2 = {e1, e2, e3}
es_v1 = {e1, e2}

Both English and Spanish are displayed as being out of date ("22% up to date"), as follows:

Image


and they provide links to view the "better" French version, and update from it (the green double arrow icon).

At this point, John decides to update the English version from the French, by clicking on the double green arrow icon in front of the French page name.

Image


The system shows John the French changes that need to be reproduced in English (highlighted in green). John copies and pastes them into the English edit field (not visible here, unfortunately), and overwrites them with their English translation. He then saves the page.

At this point, we have:

en_v3 = {e1, e2, e3}
fr_v2 = {e1, e2, e3}
es_v1 = {e1, e2}

The English is now shown as being up to date, but the Spanish is still behind. However, Juan, who looks after the Spanish version, is not in the mood for translating right now and wants to add original content instead. No problem, he can do that. He adds the exact dates and location of the event (es_v2). The Spanish page now indicates that it contains additional content, but also that more content can be obtained from the French and English versions. NOTE from AD: Hmm... it seems that the image for this transaction is wrong. I don't see any dates added!

We now have:

en_v3 = {e1, e2, e3}
fr_v2 = {e1, e2, e3}
es_v2 = {e1, e2, e4}

and all three versions are considered to need updating. Now that Juan has gotten his original edits out of the way, he is in the mood for translating changes from English. After he is done, we have:

en_v3 = {e1, e2, e3}
fr_v2 = {e1, e2, e3}
es_v3 = {e1, e2, e3, e4}

which means that Spanish is completely up to date, but French and English need updating from Spanish.
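The storyboard can be replayed mechanically with the set notation. This is a hypothetical sketch, not CLWE's implementation: it assumes that an original edit adds a fresh edit ID to one variant, that a completed translation copies all of the source's edit IDs into the target, and that a variant is up to date when it holds every edit known for the page. Version numbers are not tracked; the function names are illustrative.

```python
# Hypothetical replay of the storyboard using the set notation.
state = {"en": set(), "fr": set(), "es": set()}


def original_edit(lang, edit_id):
    """An author adds new content: the variant gains a fresh edit ID."""
    state[lang].add(edit_id)


def translate(target, source):
    """A completed translation copies the source's edit IDs into the target."""
    state[target] |= state[source]


def up_to_date(lang):
    """A variant is up to date when it holds every edit known for the page."""
    all_edits = set().union(*state.values())
    return state[lang] >= all_edits


original_edit("en", "e1")   # John: en_v1
original_edit("en", "e2")   # John: en_v2
translate("fr", "en")       # Marie: fr_v1
translate("es", "en")       # Juan: es_v1
original_edit("fr", "e3")   # Marie adds bullet points: fr_v2
translate("en", "fr")       # John updates from French: en_v3
original_edit("es", "e4")   # Juan adds dates and location: es_v2
translate("es", "en")       # Juan translates from English: es_v3

print({lang: sorted(edits) for lang, edits in state.items()})
print({lang: up_to_date(lang) for lang in state})   # → {'en': False, 'fr': False, 'es': True}
```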



Images below need to be threaded into the above discussion

Image

Image

Image


As can be seen in the scenario, the translation process is invisible to a content contributor. Like any visitor to the website, the contributor sees the "Page Translation" box presenting the different linguistic variants and their status information, but is free to ignore it. When a content contributor makes a change, a new original content contribution is recorded, and the other linguistic versions of the page are updated with this information.

The "Page Translation" box provides links that allow translators to view the relevant changes made to the page. Following those links brings the translator to a slightly different version of the edit page, which displays the changes to be translated alongside the text area. When the translator indicates that the translation of the changes is complete, the translation target is marked as containing the changes provided by the translation source. Again, the other linguistic versions of the page are updated with this new information.
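A minimal sketch of this rule, assuming the behaviour shown in AD's storyboard, where a Partial Translation save leaves the French edit set empty (fr_v1 = {}) and a Complete Translation save gives it the English edits (fr_v2 = {e1, e2}). The function `save_translation` is hypothetical.

```python
def save_translation(target_edits, source_edits, complete):
    """Return the target's edit set after a translation is saved.

    Assumed rule: only a complete translation marks the target as containing
    the source's edits; a partial save keeps the target's edit set unchanged,
    so the target is still shown as behind its source.
    """
    if complete:
        return target_edits | source_edits
    return set(target_edits)


en = {"e1", "e2"}
fr = set()                                       # newly created French page
fr = save_translation(fr, en, complete=False)    # Partial Translation: fr_v1 = {}
print(sorted(fr))                                # → []
fr = save_translation(fr, en, complete=True)     # Complete Translation: fr_v2 = {e1, e2}
print(sorted(fr))                                # → ['e1', 'e2']
```

Because the decision rests entirely on which button the translator presses, the system trusts the user to say when two pages are synchronized.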

The directed graph representation of the described scenario is illustrated in figure {architecture_graph.dot.png}. In the graph, white nodes are original content contributions and gray nodes are versions resulting from a translation effort. Solid arcs are page evolutions from version to version, and dashed arcs are translations from source to target. The second row of each node lists the original content contributions included in that version.
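Assuming every translation arc stands for a completed translation, the edits contained in a version can be derived from such a graph: a version holds its own original contributions plus everything held by the versions it derives from. The sketch below encodes LPH's storyboard this way; the node names, the `ORIGINAL` and `PREDECESSORS` tables and the `contents` function are hypothetical, and the encoding is not necessarily identical to the figure.

```python
from functools import lru_cache

# Hypothetical encoding of the scenario graph: original contributions made in
# each version, and the versions each one derives from (its previous version
# and/or a translation source).
ORIGINAL = {"en_v1": {"e1"}, "en_v2": {"e2"}, "fr_v2": {"e3"}, "es_v2": {"e4"}}
PREDECESSORS = {
    "en_v1": [],
    "en_v2": ["en_v1"],
    "fr_v1": ["en_v2"],            # translation arc
    "es_v1": ["en_v2"],            # translation arc
    "fr_v2": ["fr_v1"],
    "en_v3": ["en_v2", "fr_v2"],   # evolution arc + translation arc
    "es_v2": ["es_v1"],
    "es_v3": ["es_v2", "en_v3"],   # evolution arc + translation arc
}


@lru_cache(maxsize=None)
def contents(version):
    """Edits in a version: its own contributions plus everything inherited."""
    edits = set(ORIGINAL.get(version, set()))
    for pred in PREDECESSORS[version]:
        edits |= contents(pred)
    return frozenset(edits)   # immutable, so cached results cannot be altered


print(sorted(contents("es_v3")))   # → ['e1', 'e2', 'e3', 'e4']
```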

The graph representation is in fact very close to the internal representation used by the system. Beyond providing useful information for site visitors and supporting translators, it preserves the entire translation history. Figure {en_history} presents the page history of the English page in the scenario. The information in the translation history will make it possible to analyze translation patterns and the evolution of the communities around the different linguistic versions of a page.

en_history



New scenario created by AD


NOTE: I (AD) have BMP files for each step of the way on my laptop.


CRAP: I created it on the wiki-translation site, which doesn't have the most recent version! It lacks the X% up to date messages! I will try to construct a story using the screen shots that LPH produced instead.


First, John Doe creates the English page Final Report in two consecutive edits. We now have:

en_v2 = {e1, e2}
fr = None
es = None

Then, Marie Quidam sees the English page and hits the Traduire ("Translate") button. She enters French as the target language, and Rapport Final as the name of the target page. The system pastes the content of the English page Final Report into an edit box and inserts an "Under translation" note at the top. Marie has translated the first sentence when the phone rings. She hits the Traduction Partielle ("Partial Translation") button and answers the phone. The status of the various pages is now:

en_v2 = {e1, e2}
fr_v1 = {}
es = None

The system shows the French page as not being up to date, indicating that the English version is "better". When Marie comes back from her phone conversation, she clicks on the update icon in front of Final Report (en), translates the remaining two sentences, and this time hits the Complete Translation button to signal that she is done translating. Note, however, that Marie forgets to delete the Under translation line at the top of the page.

We now have:

en_v2 = {e1, e2}
fr_v2 = {e1, e2}
es = None

and the French version Rapport Final is now displayed as being equivalent to the English Final Report. Later, Juan Del Pueblo sees the French page, hits the ??? ("Translate") button and translates the whole content in one go, making sure to save using the ??? ("Complete Translation") button. We now have:

en_v2 = {e1, e2}
fr_v2 = {e1, e2}
es_v1 = {e1, e2}

and the Spanish page Informe Final is listed as being equivalent to both the English and French versions. Now, Marie notices that she forgot to erase the Under translation line at the top of the French page. So she edits the page, erases the line, and saves using the Modif. Mineure ("Minor Change") button. This means that the change is not treated as one that needs translation, so the state of the three pages remains the same, and they are still displayed as being up to date with each other.
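A hypothetical sketch of how a minor change could be handled under the set model: a minor save mints no new edit ID, so the edit sets, and therefore the up-to-date status, stay unchanged, whereas an ordinary save adds a fresh ID. The ID counter and `save_edit` are illustrative assumptions, not CLWE code.

```python
import itertools

_next_id = itertools.count(3)   # hypothetical ID generator; e1 and e2 already exist


def save_edit(edits, minor):
    """Return the variant's edit set after an ordinary (non-translation) save."""
    if minor:
        return set(edits)                   # minor change: no new edit ID
    return edits | {f"e{next(_next_id)}"}   # original change: fresh edit ID


pages = {"en": {"e1", "e2"}, "fr": {"e1", "e2"}, "es": {"e1", "e2"}}
pages["fr"] = save_edit(pages["fr"], minor=True)    # Marie removes "Under translation"
all_edits = set().union(*pages.values())
print(all(p == all_edits for p in pages.values()))  # → True
pages["en"] = save_edit(pages["en"], minor=False)   # John adds bullet points
print(sorted(pages["en"]))                          # → ['e1', 'e2', 'e3']
```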

Next, John Doe adds three bullet points to the English version of the page. We now have:

en_v3 = {e1, e2, e3}
fr_v2 = {e1, e2}
es_v1 = {e1, e2}

At this point, the French and Spanish versions are labeled as needing updating from English. But Marie is not in a translation mood and would rather add original content to the French version. No problem, she can do that, and she adds her own two bullet points to the list. We now have:

en_v3 = {e1, e2, e3}
fr_v3 = {e1, e2, e4}
es_v1 = {e1, e2}

At this point, all three versions are labeled as missing some edits. Now that Marie has gotten her original edits out of the way, she is in the mood for translating, and decides to update from the English version (by clicking on the double green arrows in front of the English page name). The changes to be translated are highlighted in green at the top of the screen. Marie copies and pastes them into the French edit field, translates them into French and saves the page as a complete translation. We now have:

en_v3 = {e1, e2, e3}
fr_v4 = {e1, e2, e3, e4}
es_v1 = {e1, e2}

The French version is the most up to date, but English and Spanish still need to be updated. Now, Juan wants to bring Spanish up to date. He can do this by updating from either English or French.
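The storyboard leaves open whether the tool helps Juan choose. One way a tool could help, sketched here purely as an assumption, is to rank the other variants by how many of the Spanish page's missing edits each would bring in. `rank_sources` is a hypothetical helper, not a CLWE feature.

```python
def rank_sources(target, variants):
    """Rank other variants by how many of the target's missing edits they hold.

    Hypothetical helper: variants that would bring nothing new are omitted,
    and ties are broken alphabetically by language code.
    """
    mine = variants[target]
    gains = {
        lang: edits - mine
        for lang, edits in variants.items()
        if lang != target and edits - mine
    }
    return sorted(gains.items(), key=lambda item: (-len(item[1]), item[0]))


variants = {
    "en": {"e1", "e2", "e3"},
    "fr": {"e1", "e2", "e3", "e4"},
    "es": {"e1", "e2"},
}
for lang, gain in rank_sources("es", variants):
    print(lang, sorted(gain))
# fr ['e3', 'e4']
# en ['e3']
```

Under this assumption, updating from French would bring Spanish fully up to date in one step, while updating from English would leave e4 still missing.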

What next... Maybe Marie can now incorporate John's changes, and later on, Juan can decide which of EN or FR to update from... Q: Does the tool help him make that decision?

NOTES FROM AD: Storyboard below is being replaced by new storyboard above


Change tracking backend


As the above scenario illustrates, CLWE supports a very open-ended workflow that lifts most of the assumptions of traditional authoring and translation environments.

A key element for supporting this open workflow is a backend capable of tracking edits made in the different pages, in such a way that it can help users to reproduce these changes in other linguistic variants of those same pages. We now describe how we addressed this challenge in the CLWE project.

NOTES FROM AD: Not sure that there is a whole lot to say here. In a sense, the Notations section already provided the main insight that forms the basis of the backend. LPH, are there additional meaningful details that we could give about how this is done?

This tracking approach has a few limitations, most of them minor.

NOTE FROM AD: LPH, when you first sent out the architecture document, you and I had a day-long chat about it, during which we discussed some of the limitations of the approach. I think I was able to recall most of these points, but I am not sure. Do you still have a transcript of that chat? The limitation I am thinking about was soe

Firstly, when more than one edit is translated in the same translation transaction, the system is not able to identify which parts of the translated text correspond to which edit. TODO: Use part of the usage scenario to illustrate this. For example, if Pierre translates edits e1, e2, and e3 from English to French in one go, when Pedro later wants to update the Spanish page from the French, he will only be able to translate these three edits at once.

TODO: Need to better phrase the second limitation below

Secondly, say EN = {e1, e2, e3} and FR = {e1, e2}. If you translate ES from FR, you get ES = {e1, e2}. ES will then be shown as needing updating from EN (because it is missing e3). As I recall, if you update ES from EN, the system will show you all three of e1, e2, and e3 in the EN, when in fact it should only show you e3. Is that correct, LPH? If so, can you remind me why that is (ideally explaining it using the {e1, e2, e3} notation)?

Thirdly, the approach assumes that users will never make original edits while in the midst of translating edits from one language to another. If a user does that, the system will simply consider the original edits to be part of the translation task. In other words, no new edit identifier will be generated. As a result, the system will not be able to notify users that other pages need to include that edit as well.

Fourthly, the system relies heavily on the end user to tell it when pages are synchronized. TODO: Talk about what bad things can happen when the user clicks on the Complete Translation button instead of the Partial Translation button, and vice versa. The Related Work section contains a short discussion of this problem and of how it could be fixed. Maybe we should move that problem description here.


In practice, we haven't found the first limitation to be a big issue. The others, however, could be problematic. Ideas for how to address them are proposed in the Future Work section.

End of current argumentation


NOTES FROM AD: The stuff in the remaining sections is stuff that had been written before I reorganized this page on 2008-04-09, and for which I didn't know if it should/could be threaded into the new reorganized argumentation. Need to evaluate this.

Maybe all we need is to reformat LPH's scenario a bit to use the format below, which communicates better than the impersonal approach. Also, add a bit about how prior art (LizzyWiki) succeeded in removing some of the constraints, but not all.

John creates an English page Welcome to this wiki. Pierre then translates it into the French page Bienvenue à ce wiki. Later on, John adds ten sentences to the English page Welcome to this wiki. Now Josée, who does not speak English, also wants to add two words to the French page Bienvenue à ce wiki.

With a standard wiki, the two pages would be distinct and no one would be aware of the content evolution in the other linguistic versions. With the LizzyWiki approach, Josée is not allowed to add her two words to Bienvenue à ce wiki before she has translated the ten sentences added by John to the English version Welcome to this wiki. But Josée cannot do this because she does not read English. Even if she could, she might not be in the mood to translate ten English sentences just to be allowed to add two words to the French version.

In (Désilets et al., 2006), the authors also postulated that supporting collaborative authoring and translation in more than two languages at a time might require the use of pivot languages as intermediaries between other languages, in order to provide stable points of reference in an otherwise chaotic environment.

With the CLWE project, we blindly ignored these constraints and allowed authors to create original content in any linguistic version of any page, at any time.

NOTES FROM AD: Not sure what to do with this content either


AD: Maybe some of this needs to be threaded back into the above argumentation, but I am not sure


In adding the required features to support change tracking, a few guidelines had to be followed:
  • Any contribution is worth translating until a translator says otherwise.
  • Only the final result matters. Intermediate steps can be ignored.
  • Content contributors should not have to worry about translation.

NOTES FROM AD: In a way, this breaks the argumentation a bit. In the Intro, we present the assumptions and constraints we are trying to relax. Those make very compelling "guiding principles". Then we get to this section and present more "guiding principles". Are these new principles subsumed by the ones presented in the Intro? If not, should we thread them into the Intro, or are they better presented here? How can we present these new principles in a way that makes it clear to the reader that they are consistent with the previous ones? I have to admit, I don't see how the above three principles tie in to a "simpler model of tracking". LPH said he would re-read this whole section (which he wrote as a sort of "stream of consciousness") and see if he can rework it and make it clearer.

These three guidelines all work together towards a solution. As mentioned before, translating every change individually into every linguistic version is impractical, given the limited translation resources. By focusing on the final result, a translator can catch up on the content of a source page in a single step, abstracting away the different steps that led to the final content. In most cases, the translator can simply look at the changes made to the source page since the last synchronization.
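Catching up in a single step amounts to diffing the source page as it was at the last synchronization against its current text. A sketch using Python's standard `difflib.unified_diff`, assuming the revisions are available as lists of lines; the sample revisions, including the intermediate typo, are invented for illustration.

```python
import difflib

# Hypothetical revisions of a source page since its last synchronization.
revisions = [
    ["Final Report", "First sentence."],                                  # at last sync
    ["Final Report", "First sentence.", "* Bullet one, wiht typo"],       # intermediate
    ["Final Report", "First sentence.", "* Bullet one", "* Bullet two"],  # now
]

# Only the net change between the synchronized revision and the latest one is
# shown; the intermediate typo and its later fix never reach the translator.
diff = list(difflib.unified_diff(revisions[0], revisions[-1],
                                 fromfile="last sync", tofile="current",
                                 lineterm=""))
for line in diff:
    print(line)
```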

In a wiki content creation process, many edits are made, and not all of them are worth translating. Wiki pages are not written by a single author, but collaboratively by many people. Someone initially writes the base content, and others add to it. In the process, many people make minor contributions that improve the quality of the content but may not be relevant for translators, such as grammatical corrections and syntactic improvements. Some changes may only affect the formatting of the page. Determining which changes need to be translated is a complex task, and the line is very hard to draw.

A simple change to the syntax of a sentence may seem trivial, but if the previous formulation was ambiguous, a translator may already have misinterpreted it, in which case the translation would need to be updated. A potential solution would be to ask content contributors to say whether or not a change should be translated. However, content contributors may not know enough about the translation process to make the right decision. By requesting no information from content contributors, the translation process can be kept as invisible to them as possible.

Because translators translate an aggregate of changes rather than individual changes, it does not matter if a few trivial changes slip in. Very active translation communities that propagate changes frequently may end up translating very small changes that do not affect their version of the content. However, it is unlikely that a translation gets updated multiple times a day to match each change made to a page.

The result of these guidelines is a very simple model for change tracking. In this model, each change represents an idea by its author. As far as tracking is concerned, all changes are equal, since they all have to be propagated to the other linguistic versions. When a change is incorporated into a given linguistic version, all pages that update from that page also inherit the change.

In simplified terms, change propagation can be represented as a directed graph in which nodes are page versions and arcs are page evolutions. Arcs represent both the evolution between versions of the same page and the translation of content from a source to a target language. In the graph, original content contributions are attached to page versions. By following all arcs from the page version where a change was originally made, all page versions containing that change can be found.
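Finding every version that contains a change is then a plain graph reachability problem. The sketch below runs a breadth-first search over hypothetical arcs for LPH's storyboard (older version to derived version); the node names and the `versions_containing` function are illustrative assumptions, and arcs are assumed to stand for completed translations only.

```python
from collections import deque

# Hypothetical arcs (older version -> derived version) for the storyboard;
# both page evolutions and completed translations are arcs.
SUCCESSORS = {
    "en_v1": ["en_v2"],
    "en_v2": ["fr_v1", "es_v1", "en_v3"],
    "fr_v1": ["fr_v2"],
    "es_v1": ["es_v2"],
    "fr_v2": ["en_v3"],
    "en_v3": ["es_v3"],
    "es_v2": ["es_v3"],
    "es_v3": [],
}


def versions_containing(origin):
    """Breadth-first search from the version where a change was created."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in SUCCESSORS[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(versions_containing("fr_v2")))   # e3 was created in fr_v2
# → ['en_v3', 'es_v3', 'fr_v2']
```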

Consider the following sample case, which uses three languages. Any number of languages is supported, but the resulting scenarios and representations would be too large to be practical for demonstration purposes.
