When a user translates content from one language to another, the GUI should gently constrain him so that he is not tempted to write original new content while translating. Mixing original authoring into a translation session screws things up big time for the translation tracking infrastructure upon which the CLWE depends. But in my experience, people who translate on wiki sites are also people who author original content, and they often switch to an author role while in the process of translating. I have done this many times without realizing it, even though I was aware of the issue with translation tracking.
The attached Excel spreadsheet is a mockup of what this GUI might look like. Essentially, it constrains the user to only edit lines on the target language side, for which the corresponding line in the source language has changed.
Note that this is not bullet-proof. Indeed, the user could still make original changes on those lines that need translation. But at least, it minimizes opportunities for this to happen, and also, the constrained format sends a clear message to the translator that he is supposed to only translate, not create original content.
Note also that this assumes we can establish a correspondence between lines in the source and target language pages. It is not clear how hard that is to do, but I think it should be feasible in most cases.
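The constraint described above can be sketched in a few lines. This is only an illustration, assuming the line correspondence already exists; `AlignedLine`, `editable_indices` and `validate_edit` are hypothetical names, not part of any existing CLWE code.

```python
from dataclasses import dataclass


@dataclass
class AlignedLine:
    """One row of the side-by-side view (hypothetical structure)."""
    source: str           # current source-language line
    target: str           # current target-language line
    source_changed: bool  # True if the source line changed since the last sync


def editable_indices(lines):
    """Indices of the target lines the translator is allowed to edit."""
    return {i for i, line in enumerate(lines) if line.source_changed}


def validate_edit(lines, new_targets):
    """Return the indices of target lines that were edited although locked."""
    if len(new_targets) != len(lines):
        raise ValueError("a constrained edit must keep the same number of lines")
    allowed = editable_indices(lines)
    return [
        i
        for i, (line, new) in enumerate(zip(lines, new_targets))
        if new != line.target and i not in allowed
    ]
```

An empty result means the edit stayed within the lines that needed translation. As noted above, this does not stop the user from writing original content on an editable line.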
Scenarios of use
Here are various scenarios that look at how this kind of GUI could help preserve sentence alignment, and where and how automated sentence alignment algorithms could come in handy. The scenarios are described in table form. Columns correspond to the English, French and Spanish pages. Rows correspond to particular actions done by users, written in the following notation:
- Cr(Si-Sk): means an original creation of sentences i through k.
- Mod(Si-Sk): means an original modification of the same.
- Del(Si-Sk): means deletion of the same.
- Tr(Si-Sk): means translation of sentences i through k, or translation of modifications made to those sentences.
- Tr(Si-Sk, <ES): same as above, except that translation is from the Spanish version. If the second argument is omitted, translation is from the language of the first column.
- View: means the page is viewed by a reader.
Case #1 (simple case): Initial page creation and translation into one language
| EN | FR | System Behaviour | Description |
| Cr(s1-s100) | | System displays the usual page creation dialog. | An EN author creates a page in English, which has 100 sentences. |
| | Tr(s1-s30) | System displays the constrained side-by-side translation GUI. The English sentences are pasted to the FR side, all of them open for editing. | A FR translator partly translates the page from EN to FR. |
| | View | System displays a warning at the top, saying the page is under translation. The system could either display the FR page with the untranslated EN sentences left in English, or replace them with a machine translation (MT) into French. The system can tell which sentences have been translated and which haven't, because the untranslated sentences are the same on the FR and EN sides. (This is not bullet-proof: there could be an EN sentence that the translator deliberately left "en anglais dans le texte". But in any case, this kind of situation would only remain until the translator explicitly labels the two versions as being in sync.) | A FR reader views the FR page. |
| | Tr(s31-s100) | System displays EN and FR sentences side by side. All sentences on the FR side are still open for editing. This includes sentences that have already been translated, because these translations may have only been a first draft and the translator may want to revisit them. BUT the sentences on the FR side that are still identical to the corresponding sentences on the EN side (i.e. have not been touched since pasting) are highlighted somehow, so that the translator can easily find sentences for which not even a draft has been produced. When the translator hits Save, the system knows that all the EN sentences pasted on the FR side have been touched, so it should suspect that the translation may be complete. If the translator did not check the "translation done" checkbox, it should nag him and ask him if he is done with this translation. | The FR translator finishes the translation. |
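The "still identical to the pasted EN sentence" test and the nag on Save could look roughly like this. It is a sketch under the assumption that both pages are held as sentence lists of equal length, aligned by position; the function names are made up for illustration.

```python
def untouched_indices(en_sentences, fr_sentences):
    """Indices where the FR side is still identical to the pasted EN sentence."""
    return [
        i
        for i, (en, fr) in enumerate(zip(en_sentences, fr_sentences))
        if en.strip() == fr.strip()
    ]


def should_nag_on_save(en_sentences, fr_sentences, done_checked):
    """Nag when every pasted sentence has been touched but 'done' is unchecked."""
    return not done_checked and not untouched_indices(en_sentences, fr_sentences)
```

The indices returned by `untouched_indices` are the ones the GUI would highlight. The heuristic inherits the weakness mentioned in the table: a sentence deliberately left in English looks untranslated.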
Case #2: Source page modified while being translated into a single language
| EN | FR | System Behaviour | Description |
| Cr(s1-s100) | | System displays the usual page creation dialog. | An EN author creates a page in English, which has 100 sentences. |
| | Tr(s1-s30) | System displays the constrained side-by-side translation dialog. The English sentences are pasted to the FR side, all of them open for editing. | A FR translator translates the first 30 sentences of the page. |
| Mod(s5), Del(s10), Cr(s20b) | | System displays the usual page editing dialog. | The EN author modifies sentence 5, deletes sentence 10, and adds a new sentence between sentences 20 and 21. |
| | View | At this point, the system knows that s10 has been deleted in EN. Should it remove that sentence from the display (but keep the underlying FR file as is), or should it keep displaying it in the FR version until a FR translator explicitly agrees with the English deletion? Maybe it's OK for the system to not display s10 if the FR translator has not started translating it? What about s5? Should it display the FR version of s5 that the FR translator translated? Or should it display the modified EN s5 pasted into FR, or a machine translation of it? What about s20b? Should the system display that new sentence in English, or an MT version of it? Or no s20b at all? | A FR reader views the page. |
| | Tr(s31-s100) | System displays the constrained side-by-side translation GUI. All sentences on the FR side are still open for editing, in case the translator wants to revisit the wording of sentences for which he already created a draft translation. BUT the sentences on the FR side that are still identical to the corresponding sentences on the EN side (i.e. have not been touched since pasting) are highlighted somehow, so that the translator can easily find sentences for which not even a draft has been produced. The question, though, is this: should the EN side display the s5, s10 and s20b lines that have changed in EN since FR translation began? And if the answer is yes, then what should it show for those lines on the FR side? In particular, for s5, should the system display the new EN version on the FR side, hence erasing the earlier FR translation that the user created? Probably not, because the EN modification to s5 may be very minor and it's better to start from the FR translation produced earlier. What about s10? Should that one be deleted from the FR side, or should we just show something indicating that this sentence has been deleted on the EN side, and ask the translator to confirm that he wants to delete it from FR also? What about s20b? Should the EN version of this new sentence be pasted on the FR side? Another question is whether or not the system should nag the translator when he hits Save. If he did bring s1-s100 in sync with the first version of EN, maybe this is a good opportunity to create a sync point by asking the user if the two sides are in line (assuming, of course, that we don't display the more recent changes that have happened in EN since). | A FR translator translates the rest of the initial page, but does not yet worry about the new changes to s5, s10 and s20b. |
| | Tr({s5, s10, s20b}) | At this point, the system DEFINITELY should display the changes that have happened in EN since translation of the original page started. But how should they be displayed? In particular, for s5, should the system display the new EN version on the FR side, hence erasing the earlier FR translation that the user created? Probably not, because the EN modification to s5 may be very minor and it's better to start from the FR translation produced earlier. What about s10? Should that one be deleted from the FR side, or should we just show something indicating that this sentence has been deleted on the EN side, and ask the translator to confirm that he wants to delete it from FR also? Probably the latter. What about s20b? Should the EN version of this new sentence be pasted on the FR side? At this point, probably yes. And when the user clicks Save, the system should definitely nag him to see if the two versions are now in sync (assuming that all FR sentences that needed translation have been touched). | The FR translator translates the changes that have been made to EN since translation of the original EN page started. |
The above scenario raises an important question. Once translation to FR has started from, say, v1.2 of EN, should all translation effort be based on v1.2 until it is completed? In other words, if some change happens on EN, bringing it to v1.3, while translation of v1.2 is still under way, should translation now use v1.3 as the starting point, or should we still aim to first bring FR in sync with EN v1.2?
On the one hand, keeping the translation effort based on EN v1.2 makes things less confusing from the point of view of the end user. All he has to worry about is EN v1.2 changes that have been translated to FR, and those that haven't. If we base the translation on EN v1.3, the user now has to deal with 2 versions of EN and one version of FR. In other words, he has 4 types of changes to worry about: EN v1.2 changes that have been translated, EN v1.2 changes that haven't, EN v1.3 changes that have been translated, and EN v1.3 changes that haven't. And I'm not even talking about EN v1.2 changes that were translated to FR, but whose translation now needs to be modified because that same sentence was modified in EN v1.3. Also, there is the issue that the translator is trying to sync with a moving target. By the time he has dealt with the changes in EN v1.2 and EN v1.3, there could have been further changes on EN that bring it to v1.4.
On the other hand, keeping the translation effort based on EN v1.2 may make it harder for translators to translate vital information that may only be available in EN v1.3. The translator would have to first complete translation of EN v1.2 before he could proceed with translation of EN v1.3. But it could be that the info in EN v1.3 is more important than the info in EN v1.2, and should have higher priority. Also, it could be that the change in EN v1.3 actually deletes some sentences that previously needed to be translated from EN v1.2. So why should the user translate them, only to delete them later when he goes on to translate EN v1.3?
So, not sure what to do. Maybe we need to play around with some real examples to get a better feel for the real issue.
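One way to start playing with real examples is to derive the Mod/Del/Cr actions of Case #2 automatically from two versions of the source page. The sketch below uses Python's standard `difflib.SequenceMatcher` on sentence lists. It assumes sentences are compared by exact string equality and that the page has already been split into sentences; `pending_changes` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def pending_changes(old_sentences, new_sentences):
    """List (action, old_span, new_span) tuples between two source versions.

    Actions use the notation of this page: Mod, Del, Cr.
    """
    matcher = SequenceMatcher(a=old_sentences, b=new_sentences, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            changes.append(("Mod", old_sentences[i1:i2], new_sentences[j1:j2]))
        elif tag == "delete":
            changes.append(("Del", old_sentences[i1:i2], []))
        elif tag == "insert":
            changes.append(("Cr", [], new_sentences[j1:j2]))
    return changes
```

Note that when two changed sentences are adjacent, the diff reports them as a single Mod spanning several sentences, so the output is only a starting point for deciding what to show the translator.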
Case #3: Page modified in a 3rd language while being translated to a 2nd one
| EN | FR | ES | System Behaviour | Description |
| Cr(s1-s100) | | | See previous cases. | EN author creates a 100-sentence page. |
| | Tr(s1-s100) | | See previous cases. | FR translator translates the page in one go. |
| | | Tr(s1-s30, <EN) | See previous cases. | ES translator translates the first 30 sentences from EN. |
| | Mod(s5), Del(s10), Cr(s20b) | | See previous cases. | FR author changes 3 sentences (one modified, one deleted, one added). |
| | | View | Q: Should the system display the changes that have been made in FR? Should it display them in FR, or should it show an MT of the FR change? If we show the change in FR, the ES page could end up having ES, EN and FR sentences in it. If we display an MT of the FR, it could be low quality, because MT only goes between EN and other languages. So to go from FR to ES, we have to do FR->EN and then EN->ES, which could greatly distort the message. | ES reader views the page. |
| | | Tr(s31-s100, <EN) | Should the translation GUI first help the translator finish translation of s1-s100 from the original EN page, and only then allow him to translate the changes from FR to ES? Or should it already start displaying the changes that need to be translated from FR to ES? | ES translator translates the remaining sentences from the original EN page. |
This case is very similar to Case #2 in that there are now 3 different versions of the page. But it's more complicated in that the 3 versions are all in different languages (in Case #2, two of the versions were EN and the third was FR).
As in Case #2, there is an issue of what to do when some linguistic version of the page is modified after translation has started from a particular historical version of a particular linguistic version. If at some point we start creating ES from EN v1.2, and in the meantime FR v1.5 is created by modifying a FR version that is aligned with EN v1.2, should translation to ES continue to be based on EN v1.2, or should it try to incorporate elements of FR v1.5? The tradeoff is the same as in Case #2.
If we keep ES translation fixed on EN v1.2, we make the task much simpler cognitively for the translator.
But on the other hand, if there are some very high priority content elements in FR v1.5, we can't bring them into ES until translation from EN v1.2 has completed. Also, it could be that some of the changes in FR v1.5 actually delete sentences that previously needed to be translated from EN v1.2 to ES. So the user would end up translating those sentences, only to delete them later.
I (AD) also wonder if there are possible deadlock situations, along the lines of: translation from FR v1.5 to ES cannot proceed until translation from EN v1.2 to ES has proceeded, but translation from EN v1.2 to ES cannot proceed until something else has been done, which itself assumes that translation from FR v1.5 to ES has been completed. I can't think of an example right now, but it sounds like the kind of thing that could happen. Need to think about this deadlock issue.
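If translation tasks were ever modelled explicitly, with each task listing the tasks it waits on, a deadlock of the kind suspected above would show up as a cycle in that "waits on" graph. The following is a sketch of a standard depth-first cycle check. The task names and the very idea of an explicit dependency table are assumptions made for illustration, not something the current design has.

```python
def find_cycle(waits_on):
    """Return a list of tasks forming a cycle in the waits-on graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(task):
        color[task] = GREY
        path.append(task)
        for dep in waits_on.get(task, ()):
            state = color.get(dep, WHITE)
            if state == GREY:  # dep is already on the current path: a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[task] = BLACK
        return None

    for task in list(waits_on):
        if color.get(task, WHITE) == WHITE:
            found = visit(task)
            if found:
                return found
    return None
```

For example, `find_cycle({"FR1.5->ES": ["EN1.2->ES"], "EN1.2->ES": ["other"], "other": ["FR1.5->ES"]})` reports the loop through all three tasks, while a graph without circular waits yields `None`.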
Case #4: Target page modified while translation from a source is in progress
| EN | FR | System Behaviour | Description |
| Cr(s1-s100) | | See previous cases. | EN author creates a page with 100 sentences. |
| | Tr(s1-s30) | See previous cases. | FR translator translates the first 30 sentences from the original EN. |
| | Mod(s5), Del(s10), Cr(s20b) | What should the FR author see when he edits the page that is undergoing translation? The sentences that have already been translated from EN to FR obviously should be displayed in editable form. But what about the EN sentences that have not yet been translated to FR, and were simply pasted into the FR side? Should these be displayed? Should they be editable? If they are editable, isn't there a risk that the author will start translating them through this editing UI instead of the constrained translation UI? And if he does, might that cause problems down the line when a translator comes to translate that same sentence from EN? If we don't display those untranslated lines, then how do we consolidate the changes made by the FR author with the underlying file, which DOES contain the pasted EN sentences? Maybe the untranslated EN sentences should be displayed but enclosed inside [UnderTranslation: Please do not modify][/UnderTranslation]. Also, the edit form could have some hidden variables that list all the UnderTranslation sentences originally contained in the page. If upon Save the system finds that some of them are missing, or have been modified or moved (i.e. their order has changed), the system would not accept the edit and would request that the user restore them as before. | FR author changes 3 sentences in the FR page, while translation to FR is still under way. |
| | View | For EN sentences that have already been translated, and those that have not yet been translated, see Case #1. But what about the changes that were made to FR since translation began? Should those be displayed to the user? I would say yes, since they are already in FR. | FR reader views the FR page, which is being translated and has at the same time been modified. |
| | Tr(s31-s100) | WHAT SHOULD HAPPEN HERE? | FR translator continues translation of the remaining sentences from the original EN. |
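The Save-time check on the [UnderTranslation] markers suggested in the table above could be sketched as follows. It assumes the marker syntax is exactly the one written in the table and that comparing the ordered list of protected sentences before and after the edit is enough to catch missing, modified and moved sentences. The function names are hypothetical.

```python
import re

# Matches one protected span; the marker text is taken from the proposal above.
PROTECTED = re.compile(
    r"\[UnderTranslation: Please do not modify\](.*?)\[/UnderTranslation\]",
    re.DOTALL,
)


def protected_sentences(page_text):
    """Protected sentences in document order."""
    return PROTECTED.findall(page_text)


def edit_is_acceptable(original_text, edited_text):
    """Accept the edit only if protected sentences are unchanged and in order."""
    return protected_sentences(original_text) == protected_sentences(edited_text)
```

In the hidden-variable variant described in the table, the result of `protected_sentences(original_text)` is what would be stored in the form when it is served.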
Additional cases
All the above cases present complications of the type:
- A translator starts translating a page but clicks Save before he is done translating.
- An author then changes one of the linguistic versions (Source, Target, 3rd language) of that page.
- A translator then picks up translation of the page again.
In all scenarios, the conflict happens between the time a translation is saved and the time it is picked up again.
But what about a case where:
- A translator starts translating a page but does not yet click Save.
- While this translator is still editing the translation, an author changes one of the linguistic versions (Source, Target, 3rd language) of that page.
These conflicts could be even more complicated. I don't know what the implications are.
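Whatever the right resolution turns out to be, the system at least needs to notice this situation at Save time. A minimal sketch, assuming each linguistic version carries a revision number that can be recorded when the translator opens the page (both helper names are hypothetical):

```python
def stale_versions(opened_at, current):
    """Map each language whose revision moved to (revision_at_open, revision_now)."""
    return {
        lang: (rev, current.get(lang))
        for lang, rev in opened_at.items()
        if current.get(lang) != rev
    }


def can_save_directly(opened_at, current):
    """True when no linguistic version changed during the editing session."""
    return not stale_versions(opened_at, current)
```

This only detects the conflict. What to show the translator once it is detected is the open question.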
Maybe we have the wrong paradigm?
Maybe this constrained GUI is the wrong idea altogether.
Maybe we should just let the users do whatever they want, and then use statistical cross-lingual alignment algorithms to try and figure out what aligns to what, what needs to be translated from what, etc...
Need to think about that some more.
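To get a feel for what such an alignment step involves, here is a toy aligner that pairs sentences using nothing but their character lengths, allowing 1-1 pairings and skipped sentences on either side. It is a deliberately crude stand-in for a real statistical aligner, and the cost function and `skip_cost` value are arbitrary assumptions.

```python
def align_by_length(source, target, skip_cost=1.0):
    """Align two sentence lists by dynamic programming on sentence lengths.

    Returns (source_index, target_index) pairs; None marks a sentence
    that has no counterpart on the other side.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = []
            if i > 0 and j > 0:  # pair source[i-1] with target[j-1]
                ls, lt = len(source[i - 1]), len(target[j - 1])
                pair_cost = abs(ls - lt) / max(ls, lt, 1)
                candidates.append((cost[i - 1][j - 1] + pair_cost, (i - 1, j - 1)))
            if i > 0:  # source[i-1] has no counterpart
                candidates.append((cost[i - 1][j] + skip_cost, (i - 1, j)))
            if j > 0:  # target[j-1] has no counterpart
                candidates.append((cost[i][j - 1] + skip_cost, (i, j - 1)))
            for c, prev in candidates:
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, prev
    # Walk back from the end to recover the pairing.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((i - 1 if pi < i else None, j - 1 if pj < j else None))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Length alone is obviously a weak signal. The point is only that an unconstrained GUI would push all of the tracking burden onto a step like this one.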