Page under construction. Please come back tomorrow. — Alain
Starting in September 2008, Marta Stojanovic and Alain Désilets of the National Research Council of Canada will begin a 12-month R&D effort around CLWE.
As many of you know, choosing a good research question is a very difficult task, so please help us by reading the possible ideas below, providing comments, and rating them. A good research question is one for which:
- The answer is not known already, and cannot be found easily.
- The answer matters and has large practical consequences for a particular community.
Thanks for your help. We are aiming to choose one of them by mid-September.
BTW: When you rate ideas, make sure you make up your own mind and write your answer down before looking at ratings from other folks.
Contexts of use
While collaborative translation has applications in a wide range of situations, we are particularly interested in research that will have an impact in the following contexts:
- Government organizations that have some sort of legal obligation to provide content in multiple languages (ex: Canadian Government, UN departments, European Commission departments).
- Companies that need to produce user documentation for their products in multiple languages, and that want to outsource this work to the community of users.
Q1: What is the current state of collaborative translation practices and technologies?
Description
There are lots of sites that are doing collaborative translation, and many technologies that are used to support them. A partial list can be found here:
At this point in time, nobody seems to have a good handle on everything that is happening. It would be useful to write a synthesis of the current state of affairs.
For example, we could write a survey that analyzes the different communities and tools in terms of the extent to which they operate without relying on the assumptions of conventional translation processes.
Why is this question important?
This is important so we know what has been done already, so we can figure out what the important unresolved problems are, and can focus on solving those instead of re-inventing the wheel.
What makes this a research question?
This is not hardcore quantitative research, but it falls in the category of qualitative research. It will involve gathering information, writing and analysing surveys, and synthesizing the information into a big picture.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
[+]
- Importance: AD=5
- Workload (person month): AS=1
- Research level: AS=2
Description
Professional translators have all sorts of Computer Assisted Translation (CAT) tools at their disposal (ex: terminology databases, translation memories), which amateur translators working in a collaborative fashion often do not have.
Which of these tools should be integrated into collaborative translation platforms, and if so, how?
In this project, we would integrate open source CAT tools into TikiWiki and gather feedback about their usefulness.
Why is this question important?
This is important because CAT tools have great potential for increasing the productivity of volunteer translators in a collaborative environment.
What makes this a research question?
CAT tools are pretty mature, and we know how to build them for professional translators. We also know that they have a good impact on productivity.
But it's not clear to what extent the tools need to be different to help amateur translators, or to what extent they will actually improve their productivity.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
[+]
- Importance: AD=3
- Workload (person month): AS=6
- Research level: AS=1
Q3: How can Machine Translation help collaborative translation communities?
Description
Collaborative translation communities often do not have sufficient human resources to cover all language pairs, and to provide translation of all content in a timely fashion.
Machine Translation might help in several ways:
- Automatically provide a "gist quality" translation of new content. This would only be a temporary measure until a human finds the time to fix it.
- Allow volunteer translators to translate content from a source language that they can't read. For example, the MT system would provide a "bad" English translation of a page written in Japanese, and the user could fix that bad English without having to actually read the original Japanese.
Why is this question important?
This is important because communities don't want to spend most of their human resources and energy on translation as opposed to creation of original content. MT has the potential to provide "good enough" translation at a fraction of the human resources that fully manual translation requires.
What makes this a research question?
MT is still bleeding edge technology, so any application that uses it is definitely research.
While there have been studies of the use of MT outputs for the purpose of gisting, and as first drafts to be post-edited by human translators, those have focused on translation of whole documents.
In the context of a collaborative community, we are more likely to want to apply MT to updates to pages. There are some interesting new issues with that context.
For example, consider a French page that has been perfectly translated by a human. Someone adds two sentences to the English page. Wouldn't it be nice to be able to insert an MT translation of just those two sentences into the French page, maybe highlighting them in yellow with a warning saying that they were machine translated? Could it be that two potentially poorly translated sentences are more easily understandable when presented in the context of a perfectly translated document? Also, how do we go about reliably inserting those two sentences at the right place in the French page (ex: using alignment technology)?
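To make the scenario above concrete, here is a minimal sketch in Python. It rests on strong assumptions that real pages will not always satisfy: both pages are already split into sentences, and the French page is aligned one-to-one with the previous English version. The names `propagate_insertions` and `fake_translate` are hypothetical, and `fake_translate` is only a stand-in for a real MT system. Sentences that were rewritten or deleted in English are handled in the crudest way possible here, which is exactly the kind of issue raised in the next paragraph.

```python
from difflib import SequenceMatcher


def fake_translate(sentence):
    """Stand-in for a real MT system (hypothetical): it only labels the text."""
    return "[MT] " + sentence


def propagate_insertions(old_src, new_src, target, translate):
    """Carry newly added source sentences over to the target page.

    Assumes target[i] is the human translation of old_src[i].
    Returns a list of (sentence, is_machine_translated) pairs, so that the
    display layer can highlight the machine translated sentences.
    """
    if len(old_src) != len(target):
        raise ValueError("old_src and target must be aligned one-to-one")
    result = []
    matcher = SequenceMatcher(a=old_src, b=new_src, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged source sentences keep their human translation.
            result.extend((s, False) for s in target[i1:i2])
        elif tag in ("insert", "replace"):
            # New or rewritten source sentences get a flagged MT translation.
            result.extend((translate(s), True) for s in new_src[j1:j2])
        # tag == "delete": the sentence is gone from the source, so it is
        # dropped from the target as well (a simplifying assumption).
    return result


english_v1 = ["Hello.", "Goodbye."]
english_v2 = ["Hello.", "How are you?", "Fine.", "Goodbye."]
french = ["Bonjour.", "Au revoir."]

for sentence, is_mt in propagate_insertions(english_v1, english_v2, french, fake_translate):
    print(("MT    " if is_mt else "human ") + sentence)
```

The flag returned with each sentence is what would drive the yellow highlighting and the warning. Whether the one-to-one alignment assumption can be relaxed reliably is part of the research question.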
Also, suppose I have an English page that is initially all translated by MT to French. Then, I manually correct the bad MT translation to make it perfect. In particular, I modify the structure of sentence 2 to make it sound more like a French sentence (the MT translation used an English-like sentence structure). Then, someone changes the English sentence number 2. What should the MT system do? Should it replace French sentence 2 by an MT translation of the newly modified English sentence 2? If so, chances are that I will have to redo the structure modification in the French sentence 2. Is there a way that the MT system could learn from my correction made to the original French sentence 2, and use the same sentence structure to retranslate the updated English sentence 2?
There may also be some "softer" Human Computer Interaction types of issues. For example, how best to entice readers of a bad MT translation (either of a whole page, or just of part of a page) to become active participants in the community by fixing the translation?
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
[+]
- Importance: AD=3
- Workload (person month): AS=6
- Research level: AS=1
Q4: How useful is the current implementation of CLWE?
Description
We have made real progress in the CLWE project, thanks to the excellent work by Louis-Philippe Huberdeau. For a demo, see:
How useful is this to end users as it is now? What are the remaining problems to be addressed?
Why is this question important?
CLWE is still at the beta stage, and it is crucial to evaluate it in real-use situations in order to improve it.
What makes this a research question?
This is not a hardcore, quantitative style of research, but it falls within the realm of more qualitative Human Computer Interaction research.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
[+]
- Importance: AD=5
- Workload (person month): AS=2
- Research level: AS=1
Q5: How to better isolate textual elements in a page that need translation?
Description
The CLWE system does a pretty good job of knowing when, say, the French page is missing some edits that have been made in the English and Spanish pages.
But it does not do a great job of identifying the actual textual elements in the English and Spanish pages that need to be reproduced in French.
The actual issues are a bit hard to explain, but are described in the paper entitled "The Cross-Lingual Wiki Engine: Enabling Collaboration Across Language Barriers" (soon to be available on the web... Google for the title). See the Limitations section of that paper for a description of the problem, and the Future research section for a description of potential solutions.
Why is this question important?
The current implementation can cause a lot of confusion. For example, when translating a change from English to French, the system might indicate that certain portions of the English page need translation into French, when in fact, these English passages were actually created in French originally, and translated to English.
This can cause users to completely lose faith in the system.
What makes this a research question?
While diff technology is pretty straightforward, patching technology isn't, and often requires that the human be kept in the loop. The main challenge of this project is to find a way to:
- Take a diff between, say, versions v5 and v6 of the English page
- Show those diffs in the context of the current version of the English page, say v9.
As far as we know, this is not a trivial problem.
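To illustrate why, here is a naive sketch in Python built on the standard `difflib` module. It is not how CLWE works; the function names `added_lines` and `locate_in_current` are hypothetical, and pages are simply treated as lists of lines. The sketch finds the lines added between two versions, then tries to locate each of them in a later version by exact matching.

```python
from difflib import SequenceMatcher


def added_lines(old, new):
    """Indices in `new` of lines inserted or rewritten relative to `old`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    indices = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            indices.extend(range(j1, j2))
    return indices


def locate_in_current(v_old, v_new, v_current):
    """For each line added between v_old and v_new, find it in v_current.

    Returns (line, index_in_v_current) pairs. The index is None when the
    line no longer appears unchanged in v_current.
    """
    matcher = SequenceMatcher(a=v_new, b=v_current, autojunk=False)
    position = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            position[block.a + k] = block.b + k
    return [(v_new[j], position.get(j)) for j in added_lines(v_old, v_new)]


v5 = ["Intro", "Usage"]
v6 = ["Intro", "Install", "Usage"]
v9 = ["Title", "Intro", "Install", "Usage", "FAQ"]
v9_edited = ["Intro", "Install it", "Usage"]

print(locate_in_current(v5, v6, v9))         # the added line is found at index 2
print(locate_in_current(v5, v6, v9_edited))  # the added line maps to None
```

The second call shows the hard case: as soon as a later edit touches the added line, exact matching loses track of it, and some fuzzier tracking (or a human in the loop) is needed.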
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
[+]
- Importance: AD=4
- Workload (person month): AD=2
- Research level: AD=2
Q6: What is the value of supporting cross-lingual searching, and how best to implement it?
Description
In a site that is collaboratively translated, some of the information may be available only in particular languages and not in others.
When searching for information, users probably want to find the information no matter in which language it is present. But obviously they don't want to write the same query in different languages.
There are experimental technologies for doing cross-lingual search. For example, the user writes a query in English, and the system searches for it in all languages (usually by automatically translating the query to the different languages). Combined with a Machine Translation system for translating the hits found in different languages, this might be good enough for people to find and understand information in pages written in languages that they can't read.
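The query-translation approach can be sketched in a few lines of Python. This is a toy illustration only: the word-by-word `lexicon` is a hypothetical stand-in for a real MT system or bilingual dictionary, matching is done on exact lowercase words rather than with a proper search index, and the names `translate_query` and `cross_lingual_search` are our own.

```python
def translate_query(query, lang, lexicon):
    """Translate a query word by word; unknown words are kept as they are."""
    words = lexicon.get(lang, {})
    return [words.get(w, w) for w in query.lower().split()]


def cross_lingual_search(query, pages, lexicon, query_lang="en"):
    """Search pages in every language with a query written in one language.

    pages is a list of (page_id, lang, text) tuples.
    Returns the (page_id, lang) pairs whose text contains all query terms.
    """
    hits = []
    for page_id, lang, text in pages:
        if lang == query_lang:
            terms = query.lower().split()
        else:
            terms = translate_query(query, lang, lexicon)
        page_words = set(text.lower().split())
        if all(term in page_words for term in terms):
            hits.append((page_id, lang))
    return hits


# Hypothetical toy data, for illustration only.
lexicon = {
    "fr": {"translation": "traduction"},
    "es": {"translation": "traducción"},
}
pages = [
    ("p1", "en", "Collaborative translation on a wiki"),
    ("p2", "fr", "La traduction collaborative dans un wiki"),
    ("p3", "es", "Motor de búsqueda"),
]

print(cross_lingual_search("wiki translation", pages, lexicon))
```

With this toy data, the English query finds both the English page and the French page. The hits in other languages would then still need to be machine translated for a user who cannot read them.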
Does such a feature have value for collaborative translation communities? If so, where does it lie? How can we best implement such features?
Why is this question important?
This is another way to deal with the fact that in collaboratively translated sites, it is not always possible to translate all relevant information to all languages in a timely fashion.
What makes this a research question?
Cross Lingual Search technology is still bleeding edge, so it's not clear that it will work well enough to provide value to end users. We plan to find out by building it and trying it out with real end users.
Proposal assessment (please prefix your scores with your initials)
Please help us by providing your own assessment of this research question, on three levels.
Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance
Workload: How many person-months do you think it will take to answer this question?
Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.
Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).
Your assessment
[+]
- Importance: AD=3
- Workload (person month): AD=3
- Research level: AD=4