Output quality

These are notes from a breakout session at the AMTA 2010 Workshop on Collaborative Translation and Crowdsourcing, Denver, Colorado, October 31st 2010.

This breakout session focused on a cluster of issues that pertained to the quality of the translations produced via collaborative translation and crowdsourcing. The following questions were generated for this cluster in the brainstorming part of the workshop:

  • Assessing quality of each "drop in the cloud"
  • Quality metrics for collaborative translation/crowdsourcing
  • Using the crowd to help second language learners (by correcting their errors)
  • Improving quality of work produced by lay translators
  • Convergence vs Diversity of translated results. How to resolve the tension between the two?
  • How to identify and focus on those translation errors that really matter.
  • The "Wikipedia problem": Maximizing quality
  • Dangers of translating out of context in a collaborative/crowdsourcing scenario
  • What granularity should be used in collaborative/crowdsourcing situations? How much context does the translator actually need?
  • To what degree can collaborative/crowdsourced translation approach or even exceed the level of quality that is achieved today with conventional processes and professional translators?

Below is a summary of the discussion that took place in the breakout session.

  • Professional translation: you are buying a guarantee that the quality is high
  • Crowdsourced translation: no such guarantee

In some situations you don't know who the crowd is: there are history, reputation, and suitability issues.

How to design a pipeline for crowdsourcing translations in such a way that quality is built in

Manager over volunteers

MSDN wiki at Microsoft - high quality domain specific translation + passionate user community editing + moderation by Microsoft MVPs

The discussion around quality in a number of real-world scenarios is about whether the output is good enough to solve the problem, not necessarily whether it is comparable to what a professional translator might offer.

What are the steps/metrics for a given scenario to figure out what counts as "good enough" quality?

Currently the person in need of translation may not know what this metric is (or be able to verify it), so a lot depends on reliability inferred from name/brand/references.

Since this is a bigger issue with crowdsourcing, the onus is on the task provider to figure out a way to perform quality control. The nice thing is that quality control can also be crowdsourced.

There is a distinction between determining who is a reliable translator vs. what is a reliable translation.

Based on experience, MTurk translators typically separate themselves into good and bad piles and tend to stay there.

Tracking the history of translations and the history of each translator's "success rate" is a good way to identify who is good (as on Wikipedia).
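A minimal sketch of this kind of history-based reputation tracking. The class name, the accept/reject scoring scheme, and the example translator names are all assumptions for illustration, not something proposed at the session:

```python
from collections import defaultdict


class ReputationTracker:
    """Track per-translator history of accepted vs. rejected translations."""

    def __init__(self):
        # Counts of accepted and rejected submissions, keyed by translator ID.
        self.accepted = defaultdict(int)
        self.rejected = defaultdict(int)

    def record(self, translator, was_accepted):
        """Record one reviewed translation for a translator."""
        if was_accepted:
            self.accepted[translator] += 1
        else:
            self.rejected[translator] += 1

    def success_rate(self, translator):
        """Fraction of this translator's work that was accepted, or None if no history."""
        total = self.accepted[translator] + self.rejected[translator]
        return self.accepted[translator] / total if total else None


tracker = ReputationTracker()
tracker.record("alice", True)
tracker.record("alice", True)
tracker.record("alice", False)
print(tracker.success_rate("alice"))  # 2 of 3 accepted
```

In practice a real system would also weight recent work more heavily and require a minimum history before trusting the score, but the core idea is just this running accept/reject ratio.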

In the traditional translation process, what is the current reputation management mechanism?

Crowdsourcing could be used to improve even the current traditional translation process.

How do assessments differ for crowd quality vs. professional quality? Is the aggregation of a large crowd of assessments comparable to a small set of professional assessments?

For Kiva, there are many difficult source texts with a lot of incoherence. Many crowd translators tend to "smooth" them by adding or picking a meaning and then translating. This causes issues with crowd assessment as well, because assessors might prefer the smoothed version. One suggestion is to give good guidelines to both translators and assessors. Another is to evaluate both sides for coherence: if there is a mismatch, there is an issue with the translation. "Transcreation"

Chris: low-data languages. 30K on translation at half a penny per word (3 cents total per word: 2 cents for translation and 1 cent for review). Not as many issues with MT-based spoofing. The process: 4 translators, 5 reviewers, 1 monolingual proofreader (ESL errors marked up). Show the markups to the (original?) translators.

We have several dimensions along which we can evaluate output quality, one of which is the time it takes to produce the final… With an LSP there is a better guarantee.
