These are notes from a breakout session at the AMTA 2010 Workshop on Collaborative Translation and Crowdsourcing, Denver, Colorado, October 31st, 2010.
This breakout session looked specifically at a cluster of questions related to business issues.
A first question raised was whether crowdsourcing mostly works for non-profit organizations. In other words, are there really that many people willing to provide free translation services that will then be used by companies to make a profit?
There are examples of free crowdsourcing being used to translate commercial content produced by for-profit organizations (e.g. Adobe, Facebook), but it is still quite marginal. It is not clear that this would work at a larger scale.
On the other hand, it was pointed out that crowdsourcing does not necessarily mean that translators work for free, or even for low pay. The more controversial flavours of that technology (controversial for professional translators, that is) do work that way, but they are not the only possible flavours. One could use the same processes and techniques to parallelise the translation of very large documents across a large group of professional, well-remunerated translators. This could be a way to decrease lead time. As an example of decreased lead time, Facebook found that for some language pairs they could have all 350,000 words of their user interface translated in a matter of a day or two by leveraging the crowd of users (who, in this case, were unpaid volunteers).
Next, we talked about the fact that translation crowdsourcing is just one of many approaches that businesses and organisations can use for translating their content. We wondered where this new approach fits in the toolbox. If you want translation done today, your options are:
- Professional human translation
  - Large, expensive shops
  - Mom-and-pop shops
- Crowd-sourcing
- Machine Translation
It was noted that crowd-sourcing sits somewhere between professional human translation and MT in terms of:
- Volume
- Cost
- Quality
So, crowdsourcing is appropriate in situations where you need a moderate level of all three. If you need to translate very large amounts of text in a matter of hours, but can live with moderate quality, then raw MT is probably the only way to go. If you need top-quality translations, but the volume is small and/or you can afford to wait several days, then you should go with professional human translation. If your needs fall between those two extremes on all three criteria, then translation crowdsourcing may be a good option.
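As a rough illustration only, here is a minimal sketch of the kind of triage heuristic described above. The thresholds, quality labels, and function name are all hypothetical and were not part of the session discussion:

```python
# Hypothetical triage sketch: thresholds and labels are illustrative only.

def choose_translation_approach(word_count: int, quality: str, deadline_hours: float) -> str:
    """Pick an approach based on volume, required quality, and turnaround time.

    quality is one of "draft", "good", "publication".
    """
    if quality == "publication":
        # Top quality: professional human translation, accept a longer lead time.
        return "professional human translation"
    if word_count > 1_000_000 and deadline_hours < 24:
        # Very large volume needed within hours: only raw MT scales that far.
        return "raw machine translation"
    if quality == "good":
        # Moderate quality, volume, and turnaround: the in-between case.
        return "translation crowdsourcing"
    # Draft quality with no special constraints: raw MT is usually good enough.
    return "raw machine translation"


if __name__ == "__main__":
    print(choose_translation_approach(350_000, "good", 48))  # -> "translation crowdsourcing"
```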
We also talked about the extent to which crowdsourcing might be able to make a dent in the growing gap between the demand for translation services and the supply of human translation. We did not have a clear sense or consensus about this. Crowdsourcing does increase the supply of human translators, but demand is growing so fast (mostly due to the internet and user-generated content) that it is not clear this increase in supply is sufficient.
Another topic that was discussed is the relationship between crowdsourcing and what has been happening for years on Wikipedia. It was noted that on Wikipedia, contributions follow a power law, with a relatively small number of contributors doing most of the work. But the "long tail" is still very important, because its large numbers of small changes and corrections end up amounting to something very significant. Also, in order to find this small core of very motivated Wikipedians, the Wikipedia folks had to open up their site to the whole wide world and solicit contributions from the whole population of internet users. Will similar trends apply to translation crowdsourcing?
It was noted that an important difference between Wikipedia and translation crowdsourcing is that the latter can be parallelised at a much finer grain. Indeed, in translation crowdsourcing it is possible (but not necessarily desirable, see below) to split a document into sentences and have different members of the crowd translate different sentences. In contrast, on Wikipedia one cannot ask the crowd to write an article about a topic by writing one sentence each. It is true that once a first draft of an article has been written, people can contribute to it in a more parallel way, by adding or modifying specific parts of it, fixing typos, style, etc. But one would think that the original first version of an article is probably written by one, or a few, people (an assumption that probably needs to be verified empirically).
Different communities and organisations involved in translation crowdsourcing use different granularities. Facebook parallelises at the sentence level, while Kiva parallelises at the page level. Each approach is probably appropriate for different situations and has its own advantages and disadvantages, but it is not 100% clear what the tradeoff space looks like. For example, one would expect coarser parallelisation to lead to higher quality, since it is recognized that translation out of context leads to poorer quality. But it could be that, on the contrary, parallelising at the sentence level leads to higher quality, because you can have more than one person translate a particular sentence and use voting to choose the version that seems best. This question of granularity is probably a good theme for empirical research.
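To make the sentence-level strategy and the voting idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the sentence splitter is deliberately naive, and the crowd's candidate translations are passed in as a plain dictionary rather than collected from a real platform:

```python
from collections import Counter

def split_into_sentences(document: str) -> list[str]:
    """Deliberately naive splitter; a real system would use a proper sentence tokenizer."""
    return [s.strip() for s in document.split(".") if s.strip()]

def pick_by_vote(candidates: list[str]) -> str:
    """Return the candidate translation proposed by the most crowd members."""
    return Counter(candidates).most_common(1)[0][0]

def crowdsource_translation(document: str, crowd_candidates: dict[str, list[str]]) -> str:
    """Translate a document sentence by sentence, resolving each sentence by vote."""
    translated = []
    for sentence in split_into_sentences(document):
        votes = crowd_candidates.get(sentence, [])
        # Fall back to the untranslated sentence if the crowd produced nothing.
        translated.append(pick_by_vote(votes) if votes else sentence)
    return ". ".join(translated) + "."

if __name__ == "__main__":
    doc = "Welcome to the site. Click here to continue."
    candidates = {
        "Welcome to the site": ["Bienvenue sur le site", "Bienvenue sur le site", "Bienvenue au site"],
        "Click here to continue": ["Cliquez ici pour continuer", "Cliquez ici pour continuer"],
    }
    print(crowdsource_translation(doc, candidates))
```

Note that the voting trick presupposes fine granularity: asking several people to redundantly translate a whole page, Kiva-style, would presumably cost much more.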
We had a brief discussion about "indirect" crowdsourcing. For example, one can think of Google's PageRank as a very indirect form of crowdsourcing, where people in the crowd "vote" on the importance of a page by creating links to it. Google then harvests these opinions from the crowd and uses them to prioritize the list of hits. Similarly, many organisations (including Google) crawl the web looking for bilingual web pages and use that data to train their MT systems. Is this a form of "indirect" crowdsourcing for translation? Can we think of other forms of "indirect" translation crowdsourcing?
We then went into a discussion of what translation crowdsourcing actually buys you, as an organization. It was noted that organizations often do not see cost reduction as the biggest benefit. In the talk she gave at ATA the week before, Naomi Baer listed the following benefits which have been mentioned by several organizations:
- Community involvement
- Reduced time to publication
- Translating content that traditionally doesn’t get translated
- Expanding market reach to additional languages
We thought reduced time to publication was a particularly interesting benefit, because there are some striking examples there. As mentioned earlier, for some language pairs Facebook was able to translate its user interface (350,000 words) in a matter of a day or two, using a sentence-level parallelisation strategy. This got us thinking about whether reduced time to publication could be achieved, even in the more traditional context of professional translation shops, by using parallelised, crowdsourcing-style approaches. For example, ProZ is an eBay-style marketplace where customers can find professional translators for a particular task, but task assignment is done at the document level. What would happen if ProZ implemented a finer-grained approach where individual sentences or paragraphs were sent to different professional translators? What effect would that have on time to publication? What would be the price to pay in terms of quality and consistency?
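As a thought experiment only (ProZ offers no such mechanism as far as we know, and all names below are hypothetical), here is a minimal sketch of what paragraph-level assignment across a pool of professional translators might look like:

```python
from itertools import cycle

def assign_paragraphs(paragraphs: list[str], translators: list[str]) -> dict[str, list[str]]:
    """Distribute paragraphs round-robin across a pool of translators.

    Hypothetical sketch only: a real marketplace would also account for
    availability, pricing, subject-matter expertise, and consistency constraints.
    """
    assignments: dict[str, list[str]] = {name: [] for name in translators}
    for paragraph, translator in zip(paragraphs, cycle(translators)):
        assignments[translator].append(paragraph)
    return assignments

if __name__ == "__main__":
    doc_paragraphs = ["First paragraph ...", "Second paragraph ...", "Third paragraph ...", "Fourth paragraph ..."]
    pool = ["translator_a", "translator_b", "translator_c"]
    for name, chunk in assign_paragraphs(doc_paragraphs, pool).items():
        print(name, "->", chunk)
```

Even in this toy form, the quality and consistency concerns raised above are visible: no single translator sees the whole document.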
We then got into the question of whether translation crowdsourcing would affect the livelihood of professional translators. Overall, it was thought that translation crowdsourcing is certainly likely to exert downward pressure on costs, but it may also create opportunities for new kinds of work. It was noted that in the world of software development, the Open Source movement did not cause programmers to lose their jobs; in fact, many programmers make a living working on Open Source projects, or selling value-added services based on Open Source software. The kinds of opportunities that crowdsourcing might open up for professional translators include:
- Quality Control of the crowd's translations
- Coaching and mentoring the crowd
- Terminology management, creating resources to help the crowd
- Crowd management: making sure the crowd is happy and productive
It was also pointed out that crowdsourcing is mostly applicable to types of content that are simply not being translated at the moment, and therefore it may not take away jobs that professionals are already doing. These kinds of content include:
- Short lived, dynamic content
- Content that needs to be translated into languages with small markets but a motivated native crowd
- Content produced by companies with small international outreach
- Content produced by non-profit orgs
Real-life examples of crowd-sourcing that work well include:
- Fan translation: content you are personally attached to (e.g. your favourite Japanese anime)
- TED Talks: high-profile content
- Your favourite app (e.g. Facebook)
- Customer support articles. Q: what motivates people to translate material there?
  - A: A crowd of third-party, value-added developers with a privileged relationship with the commercial organization. In those scenarios, good MT is crucial, because the crowd is probably less motivated to contribute.
- Organisations whose mission is inspiring (e.g. Kiva)
Most of the above examples (with the possible exception of TED Talks) are indeed about content that often does not get translated at all.
We also talked about the types of content that will ALWAYS be translated by professionals, at least for the foreseeable future:
- Material where the cost of a mistake is high
  - Safety issues (legal, medical)
- Creative material
  - Marketing
  - Literary
We also thought that the impact on the translation profession may be less one of losing jobs and more one of changing the role of translators. The combination of crowdsourcing and Machine Translation may move the profession away from "authoring original translations" and towards "revising and fixing" translations produced by the machine or the crowd.
A final, optimistic point was that the whole purpose of crowdsourcing and MT is to be able to deal with the mass of content that is not currently being translated. Even if only a very small portion of that content is deemed to need review by a professional, this will probably be enough to keep all current and future professional translators employed on a full-time basis.