History: Source quality
These are notes from a breakout session at the ((AMTA 2010 Workshop|AMTA 2010 Workshop on Collaborative Translation and Crowdsourcing)), Denver, Colorado, October 31st, 2010. The session focused on a cluster of issues concerning the impact and importance of source quality in a collaborative/crowdsourced translation context.

!Media/Channels where user-generated content (UGC) has source quality issues
*SMS
*Twitter
*IM
*Forums

!!Noise examples
*Absence of diacritics
*Foreign words used instead of native words
*Abbreviations/shorthand (compression)
*Lack of a writing standard

Recipients use a language model to unpack the information that was compressed in the first place.

!Strategies to address source quality issues
*Tools: T9, spell-check. How far should we go with cleaning? Is the cleaning done before publishing? Before training?
*Rewards (bonus points, trust points)
*Normalization (at run-time and during training)
*User acceptance (anecdotal evidence that users adapt their authoring when they interface with bots; more user studies are required).
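
To make the normalization strategy concrete, here is a minimal Python sketch of run-time normalization for compressed text such as SMS or IM messages. It was not discussed in the session: the `SHORTHAND` table, its entries, and the `normalize` function are hypothetical and only illustrate the idea of expanding shorthand and squeezing elongated words before translation.

```python
import re

# Hypothetical shorthand table (an assumption for illustration);
# a real system would curate or learn one per language and channel.
SHORTHAND = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "thx": "thanks",
    "pls": "please",
}

# A letter repeated three or more times in a row, e.g. the "oooo" in "soooo".
ELONGATION = re.compile(r"([^\W\d_])\1{2,}")


def normalize(text: str) -> str:
    """Squeeze elongated letter runs, then expand known shorthand tokens."""
    # Collapse runs of 3+ identical letters to a single letter.
    text = ELONGATION.sub(r"\1", text)
    # Whitespace tokenization only; punctuation attached to a token
    # (e.g. "pls!") is left untouched in this sketch.
    tokens = text.split()
    expanded = [SHORTHAND.get(tok.lower(), tok) for tok in tokens]
    return " ".join(expanded)
```

The same function could be applied to training data as well as at run-time, so that the translation system sees consistently normalized input in both settings. How aggressive the cleaning should be remains the open question raised in the list above.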