Source quality
Noise examples
- Absence of diacritics
- Foreign words used instead of native words
- Abbreviations/shorthand (compression)
- Lack of a writing standard
Recipients use a language model to unpack the information that the author compressed in the first place.
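As a rough illustration of this unpacking step, the sketch below restores missing diacritics with a toy unigram model: each diacritic-free word is mapped to the most frequent full form seen in a reference corpus. The tiny corpus and the names `strip_diacritics`, `build_restorer` and `restore` are assumptions made for the example, and a unigram lookup stands in for whatever richer model a real recipient or system would use.

```python
import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(text: str) -> str:
    """Drop combining marks, e.g. 'café' -> 'cafe'.

    Letters that do not decompose under NFD (such as 'ł') are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_restorer(corpus_words):
    """Map each diacritic-free form to its most frequent full form in the corpus."""
    counts = defaultdict(Counter)
    for word in corpus_words:
        counts[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}

def restore(text: str, restorer: dict) -> str:
    """Replace each whitespace-separated word by its most likely full form."""
    return " ".join(restorer.get(word, word) for word in text.split())

# Hypothetical reference corpus; a real one would be far larger.
corpus = "café café cafe résumé naïve".split()
restorer = build_restorer(corpus)
print(restore("cafe resume naive", restorer))
```

Because the lookup ignores context, it always picks the single most frequent form; words whose bare form is ambiguous would need a model that looks at neighbouring words.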
Strategies to address source quality issues
- Tools: T9, spell-check. How far should we go with cleaning? Is the cleaning done before publishing? Before training?
- Rewards (bonus points, trust points)
- Normalization (at run-time and during training)
- User acceptance (anecdotal evidence that users adapt their authoring when they interact with bots; more user studies are required).
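The normalization strategy above can be sketched as a single function that is applied both to the training data and to user input at run-time, so that the model sees the same kind of text in both settings. This is a minimal sketch under stated assumptions: the `ABBREVIATIONS` table and the `normalize` function are hypothetical, and the lowercasing and tokenization choices are illustrative rather than recommended.

```python
import re
import unicodedata

# Hypothetical shorthand table; a real one would be language- and domain-specific.
ABBREVIATIONS = {"u": "you", "thx": "thanks", "pls": "please"}

def normalize(text: str) -> str:
    """Unify Unicode form and case, then expand known shorthand tokens."""
    text = unicodedata.normalize("NFC", text).lower()
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)

# The same function is used during training ...
training_data = [normalize(s) for s in ["Thx for the help!", "Pls reply"]]
# ... and at run-time, on incoming user text.
user_input = normalize("THX, can u call me?")
print(training_data)
print(user_input)
```

Sharing one function avoids a mismatch between what the model was trained on and what it receives. How aggressive the cleaning should be remains the open question raised under "Tools".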