BabelWiki08: Collaboratively building free and open dictionaries

We started with a discussion of what it means for a dictionary to be "free":

  • Free as in freedom: people are free to contribute to it. This is important.
  • Free as in free beer (free of charge): this is important too, especially for volunteer, non-professional translators.

Next, several participants gave quick demos of free, open dictionaries they knew about, such as:

  • Edict (Japanese <-> English)
  • dict.leo.org (German <-> X)
  • Urban Dictionary

We also looked at some more conventional ones, such as GDT (Grand Dictionnaire Terminologique).

It was apparent that different resources have different levels of wiki-ness.

For example, in Leo, entries in the actual dictionary are controlled by a relatively small number of people (or so we think; we were not quite sure). However, if a user searches for something that is not in the dictionary, the system directs them to relevant postings in a discussion forum where translators look for ways to translate particular terms and expressions. This seems to be quite effective, even if it's not 100% the wiki way.

We then talked about what translators need from these resources. It was noted that professional translators don't use wiki resources because:

  • They lack coverage of terms and expressions that they need to search for.
  • Their UI is completely inappropriate for the kinds of tasks they need to carry out.

There is also the issue of the quality of the data in these dictionaries. Translators have been trained to use only trusted sources, and seem to have a bit of a prejudice against open resources. Yet when they don't find what they need in trusted sources, they have no qualms about using Google to search the Internet for a solution. So why not open resources? It was also noted that, in the end, translators don't seem to mind noisy resources much. If a tool shows them 10 suggestions, all of them bad except one, they are happy. It's when the tool shows NO suggestions at all that they are unhappy.

There was a discussion about the "single entry per concept" idea. For example, instead of a single entry for the word "head", you have several entries: one for the concept of the body part, one for the concept of the director of an organisation, etc. While this is useful if you want to use the data to build things like cross-lingual search, it is not the way translators think about the world. They think more in terms of "these words, in my particular current context, translate into the following words". If we want translators to contribute to an open dictionary, we need to be careful not to impose the concept view on them. But if we can find an easy way to get translators to formulate data in terms of concepts, without changing their way of thinking, that would be great.
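The session did not settle on a data model. One hypothetical way to reconcile the two views is sketched below. Every entry is stored the way a translator thinks of it: a source phrase, a target phrase and a free-text context. An optional concept link can be attached later, and only the cross-lingual/NLP side needs it. All names (`TranslationPair`, `concept_id`, `lookup`, `pairs_for_concept`) are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranslationPair:
    """Translator-oriented entry: 'these words, in this context, translate as these words'."""
    source: str
    target: str
    source_lang: str
    target_lang: str
    context: str = ""                 # example or usage note written by the translator
    concept_id: Optional[str] = None  # hypothetical optional concept link, may be added later


def lookup(pairs: List[TranslationPair], term: str, source_lang: str) -> List[TranslationPair]:
    """What a translator sees: every pair for the term, whether or not it has a concept."""
    return [p for p in pairs if p.source == term and p.source_lang == source_lang]


def pairs_for_concept(pairs: List[TranslationPair], concept_id: str) -> List[TranslationPair]:
    """What an NLP tool sees: only the pairs that someone has linked to a given concept."""
    return [p for p in pairs if p.concept_id == concept_id]


pairs = [
    TranslationPair("head", "tête", "en", "fr", "he hit his head on the door", "body_part"),
    TranslationPair("head", "chef", "en", "fr", "the head of the department"),  # no concept yet
]

print([p.target for p in lookup(pairs, "head", "en")])             # ['tête', 'chef']
print([p.target for p in pairs_for_concept(pairs, "body_part")])   # ['tête']
```

Because `concept_id` is optional, a translator can contribute an entry without ever thinking in terms of concepts. The concept layer can be filled in afterwards by other contributors or tools.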

We ended the discussion with a list of things that tools need to address for free, collaborative dictionaries to become a reality:

  • A good UI that allows translators to use the data and contribute to it with a minimal number of mouse clicks.
  • A good data structure that makes it possible to reuse the data for NLP tasks, but does not impose a "concept" world view on the translators.
  • Bootstrapping the resource with enough data to make it attractive to potential users and contributors.
    • Ex: mine it from Wikipedia, Wiktionary, etc.
  • Ways for the community to provide collaborative feedback on the entries.
    • Ex: implicit positive voting through a "copy to clipboard" button, talk pages, etc.
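The "copy to clipboard" idea can be sketched as a hypothetical implicit-voting scheme. Each copy of a suggestion counts as one positive vote, and suggestions are reordered by vote count. Nothing is hidden, since translators tolerate noisy lists as long as the good suggestion is there. The class and method names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


class SuggestionRanker:
    """Hypothetical sketch: each 'copy to clipboard' click is an implicit positive vote."""

    def __init__(self) -> None:
        # (source term, suggested translation) -> number of times it was copied
        self.copies: "Counter[Tuple[str, str]]" = Counter()

    def record_copy(self, term: str, suggestion: str) -> None:
        self.copies[(term, suggestion)] += 1

    def rank(self, term: str, suggestions: List[str]) -> List[str]:
        # Keep every suggestion, most-copied first. sorted() is stable,
        # so suggestions with equal counts keep their original order.
        return sorted(suggestions, key=lambda s: -self.copies[(term, s)])


ranker = SuggestionRanker()
ranker.record_copy("head", "chef")
ranker.record_copy("head", "chef")
ranker.record_copy("head", "tête")
print(ranker.rank("head", ["tête", "directeur", "chef"]))  # ['chef', 'tête', 'directeur']
```

Explicit voting would require an extra click. Counting copies reuses an action the translator performs anyway, which fits the goal of a minimal number of mouse clicks.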
