Wiki-translation discussion


Project Proposal for GSOC 2009 related to Cross Language Information Retrieval

Hello,

I am Akshi Gupta, and I have a project proposal that I would like to work on under the GSOC 2009 program. The idea is as follows:

A document D1 exists in language L1, but the user is familiar only with language L2 and wants other documents in L2 that are related to D1. The search would return the documents in L2 that are related to D1. This would spare the user the toil of analyzing the whole document, finding its important points, and then searching for related documents in L2 using Cross Language Information Retrieval techniques. The user would only need to specify the language L2 in which the related documents are wanted.
I have done several mini projects on information retrieval before. I have also devised some implementation strategies, which I can discuss if anyone is interested. Please have a look and let me know your views.
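
As a rough illustration of the steps described above (keyword extraction in L1, translation of the keywords, then search over documents in L2), here is a minimal sketch. It is not a proposed implementation: the toy Spanish-to-English dictionary, the stopword list, and all function names are assumptions made up for this example, and a real system would use a proper bilingual lexicon or MT service and a real search index.

```python
from collections import Counter

# Hypothetical toy bilingual dictionary (L1 = Spanish, L2 = English).
ES_EN = {"energía": "energy", "solar": "solar", "paneles": "panels", "costo": "cost"}

# Hypothetical stopword list for the source language.
ES_STOPWORDS = {"la", "el", "de", "los", "y", "en", "es", "del"}


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,;:").lower() for w in text.split() if w.strip(".,;:")]


def extract_keywords(text, stopwords, top_n=3):
    """Pick the most frequent non-stopword terms of the source document."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]


def translate_keywords(keywords, dictionary):
    """Translate keywords term by term, dropping terms the dictionary lacks."""
    return [dictionary[w] for w in keywords if w in dictionary]


def rank_documents(query_terms, documents):
    """Score each L2 document by how often the query terms occur in it."""
    scores = {}
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


def find_related(source_text, documents):
    """Full pipeline: L1 document in, ranked names of related L2 documents out."""
    keywords = extract_keywords(source_text, ES_STOPWORDS)
    query = translate_keywords(keywords, ES_EN)
    return rank_documents(query, documents)


if __name__ == "__main__":
    english_docs = {
        "a": "Solar energy panels convert solar energy.",
        "b": "Cooking recipes for pasta.",
        "c": "Energy prices rose.",
    }
    spanish_doc = "La energía solar y los paneles de energía solar en el costo de la energía"
    print(find_related(spanish_doc, english_docs))
```

The user only supplies the document and the target language; the keyword selection and translation happen without the user reading L1 at all.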

Thank you.

Akshi Gupta
Dear Akshi,

Sounds like an interesting project. Can you tell me a detailed, convincing story about a person who uses that feature to achieve a particular concrete goal? I find this is the acid test for determining whether a cool feature is truly useful, or whether it's a solution looking for a problem ;-).

I will apply to become a GSOC mentor.

BTW: If you are into that sort of thing, we have many different ideas for how to use Machine Translation inside of the Cross Lingual Wiki Engine, some of which my colleague Marta Stojanovic has started working on.

One idea I think will be really useful is the following.

Joe, an anglophone, is looking at an English page. There is a note at the top which says that there is new content available on the Japanese and Spanish versions of that page. Joe does not speak either of these languages. However, he clicks on a link "view machine translation of updates" (or something like that), and the system displays a 100% up-to-date version of the page in English. The bulk of the page is human-translated (or human-vetted), high-quality translation. But there are a few sections highlighted in yellow, which are machine translations of the updates from Spanish and Japanese, automatically inserted in the proper location in the English text. Because these potentially poor translations are relatively short, and presented in the context of well-translated human text, they are easier to read and understand than a typical complete MT of a page.
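
To make the scenario a bit more concrete, here is a minimal sketch of how such a merged view could be assembled. Everything in it is an assumption for illustration only: the `machine_translate` placeholder stands in for a real MT engine, and the section and update data structures are hypothetical, not how the Cross Lingual Wiki Engine actually stores pages.

```python
def machine_translate(text, source_lang):
    """Placeholder for a real MT engine (assumption); it only tags the text."""
    return f"[MT from {source_lang}] {text}"


def merge_updates(english_sections, updates):
    """Insert machine-translated updates after the section they belong to.

    english_sections: list of (section_id, text), human-translated English.
    updates: list of dicts with keys 'after' (section_id), 'lang', 'text',
             describing new content found in other language versions.
    """
    merged = []
    for section_id, text in english_sections:
        merged.append({"text": text, "machine_translated": False})
        for update in updates:
            if update["after"] == section_id:
                merged.append({
                    "text": machine_translate(update["text"], update["lang"]),
                    "machine_translated": True,
                })
    return merged


def render(merged):
    """Render the page as HTML, highlighting machine-translated blocks in yellow."""
    lines = []
    for block in merged:
        if block["machine_translated"]:
            lines.append('<div style="background: yellow">' + block["text"] + "</div>")
        else:
            lines.append("<div>" + block["text"] + "</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    sections = [("intro", "Human-translated introduction."),
                ("usage", "Human-translated usage notes.")]
    new_content = [{"after": "intro", "lang": "es", "text": "Nuevo párrafo."}]
    print(render(merge_updates(sections, new_content)))
```

The point of the design is that machine-translated text stays short and clearly marked, surrounded by vetted text that gives it context.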


> Hello,
>
> I am Akshi Gupta, and I have a project proposal that I would like to work on under the GSOC 2009 program. The idea is as follows:
>
> A document D1 exists in language L1, but the user is familiar only with language L2 and wants other documents in L2 that are related to D1. The search would return the documents in L2 that are related to D1. This would spare the user the toil of analyzing the whole document, finding its important points, and then searching for related documents in L2 using Cross Language Information Retrieval techniques. The user would only need to specify the language L2 in which the related documents are wanted.
> I have done several mini projects on information retrieval before. I have also devised some implementation strategies, which I can discuss if anyone is interested. Please have a look and let me know your views.
>
> Thank you.
>
> Akshi Gupta

This feature would be of great importance. Suppose a user (say, a student from India doing research work in Spain) has a document in Spanish and wants to find related documents, but he doesn't know Spanish. He cannot pick out the important keywords in it and feed them to cross-language information retrieval tools. He would first have to translate the document into English, then identify the important keywords, and only then run a search. This feature would let him simply upload the document and get the important related documents in English.

I have worked on "Implementing Latent Semantic Indexing on CELL BE (a parallel processor)" and at present I am working on "Implementing Page Rank Algorithm efficiently on CELL BE". I am also doing a project on information retrieval, which is my main field. I would be happy to work on ideas for how to use Machine Translation inside of the Cross Lingual Wiki Engine.

If you find the above idea useful, then we can further discuss its implementation details. Waiting for your reply.
> This feature will be of great importance. Suppose a
> user (say, a student from India in Spain doing some research
> work) has a document in Spanish. He wants to get related
> documents. But he doesn't know Spanish. So he cannot
> find out the important keywords in it and use cross
> language information retrieval tools.

Please don't take what I write below as completely pooh-poohing the idea. I'm just trying to help you put your finger on the true value of the feature for end users.

In the story above, if the student does not speak Spanish, how did he find the Spanish page in the first place, and how does he know that it is relevant to the topic for which he wants pages in English?

> If you find the above idea useful, then we can further
> discuss its implementation details. Waiting for your
> reply.

Do you have a Skype account? Mine is alain_desilets. Let's chat about this.
> Do you have a Skype account? Mine is alain_desilets. Let's chat about this.

Dear Akshi,

If you are still interested in doing a GSOC project related to cross-lingual features on TikiWiki, please Skype me at your earliest convenience (I am alain_desilets on Skype), or email me at alain dot desilets at nrc-cnrc dot gc dot ca.

Alain
