“It works, but how far can it go?”



Page 1

“It works, but how far can it go?”

Dallin Hardcastle

Page 2

IBM CANDIDE

• Statistical Machine Translation (SMT).

• The project ran from 1987 to 1994.

• IBM’s theory was that over 50% of Western languages are completely predictable.

• Translation was driven by statistical algorithms, which IBM claimed were 80% accurate.

• Translation accuracy was generally around 60%.

Page 3

IBM CANDIDE

• It relied solely on very large bilingual corpora.

• It did not take into account any grammars, lexicons, phonological rules, etc.

• The U.S. government’s DARPA (Defense Advanced Research Projects Agency) rated SYSTRAN higher than IBM’s new system and used SYSTRAN frequently in the 1990s.

• DARPA even helped fund CANDIDE.
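The statistical approach on this slide can be illustrated with IBM Model 1, the simplest of the word-alignment models from IBM's statistical MT research. Model 1 learns word-translation probabilities t(f | e) from sentence pairs alone, with no grammar or lexicon, using expectation-maximization (EM). This is a minimal toy sketch. It assumes a hypothetical three-sentence French-English corpus, omits the NULL word, and is not CANDIDE's implementation.

```python
from collections import defaultdict

# Toy sentence-aligned corpus, invented for illustration: (French, English)
CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]


def train_model1(pairs, iterations=20):
    """Estimate word-translation probabilities t(f | e) by EM (IBM Model 1, no NULL word)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform t(f | e): no dictionary or grammar is supplied
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                # E-step: share each French word among the English words it co-occurs with
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: turn expected counts back into probabilities (each t(. | e) sums to 1)
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def best_translation(t, e):
    """Most probable French word for English word e under t(f | e)."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


t = train_model1(CORPUS)
for e in ["the", "house", "flower", "blue"]:
    f = best_translation(t, e)
    print(f"{e} -> {f} (t = {t[(f, e)]:.2f})")
```

On this toy corpus the learned table pairs the with la, house with maison, flower with fleur, and blue with bleue. No sentence pair states any of those correspondences on its own; they emerge purely from co-occurrence statistics across the corpus.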

Page 4

SYSTRAN

• Founded in 1968 by Dr. Peter Toma.

• Survived the major funding cuts that followed the ALPAC report.

• Has offices in Paris and La Jolla.

• During the Cold War, worked extensively with the U.S. Air Force.

• Has provided technology for Babel Fish and for the translation widget in Mac OS X.

Page 5

SYSTRAN

• Rule-Based Machine Translation (RBMT).

• In a book, Yorick Wilks, an AI professor in England, claims that RBMT systems (like SYSTRAN) have outperformed SMT systems (like CANDIDE) up to this point.

Page 6

So, SMT or RBMT?

• SMT output tends to read more fluently.

• SMT accuracy is generally only 60%, and that is on the high side.

• SMT algorithms are not tailored to any specific language. Is that a benefit or a drawback?

• RBMT output sometimes contains awkward constructions.

• Once RBMT rules are established, accuracy rates are much higher.

• Translating between two languages with well-formed rules is easier, but writing those rules is costly.
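The rule-based side of this comparison can be illustrated with a toy English-to-French translator. It uses a hand-written lexicon and one hand-written reordering rule: English puts this adjective before the noun, and French puts it after. The lexicon, tags, and rule are hypothetical and vastly simpler than SYSTRAN's. The sketch only shows that each rule is tailored to one language pair and must be written by a person, which is where the cost lies.

```python
# Hypothetical hand-built lexicon: English word -> (French word, part of speech).
# Every noun here is feminine, so "the" -> "la" sidesteps gender agreement rules.
LEXICON = {
    "the": ("la", "DET"),
    "house": ("maison", "NOUN"),
    "flower": ("fleur", "NOUN"),
    "blue": ("bleue", "ADJ"),
    "is": ("est", "VERB"),
}


def translate_rbmt(sentence: str) -> str:
    """Word-for-word lookup plus one reordering rule; unknown words pass through unchanged."""
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    out = []
    i = 0
    while i < len(tagged):
        word, pos = tagged[i]
        # Rule: English ADJ NOUN -> French NOUN ADJ (true for "bleue", not for every adjective)
        if pos == "ADJ" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            out.extend([tagged[i + 1][0], word])
            i += 2
        else:
            out.append(word)
            i += 1
    return " ".join(out)


print(translate_rbmt("the blue house"))     # la maison bleue
print(translate_rbmt("the house is blue"))  # la maison est bleue
```

Covering a new adjective that precedes its noun in French, or a masculine noun, means writing more rules by hand. That is the trade-off the slide describes: high accuracy where rules exist, at a real cost to build them.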

Page 7

Google Translate/Babel Fish

• http://www.youtube.com/watch?v=_GdSC1Z1Kzs

• Babel Fish is gone, at least for now, and has been replaced by Bing Translator, which is also SMT.

• Babel Fish was powered by SYSTRAN technology.

Page 8

HYBRID

• Dr. Wilks proposes that for MT technology to truly advance, there must be highly sophisticated HYBRID systems.

• This means a mix of SMT and RBMT.

• Trados uses translation memory (TM), stored either locally or on a server, but we are not yet at very rapid, accurate, fully automated MT.
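The translation-memory workflow mentioned for Trados can be sketched with an automatic fallback. The idea is to reuse a stored translation when a new segment closely matches one, and otherwise hand the segment to an MT engine. This is a minimal sketch under stated assumptions. The memory contents, the 0.75 threshold, and the `dummy_mt` engine are hypothetical. Python's `difflib.SequenceMatcher` ratio stands in for a real fuzzy-match metric. None of this is Trados's API or matching algorithm.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple

# Hypothetical translation memory: stored source segments and their approved translations
TM: Dict[str, str] = {
    "the house is blue": "la maison est bleue",
    "the flower is small": "la fleur est petite",
}


def lookup_tm(segment: str, memory: Dict[str, str],
              threshold: float = 0.75) -> Optional[Tuple[str, str, float]]:
    """Return (source, target, score) for the closest stored segment, or None below threshold."""
    best: Optional[Tuple[str, str, float]] = None
    for src, tgt in memory.items():
        # Character-level similarity in [0, 1]; a stand-in for a real fuzzy-match metric
        score = SequenceMatcher(None, segment, src).ratio()
        if best is None or score > best[2]:
            best = (src, tgt, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


def translate(segment: str, memory: Dict[str, str],
              fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Reuse a TM match when one is close enough; otherwise call the MT fallback."""
    match = lookup_tm(segment, memory)
    if match is not None:
        src, tgt, score = match
        return tgt, f"TM match {score:.0%} against '{src}'"
    return fallback(segment), "machine translation"


# Stand-in MT engine (hypothetical): a real system would call an SMT or RBMT engine here
def dummy_mt(segment: str) -> str:
    return f"[MT] {segment}"


for s in ["the house is blue", "the house is red", "good morning"]:
    print(s, "->", translate(s, TM, dummy_mt))
```

The second segment shows typical TM behavior. A fuzzy match returns the stored translation of "the house is blue", which a human translator would then correct. The unmatched third segment goes to the automatic engine instead.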

Page 9

Some hybrid companies

• IBM: working with LinguaSys.

• SYSTRAN: released new hybrid software in 2010.