SDL.com

05/04/2010

The Lionbridge and IBM announcement on machine translation

CEO Mark Lancaster writes: The Lionbridge and IBM announcement on machine translation is quite interesting, so I thought I’d share my thoughts:

  • It shows there is recognition in the market that the wheels are starting to turn on Machine Translation (MT). SDL invested in machine translation technology as early as 2001, followed by our rollout of SDL KbTS post-editing services in 2004. It’s nice to see the market recognizing as important something SDL identified over 10 years ago!
  • What is crucial in the use of machine translation is:

    Integration into a platform that translators and corporates use, so that machine translation can be accessed from the translator's environment, can work in concert with translation memories and terminologies, and, most importantly, can deal with tagging and multiple file formats (see below for integration into SDL Trados Studio and SDL TMS).

    The process of training/customizing and reviewing machine translation output for continual learning and quality improvement. SDL obviously feels that the investment made over that 10-year period in post-editing/computational linguistics and the technology solutions around KbTS puts SDL in a leadership position to benefit from this surge in interest in machine translation. SDL devised this patented process some 6 years ago and, as a result of this work, provides commercial integrated machine translation solutions processing more than 100m words a year for companies like CNH, Renault, Chrysler, Microsoft, Best Western, and RS Components.
  • SDL’s investment is not just in the provision of post-editing services across our worldwide office infrastructure - it is in the technology itself. SDL provides Desktop and Enterprise scale solutions that seamlessly integrate with machine translation engines. Connecting to a machine translation engine is a relatively straightforward task. Connecting to a machine translation engine that produces a gist of sufficient quality that post-editors can work with cost-effectively (e.g. tagging/format painting preserved, and preserved in the correct position in the gist) is more challenging. Add to that challenge the need to combine machine translation with Translation Memory and Terminology, wrapped in an automated Workflow with editing environments that translators love to work in, and you can see why SDL leads this market in technology solutions and services.
  • In February SDL launched SDL Studio and SDL TMS systems with integrated machine translation, working with the leading machine translation providers – Google, Language Weaver and Systran. This effectively put machine translation in the hands of translators as well as the corporate clients they provide services for.
  • SDL makes significant investment in technology to make everyone in the supply chain more productive – having an ‘open’ architecture supported by extensive APIs enables SDL to easily scale its solutions (Desktop and Enterprise) to include machine translation. This progress in our technology scalability/capability has not happened overnight and it cannot be short-cut by ‘open source solutions’ – it has taken a lot of investment and focus and listening to client feedback over the years.
  • Lionbridge have made a bold move, and all credit to them for getting the deal lined up. That said, it will be interesting to see how long it takes to get to market. While IBM has researched machine translation for a long time, the quality of their gists across a lot of language pairs is somewhat of an unknown. Google, Language Weaver, Systran and SDL ETS have all been very active in improving the gist quality of their machine translation products and building a business model around their technology. I have not seen IBM having a high profile in this space - either in direct license sales or in post-editing solutions. Also, it is not clear that Lionbridge has any science capacity to understand what they are holding - statistical MT requires large investment in both training and scientific resources to improve the quality of the output on a continual basis. I imagine Lionbridge will also have to ramp up their sales / marketing teams. Whatever they do, our experience is that they have a lot of work ahead of them.
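
To make the integration point above a little more concrete, here is a minimal sketch in Python of machine translation working in concert with a translation memory while keeping inline tags in place. It is an illustration only, not SDL's implementation: the function names, the 0.85 match threshold, the placeholder format and the stand-in `fake_mt` engine are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

TAG = re.compile(r"<[^>]+>")  # assumption: inline tags look like <b>, </b>, <i>, ...

def protect_tags(text):
    """Swap inline tags for numbered placeholders before sending text to MT."""
    tags = TAG.findall(text)
    counter = iter(range(len(tags)))
    masked = TAG.sub(lambda m: f"__TAG{next(counter)}__", text)
    return masked, tags

def restore_tags(text, tags):
    """Put the original tags back where the placeholders ended up."""
    for i, tag in enumerate(tags):
        text = text.replace(f"__TAG{i}__", tag)
    return text

def translate_segment(source, tm, mt_engine, threshold=0.85):
    """Reuse the closest TM entry if it is similar enough, otherwise fall back to MT."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is not None and best_score >= threshold:
        return best_target, "tm"
    masked, tags = protect_tags(source)
    return restore_tags(mt_engine(masked), tags), "mt"

# Hypothetical data: a one-entry translation memory and a stand-in MT engine.
tm = {"Press the <b>Start</b> button.": "Appuyez sur le bouton <b>Démarrer</b>."}
fake_mt = lambda text: "[MT] " + text  # a real engine would return a translation

from_tm = translate_segment("Press the <b>Start</b> button.", tm, fake_mt)
from_mt = translate_segment("Open the <i>File</i> menu.", tm, fake_mt)
```

In this sketch the first segment is served from the translation memory and the second goes to the engine with its tags masked and then restored. A production system would also have to cope with engines that reorder or drop placeholders, which is where much of the difficulty described above lies.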

Comments

In my opinion, machine translation will still need a few decades, or even a century or more, to be comparable with human output in terms of quality. In other words, it will need as long as it takes humankind to accumulate the bilingual corpus required to feed the machine's statistically generated rules.

By definition, it has to be a brute-force approach. This is how chess programs nowadays outplay human chess players: they simply build bigger move trees faster.

In chess the goal is very clear even for a machine: the opponent's king has to be mated. The problem is that a machine translation engine does not have any means to tell what the right target for a given sentence is.

This means that, unlike chess, more processing time or power does not imply a better translation, because the machine does not have a means to know whether the right target has been achieved.

That's why gist-level automated translation is so fast to achieve, even in real time as you type in Google Translate: more time to think does not improve the output. The machine cannot tell whether a sentence in the target language is correct, even if it were given 100 years to think about it, and even with the best glossaries money can buy today.

Jan Hein Donner, a Dutch chess master of the 1950s, said about chess, "our game is just too difficult for ordinary intelligent people."

Is this not the conceit of the language industry? That perfection in all things, as measured by the notion of human quality, should be the target?

I see a very different world today in which the "good enough" revolution has begun. To cite Robert Capps, of Wired Magazine, "To some, it looks like the crapification of everything. But it's really an improvement. And businesses need to get used to it, because the Good Enough revolution has only just begun."

Integrated Translation Solutions, which feature MT in a production chain that includes lexicons, rich TM databases and other corpora, human post-editing, and QC processes, will surely continue to evolve as you note above, and will serve our need to produce publication-ready material in business time: better, faster, and cheaper.

However, in a world where response time is measured in milliseconds, not minutes, real-time Multilingual Communication represents a wholly different solution, one geared to providing "good enough" content in real time, not to winning and crying "Checkmate!"

Journalist and chess author Dominic Lawson wrote, “Nothing excites jaded Grandmasters more than a theoretical novelty.” I think that, with this discussion of the next move on the MT board, we are there!

Introducing MT into a TM tool is not a bright idea. Lower the cost? Yes. But it will definitely lower the quality of the output. Why? Just think about how much the editing process can improve translation quality. 10 percent? 20 percent? It just cannot match the quality level of a good human translation produced from scratch.

Well, anyway, the article that Paula linked seems to provide us with the excuse. We have at last found a way to sell MT: instead of spending money trying to achieve near-human translation quality, why don't we "educate" our clients that they do not actually need such quality? Good enough is good.
