My last post took a look at the emerging use and acceptance of machine translation (MT) in the Life Sciences industry. In the following piece, my colleague Matthias Heyn, VP of Life Sciences Solutions, and I will explore how advances in MT standards are opening new opportunities for the industry.
MT Quality Advancements and Neural MT
MT quality has improved dramatically in recent years, driven by a wave of research advances in machine learning, growing volumes of relevant training data, and improvements in the computing power needed to train these systems.
This combination is a key driver of the progress we see today. The increasing success of deep learning and neural networks, in particular, has created great excitement as successful use cases emerge in many industries, and it benefits a whole class of Natural Language Processing (NLP) applications, including MT.
SDL is a pioneer in data-driven machine translation and led the commercial deployment of Statistical Machine Translation (SMT) in the early 2000s. The research team at SDL has published hundreds of peer-reviewed research papers and has over 45 MT-related patents to its credit. While SMT was an improvement over previous rule-based MT systems, the early promise plateaued, and improvements in SMT were slow and small after the initial breakthroughs.
Neural MT (NMT) changed this and provided a sudden and substantial boost to MT capabilities. Most in the industry consider NMT a revolution rather than an evolutionary step in MT. The gains seen so far represent only the first wave of improvement, as NMT is still in its infancy.
At SDL, our first-generation NMT systems improved 27% on average over previous SMT systems. In some languages, the improvement was as much as 100%, based on the automatic metrics used to measure improvement. The second generation of our NMT systems shows an additional 25% improvement over the first generation. This is remarkable in a scientific endeavor that typically sees improvements of 5% a year at most. It is reasonable to expect continued improvements as the research intensity in the NMT field continues and as we at SDL continue to refine our NMT strategy.
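To put these figures side by side, here is a minimal sketch of the arithmetic. It rests on an assumption the figures above do not spell out: that both percentages are relative gains on the same automatic metric, so they compound multiplicatively. The function name is ours, purely for illustration.

```python
import math


def compound_relative_gain(*gains):
    """Combine successive relative gains, e.g. 0.27 then 0.25.

    Assumption (not stated in the post): each gain is relative to the
    previous system and measured on the same metric.
    """
    factor = 1.0
    for gain in gains:
        factor *= 1.0 + gain
    return factor - 1.0


# First-generation NMT over SMT (27%), then second generation over first (25%).
combined = compound_relative_gain(0.27, 0.25)
print(f"Second-generation NMT vs. SMT: {combined:.2%}")  # 58.75%

# How many years of steady 5% annual improvement would give the same gain?
years_at_5_percent = math.log(1.0 + combined) / math.log(1.05)
print(f"Equivalent years at 5% a year: {years_at_5_percent:.1f}")
```

Under that assumption, the two generations together amount to roughly a 59% gain over SMT, which would take more than nine years to reach at a steady 5% a year.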
Much of the enthusiasm for Neural MT is driven by the fluency and naturalness of its output: it produces a large number of sentences that read as if a person had written them. Human evaluators often consider early Neural MT output to be clearly better, even though established MT evaluation metrics such as the BLEU score may show only nominal improvements, or none at all.
The Neural MT revolution has revived the MT industry with a big leap forward in output quality, and it has astonished naysayers with its fluency and quality improvements in "tough" languages like Japanese, German and Russian.
A Breakthrough in Russian MT
SDL recently demonstrated its MT competence when its research team announced a breakthrough in Russian MT. Its new Neural MT system outperformed all industry standards and set a benchmark for Russian to English machine translation, with professional Russian-English translators labeling 95% of the system's output as equivalent to human translation quality.
Additionally, SDL's broad experience in language translation services and enterprise globalization best practices has enabled it to provide effective MT solutions for many enterprise use cases. These range from eDiscovery, localization productivity improvements, and global customer service and support to broad global communication and collaboration use cases that make global enterprises more agile and better able to improve customer experience (CX) across the globe.
Availability of Enterprise MT Solutions
While the use of MT across public portals is huge, there are several reasons why these generic public systems are not suitable for the enterprise. These include lack of control over critical terminology, lack of data security, lack of integration with enterprise IT infrastructure and lack of deployment flexibility. To make sense for an enterprise, MT needs the following core capabilities:
- The ability to be tuned and optimized for enterprise content and subject domain.
- The ability to provide assured data security and privacy.
- The integration into enterprise infrastructure that creates, ingests, processes, reviews, analyzes, and generates multilingual data.
- The ability to deploy MT in a variety of required settings including on-premises, private cloud or a shared tenant cloud.
- The availability of expert services to facilitate tailoring requirements and use case optimization.
Life Sciences Perspective
What is clear today is that the Life Sciences industry can gain a business advantage from the prompt and informed use of MT. It is worth reviewing the technology to understand this impact.
MT, combined with text mining, can help transform unstructured multilingual data, such as free-text clinical notes or transcribed voice-of-the-customer calls, into structured data that provides insights to improve the health and well-being of patient populations.
As self-service penetrates the Life Sciences industry, the growing volume of new data from around the world can:
- Drive better health outcomes and advance the discovery and commercialization of new drugs.
- Improve large-scale population screening to identify trends and at-risk patients.
MT and text mining together will enable the enterprise to process multilingual Real World Data (RWD) and generate Real World Evidence (RWE) to inform all phases of pharmaceutical drug development, commercialization, and drug use in healthcare settings.
Regulatory bodies like the FDA could also draw on a more holistic set of data during the product approval process. For example, they could review multilingual internal data from international reports, as well as multilingual external data from social media, that MT makes available for analysis. This could enable much faster processing of drug approvals, as more data would be available to support new drug approval requests and provide the needed background.
As the Royal Society states:
“The benefits of machine learning [and MT] in the pharmaceutical sector are potentially significant, from day-to-day operational efficiencies to significant improvements in human health and welfare arising from improving drug discovery or personalising medicine.”