Post on 24-Feb-2016
FLAVIUS Meeting – WP4
June 8, 2010
Giurgiu Bogdan
Wong William
Agenda
• LW contributions
• Keys to successful integration
• Complete integration picture
• Translation REST API
• Trustscore™ and Reporting
• REST API Version 2
• Customization through dictionaries
• Customization through training
• FLAVIUS Language Weaver Roadmap
• Questions & Answers
Language Weaver’s Contribution
Keys to a Successful Partner Integration
1. Ability to integrate with Language Weaver Machine Translation for development and testing
2. Ability to customize baseline engines with dictionaries
3. Ability to customize baseline engines by training a domain- or customer-specific vertical system
Complete Picture
[Diagram: REST API, TOD, Reporting, Trustscore™, Dictionary, Training; sample UI for the Translation Engine]
Translation REST API
• Simple HTTP-based communication protocol
• Leverages HTTP calls – POST, GET, DELETE
• Web 2.0 style, as used by Amazon, Twitter, etc.
• Supported text formats: TXT, HTML, TMX, XLIFF
• Data is encrypted using SSL (via HTTPS)
• Authentication using a custom HTTP scheme
  – Two additional headers added to every request
    • LW_Date – Contains a date/time string based on the request time
    • Authorization – Contains three strings separated by colons: “LWA:<userid>:<signature>”
• Unique signature generated using a keyed HMAC (Hash-based Message Authentication Code) with a SHA-1 (Secure Hash Algorithm 1) digest
Translation REST API
• Language Pair: HTTP GET /v1/lpinfo
• User: HTTP POST /v1/user
• Non-Blocking Translations: HTTP POST /v1/translation/src.tgt/lpid=<id>
• Blocking Translations: HTTP POST /v1/translation/src.tgt/lpid=<id>
• HTTP GET/DELETE /v1/translation/src.tgt/lpid=<id>/<jobid>
Translation REST API
• Blocking Translation Request
  – HTTP POST to https://lwaccess.languageweaver.com/v1/translation/[src].[tgt]/lpid=[lpid]/[optional-params]/
• Appropriate for small chunks of data (less than 640 bytes)
• Mandatory Input Parameters:
  – [src] – three-letter code for the source language (e.g. “eng” for English)
  – [tgt] – three-letter code for the target language
  – [lpid] – integer denoting the specific language pair system to be used
  – “source_text=” – [string] – URL-escaped version of the input source (POST DATA)
• Optional Input Parameters:
  – input_format=[value] – string declaring the input format. Choose from “html”, “plain”, “xliff”.
  – input_encoding=[value] – string declaring the input encoding. Only “utf8” is supported.
• Sample Calls:
  – Create Blocking Translation Job for Text, Get Language Pair details
Translation REST API
• Non-Blocking Translation Request
  – HTTP POST to https://lwaccess.languageweaver.com/v1/translation-async/[src].[tgt]/lpid=[lpid]/[optional-params]/
• Appropriate for large files
• Mandatory and optional input parameters are similar to those of the Blocking Translation
• Sample calls:
  – Create Non-Blocking Translation Job for Text/URL/File
  – Get Language Pair details, Get User Info
  – Followed by HTTP GETs to https://api.languageweaver.com/v1/translation-async/[src].[tgt]/[jobID]/lpid=[lpid]/[optional-params]/
• [jobID] – integer denoting the specific translation submitted with the POST
• Sample calls:
  – GET Non-Blocking Translation Job for Text/URL/File
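The choice between the two request types can be sketched as a small path builder. This is a minimal illustration under stated assumptions, not Language Weaver's own client code: the class LwPaths and its methods are hypothetical names, and it is an assumption that the 640-byte limit applies to the UTF-8 encoded source text before URL escaping. In practice the retrieval_url returned by the POST is likely the safer source for the follow-up GET address.

```csharp
using System;
using System.Text;

public static class LwPaths
{
    // Size limit quoted on the slide for blocking requests.
    public const int BlockingLimitBytes = 640;

    // True when the UTF-8 encoded text is under the blocking limit.
    // ASSUMPTION: the limit is measured on the unescaped UTF-8 bytes.
    public static bool FitsBlocking(string sourceText)
    {
        return Encoding.UTF8.GetByteCount(sourceText) < BlockingLimitBytes;
    }

    // Path for the POST that creates a translation job.
    public static string BuildPostPath(string src, string tgt, string lpid, string sourceText)
    {
        string service = FitsBlocking(sourceText) ? "translation" : "translation-async";
        return "/v1/" + service + "/" + src + "." + tgt + "/lpid=" + lpid + "/";
    }

    // Path for the follow-up GET on a non-blocking job: [jobID] precedes lpid.
    public static string BuildRetrievalPath(string src, string tgt, string jobId, string lpid)
    {
        return "/v1/translation-async/" + src + "." + tgt + "/" + jobId + "/lpid=" + lpid + "/";
    }

    public static void Main()
    {
        Console.WriteLine(BuildPostPath("eng", "fra", "74", "Hello World"));
        Console.WriteLine(BuildPostPath("eng", "fra", "74", new string('a', 2000)));
        Console.WriteLine(BuildRetrievalPath("eng", "fra", "90079", "74"));
    }
}
```

Short text maps to the blocking path and anything at or above the limit to the translation-async path, so the caller only has to decide whether to wait for the response or poll for it.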
Translation REST API
• Sample code – C# Example
    // Step 1: Construct the path. Check to see if the LPID and/or input_format is submitted
    string szPath = "/v1/translation/" + szSrcLang + "." + szTgtLang + "/";
    if (0 != szLPID.Length)
        szPath = szPath + "lpid=" + szLPID + "/";
    if (0 != szInputFormat.Length)
        szPath = szPath + "input_format=" + szInputFormat + "/";

    // Step 2: Construct the URL
    string szURI = m_szHostName + szPath;
    System.Console.WriteLine(szURI);

    // Step 3: Prepare the POST request
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(szURI);
    PrepareHttpRequestHeader("POST", szPath, ref request);

    // Step 4: Attach the POST data (source_text must be URL escaped)
    szSourceText = "source_text=" + Uri.EscapeDataString(szSourceText);
    byte[] postDataBytes = Encoding.UTF8.GetBytes(szSourceText);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = postDataBytes.Length;
    Stream requestStream = request.GetRequestStream();
    requestStream.Write(postDataBytes, 0, postDataBytes.Length);
    requestStream.Close();

    // Step 5: Read the response
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader responseReader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    string lpInfoResponse = responseReader.ReadToEnd();
    responseReader.Close();

    // Step 6: Parse the XML document for the translated text
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(lpInfoResponse);
    System.Console.WriteLine(lpInfoResponse);
    XmlNodeList nodeList = xmlDoc.GetElementsByTagName("translated_text");
    szTargetText = nodeList[0].InnerText.Trim();
Translation REST API – Header Generation
• Sample code – C# Example
  – Generate Header
    // Step 1: Get the current HTTP date
    string szHttpDate = GetHttpDate();

    // Step 2: Generate the signature
    szRequestType = szRequestType.ToUpper();
    string szSignature = GenerateSignature(szRequestType, szHttpDate, szURI);

    // Step 3: Add the two new headers to the request object
    request.Headers.Add("LW_Date", szHttpDate);
    request.Headers.Add("Authorization", "LWA:" + m_szUserID + ":" + szSignature);
    System.Console.WriteLine(szSignature);
Translation REST API – Header Generation
  – Generate Signature
    // Key the HMAC-SHA1 with the UTF-8 bytes of the API key
    Encoding u8Encoding = new UTF8Encoding();
    HMACSHA1 hmacsha1 = new HMACSHA1(u8Encoding.GetBytes(m_szAPIKey));

    // Message to sign: request type, date and URI, separated by newlines
    string szMessage = szRequestType.Trim() + "\n" + szHttpDate.Trim() + "\n" + szURI.Trim();

    // The signature is the Base64-encoded HMAC of the message
    string szSignature = Convert.ToBase64String(hmacsha1.ComputeHash(u8Encoding.GetBytes(szMessage)));
    return szSignature;
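The header and signature fragments above rely on members that the slides do not show, such as GetHttpDate. A self-contained sketch of the same scheme might look like the following. The class LwAuth and its method names are hypothetical, and the RFC 1123 date format is an assumption, since the slides only say that LW_Date carries a date/time string based on the request time.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class LwAuth
{
    // ASSUMPTION: RFC 1123 format; the slides do not specify the LW_Date format.
    // The caller is expected to pass a UTC timestamp.
    public static string GetHttpDate(DateTime utcNow)
    {
        return utcNow.ToString("r", CultureInfo.InvariantCulture);
    }

    // Base64-encoded HMAC-SHA1 over "<METHOD>\n<date>\n<path>", keyed with the API key.
    public static string GenerateSignature(string apiKey, string requestType, string httpDate, string path)
    {
        string message = requestType.Trim().ToUpperInvariant() + "\n" + httpDate.Trim() + "\n" + path.Trim();
        using (var hmac = new HMACSHA1(Encoding.UTF8.GetBytes(apiKey)))
        {
            return Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(message)));
        }
    }

    // Value of the Authorization header: "LWA:<userid>:<signature>".
    public static string BuildAuthorization(string userId, string signature)
    {
        return "LWA:" + userId + ":" + signature;
    }

    public static void Main()
    {
        // Placeholder credentials for illustration only.
        string date = GetHttpDate(DateTime.UtcNow);
        string signature = GenerateSignature("my-api-key", "post", date, "/v1/translation/eng.fra/lpid=74/");
        Console.WriteLine("LW_Date: " + date);
        Console.WriteLine("Authorization: " + BuildAuthorization("my-user-id", signature));
    }
}
```

Because the date is part of the signed message, the same date string has to be used for both the signature and the LW_Date header of a given request.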
Translation REST API
• Sample request – response for Create Non-Blocking Translation Job for Text
e.g. HTTP POST request to https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/lpid=74/
<?xml version='1.0' encoding='UTF-8'?>
<lwresponse>
  <service_version>v1</service_version>
  <requested_url>/v1/translation-async/eng.fra/lpid=74/</requested_url>
  <request_type>POST</request_type>
  <request_time>Wed Mar 3 14:55:51 2010</request_time>
  <source_language>eng</source_language>
  <target_language>fra</target_language>
  <response_data type='translation-async_post'>
    <retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>
    <job_id>90079</job_id>
    <translation_signature>3bccc5e58d50ce7dcaf950f562ec2303</translation_signature>
    <src>eng</src>
    <tgt>fra</tgt>
    <lpid>74</lpid>
    <input_format>text/plain</input_format>
    <input_encoding></input_encoding>
    <dictionary></dictionary>
    <customizer></customizer>
    <source_text><![CDATA[Hello World]]></source_text>
    <server>
      <version>5.1.2 release ENGFRAU20_5.1.x.0</version>
    </server>
  </response_data>
</lwresponse>
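A client would typically pull the job ID and retrieval URL out of this response before polling for the result. The sketch below shows one way to do that with XmlDocument, using an abbreviated copy of the sample response. The class LwResponse and its members are hypothetical names, and returning null for a missing element is a design choice of this sketch rather than documented API behavior.

```csharp
using System;
using System.Xml;

public static class LwResponse
{
    // Abbreviated copy of the sample response from the slide.
    public const string Sample =
        "<?xml version='1.0' encoding='UTF-8'?>" +
        "<lwresponse><service_version>v1</service_version>" +
        "<response_data type='translation-async_post'>" +
        "<retrieval_url>https://lwaccess.languageweaver.com/v1/translation-async/eng.fra/" +
        "90079.3bccc5e58d50ce7dcaf950f562ec2303/lpid=74</retrieval_url>" +
        "<job_id>90079</job_id>" +
        "<source_text><![CDATA[Hello World]]></source_text>" +
        "</response_data></lwresponse>";

    // Loads the response body into an XML document.
    public static XmlDocument Parse(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Trimmed text of the first element with the given tag, or null if absent.
    public static string FirstValue(XmlDocument doc, string tagName)
    {
        XmlNodeList nodes = doc.GetElementsByTagName(tagName);
        return nodes.Count > 0 ? nodes[0].InnerText.Trim() : null;
    }

    public static void Main()
    {
        XmlDocument doc = Parse(Sample);
        Console.WriteLine("job_id: " + FirstValue(doc, "job_id"));
        Console.WriteLine("retrieval_url: " + FirstValue(doc, "retrieval_url"));
    }
}
```

Checking for a missing element avoids the exception that indexing an empty node list would raise, for example when the server returns an error document without the expected tags.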
Trustscore™ and Reporting
• Internal LW milestone
  – Migration to version 2 of REST API
• Reporting:
  – Words per minute
  – Number of documents translated
  – Average document length
  – Details about the TrustScore™
  – Other metrics to be defined
• Trustscore™:
  – Scored from 1-5
  – Document level scoring
  – Segment level scoring not supported
REST API Version 2
• New format
  – Sample of Create Non-Blocking Translation Job for Text
    • https://api.languageweaver.com/v2/language-pair/[lpid]/translation-async/[optional-params]/
    • Mandatory and optional parameters same as v1
• Additional calls/functionality related to:
  – Trustscore™
  – Reporting
  – Dictionary
Customization through Dictionaries
• Structure
  – One entry per term, one translation per entry
  – Search & Replace mechanism that applies unconditionally
• Size
  – Up to 300,000 entries
• Best practice to build one
  – Using CSV files
• Limitations
  – No limitations on the content
  – Recommended use of dictionaries is phrase replacement rather than word replacement
  – Gender is not automatically generated
  – UTF-8
• Impact on performance
  – No significant impact
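The recommendation to prefer phrases over single words follows from the unconditional search-and-replace behavior, which a toy model can illustrate. The sketch below is not how the Language Weaver engine applies dictionaries, which the slides do not describe. The class DictionarySketch, the two-column "source,target" CSV layout, and the longest-entry-first ordering are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionarySketch
{
    // Parses "source,target" lines (hypothetical layout); malformed lines are skipped.
    public static List<KeyValuePair<string, string>> ParseCsv(string csv)
    {
        var entries = new List<KeyValuePair<string, string>>();
        foreach (string line in csv.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] cols = line.Trim().Split(',');
            if (cols.Length != 2 || cols[0].Trim().Length == 0) continue;
            entries.Add(new KeyValuePair<string, string>(cols[0].Trim(), cols[1].Trim()));
        }
        return entries;
    }

    // Applies every entry unconditionally, longest source term first.
    public static string Apply(string text, IEnumerable<KeyValuePair<string, string>> entries)
    {
        foreach (var entry in entries.OrderByDescending(e => e.Key.Length))
        {
            text = text.Replace(entry.Key, entry.Value);
        }
        return text;
    }

    public static void Main()
    {
        // A single-word entry also fires inside longer words.
        Console.WriteLine(Apply("the category", ParseCsv("cat,chat")));
        // A phrase entry is specific enough to match only where intended.
        Console.WriteLine(Apply("power supply unit", ParseCsv("power supply,alimentation\nsupply,fourniture")));
    }
}
```

In this toy model the entry "cat" also rewrites part of "category", while the phrase "power supply" matches only the intended span, which is one way to read the slide's advice.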
Customization through Training
[Diagram: training workflow. Inputs: Parallel Aligned Text; Optional: Regression Text; Optional: Test Text. Data: fix noisy text, more text, text alignment, text segmentation. LW Training Compute Cloud; Evaluation; Product Delivery via TOD.]
Customization through Training
• Structure:
  – Train on any language pair specified in the FLAVIUS agreement
  – Inputs: TMX parallel segments, optional regression text files, optional test sets for evaluation
  – Outputs:
    • Trained engine
    • Results of BLEU scored test set
    • Translated output of regression text files
    • Metrics from input training corpus
  – Evaluate customized engine via TOD deployment
FLAVIUS Language Weaver Roadmap
Questions & Answers
Thank you!
Accelerating the way the world communicates