Neural Translation with Pytorch · 2017. 5. 29.

Post on 14-May-2021

Neural Translation with Pytorch

GTC 2017

JEREMY HOWARD

@JEREMYPHOWARD

I’m assuming some knowledge of…

Python, Jupyter, NumPy

Word vectors

RNNs

Some review today

course.fast.ai

https://github.com/jph00/part2

Our destination

Data source

Created by Chris Callison-Burch

Crawled millions of web pages

Used "a set of simple heuristics" to:

• Transform French URLs into English URLs

• i.e. replacing "fr" with "en", plus about 40 other hand-written rules

Assumed that these documents are translations of each other
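
The URL heuristic above can be sketched roughly as follows. This is only an illustration of the single "fr" to "en" rule; the other hand-written rules are not listed on the slide and are not reproduced here. The function names, the regular expression, and the example.org URLs are all hypothetical.

```python
import re

# Assumption: "fr" counts as a language marker only when it is not part of a
# longer word, e.g. "/fr/" or "fr." but not "france".
_FR_COMPONENT = re.compile(r"(?<![A-Za-z])fr(?![A-Za-z])")


def english_url_for(french_url):
    """Apply the single 'fr' -> 'en' rule to a URL."""
    return _FR_COMPONENT.sub("en", french_url)


def candidate_pairs(crawled_urls):
    """Return (french_url, english_url) pairs where both URLs were crawled."""
    crawled = set(crawled_urls)
    pairs = []
    for url in sorted(crawled):
        english = english_url_for(url)
        # Keep the pair only if the rule changed the URL and the result exists.
        if english != url and english in crawled:
            pairs.append((url, english))
    return pairs


urls = [
    "http://example.org/fr/about.html",
    "http://example.org/en/about.html",
    "http://example.org/france.html",
]
print(candidate_pairs(urls))
```

Pages paired this way are only assumed to be translations of each other, so some noise in the resulting corpus is to be expected.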

The dataset – just the questions
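
One way to keep "just the questions" is sketched below. The slide does not show the filter itself, so the rule used here is an assumption: an English sentence counts as a question if it starts with "Wh" and runs to the first "?", and the French side must also run to a "?". The names are hypothetical.

```python
import re

# Assumption: English questions start with "Wh" (What, Where, Why, ...).
ENGLISH_QUESTION = re.compile(r"^(Wh[^?.!]+\?)")
# Assumption: the French side is any opening sentence that ends with "?".
FRENCH_QUESTION = re.compile(r"^([^?.!]+\?)")


def keep_question_pairs(pairs):
    """Keep only (english, french) pairs where both sides open with a question."""
    kept = []
    for english, french in pairs:
        en_match = ENGLISH_QUESTION.search(english)
        fr_match = FRENCH_QUESTION.search(french)
        if en_match and fr_match:
            kept.append((en_match.group(1), fr_match.group(1)))
    return kept


sample_pairs = [
    ("What is Canada's population?", "Quelle est la population du Canada ?"),
    ("Contact us today.", "Contactez-nous."),
]
print(keep_question_pairs(sample_pairs))
```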

Tokenizing

Because we are translating at the word level, we need to tokenize the text first. There are many tokenizers available, but we found we got the best results using these simple heuristics.
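
A minimal sketch of a heuristic word-level tokenizer is shown below. The exact heuristics from the talk are not in this transcript, so the three rules here (split off English "'s", split after French elisions such as "qu'", and put spaces around punctuation) are assumptions chosen for illustration, and the names are hypothetical.

```python
import re

# Assumed rule 1: separate the English possessive/contraction "'s".
_POSSESSIVE = re.compile(r"(\w)'s\b")
# Assumed rule 2: split after an elided French word such as "qu'" or "c'".
_ELISION = re.compile(r"(\w['’])(\w)")
# Assumed rule 3: treat common punctuation marks as separate tokens.
_PUNCT = re.compile(r'(["().,;:/_?!])')


def simple_tokenize(sentence):
    """Lowercase a sentence and split it into word-level tokens."""
    s = _POSSESSIVE.sub(r"\1 's", sentence)
    s = _ELISION.sub(r"\1 \2", s)
    s = _PUNCT.sub(r" \1 ", s)
    return s.lower().split()


print(simple_tokenize("What is Canada's population?"))
print(simple_tokenize("Qu'est-ce que c'est ?"))
```

Rules like these are easy to inspect and adjust per language, which is the main appeal over a general-purpose tokenizer for a small experiment.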

Final preprocessing result

Unrolled stacked RNNs for sequences

[Figure: inputs for word 1, word 2 and word 3. Legend: Input, Hidden, Output; Input → Hidden, Hidden → Output, Hidden → Hidden]

Equivalent recursive diagram

[Figure: char n input; repeat for 1 → n-1; initialize to zeros]
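
The recursive form can be written as a loop, sketched here in plain Python rather than PyTorch so that it runs without dependencies. It assumes a single hidden layer, a tanh activation, and no bias terms; the weight names w_ih, w_hh and w_ho are hypothetical and stand for the Input → Hidden, Hidden → Hidden and Hidden → Output operations.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, w_ih, w_hh, w_ho):
    """Run a simple RNN over a sequence and return the final output."""
    hidden = [0.0] * len(w_hh)  # initialize the hidden state to zeros
    for x in inputs:  # repeat the same step for every input in the sequence
        from_input = matvec(w_ih, x)        # Input -> Hidden
        from_hidden = matvec(w_hh, hidden)  # Hidden -> Hidden
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
    return matvec(w_ho, hidden)  # Hidden -> Output


# Toy example: 1-dimensional inputs, 1 hidden unit, 1 output.
output = rnn_forward([[0.5], [0.25]], w_ih=[[1.0]], w_hh=[[0.5]], w_ho=[[2.0]])
print(output)
```

Because the same three weight matrices are reused at every step, the loop computes the same result as the unrolled diagram for a sequence of any length.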

This and the following 3 slides thanks to Chris Manning (Stanford): https://simons.berkeley.edu/talks/christopher-manning-2017-3-27

* Equation from: “Grammar as a Foreign Language”

Beam search

What is canada 's population ?

Quelle est la population du Canada ?

[Figure: alternative candidate words at successive steps: Que, Quoi; le, les; en, pour]
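
A minimal sketch of beam search follows. It is not the code from the talk: the toy probability table, the "<s>" and "</s>" markers, and the function names are hypothetical, and a real translation model would condition on the source sentence and the whole prefix rather than on the last token only.

```python
import math

# Hypothetical toy model: probability of the next token given the last token.
TOY_MODEL = {
    "<s>": {"quelle": 0.4, "que": 0.6},
    "quelle": {"est": 1.0},
    "que": {"est": 0.5, "le": 0.5},
    "est": {"</s>": 1.0},
    "le": {"</s>": 1.0},
}


def toy_next_log_probs(tokens):
    """Return {next_token: log-probability} for the given prefix."""
    return {t: math.log(p) for t, p in TOY_MODEL[tokens[-1]].items()}


def beam_search(next_log_probs, start, end, beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences at every step."""
    beams = [(0.0, [start])]  # (total log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, log_p in next_log_probs(tokens).items():
                candidates.append((score + log_p, tokens + [token]))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((score, tokens))  # complete hypothesis
            else:
                beams.append((score, tokens))     # still being extended
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if needed
    return max(finished, key=lambda c: c[0])[1]


greedy = beam_search(toy_next_log_probs, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_next_log_probs, "<s>", "</s>", beam_width=2)
print(greedy)
print(wider)
```

With a beam width of 1 the search is greedy and commits to "que" because it is the most likely first word. With a width of 2 it also keeps "quelle" alive and ends up with the more probable complete sequence.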
