EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care...
Transcript of EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care...
![Page 1: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/1.jpg)
EN. 601.467/667 Introduction to Human Language TechnologyDeep Learning II
Shinji Watanabe
1
![Page 2: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/2.jpg)
Today’s agenda
• Basics of (deep) neural network• How to integrate DNN with HMM• Recurrent neural network
2
![Page 3: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/3.jpg)
Speech recognition pipeline
Feature extraction
“I want to go toJohns Hopkins campus”
Acoustic modeling
(HMM)Lexicon Language
modeling
G OW T UW
“go to”“go two”“go too”“goes to”“goes two”“goes too”
G OW Z T UW
p(O|L)<latexit sha1_base64="cMYLn3i1AeIom9oOH+cAMLvYguU=">AAACZnicbVHLSgMxFE3Hd31WERdugkXQTZmKoO4KbgTFB1gVnCJ30rTG5jEkd9Qy9h/c6p/5Cf6FaR1EWy8EDufcw0lO4kQKh2H4UQjGxicmp6ZnirNz8wuLS6XlK2dSy3idGWnsTQyOS6F5HQVKfpNYDiqW/DruHPb160dunTD6ErsJbyhoa9ESDNBTV8nW2cvJ9t1SOayEg6GjoJqDMsnn/K5UOI6ahqWKa2QSnLuthgk2MrAomOS9YpQ6ngDrQJvfeqhBcdfIBtft0U3PNGnLWH800gH725GBcq6rYr+pAO/dsNYn/9VUKlFY8zSUj639RiZ0kiLX7Du+lUqKhvYboU1hOUPZ9QCYFf4FlN2DBYa+t2Ix0vyJGaVAN7MIbFvBcy+L+uEmySKraM5FUiiBrjdqEHrU4Lkfg6+/Olz2KKjvVA4q4cVuuXaa/8M0WScbZItUyR6pkSNyTuqEkQfySt7Ie+EzWAxWg7Xv1aCQe1bInwnoF1b0vVA=</latexit><latexit sha1_base64="cMYLn3i1AeIom9oOH+cAMLvYguU=">AAACZnicbVHLSgMxFE3Hd31WERdugkXQTZmKoO4KbgTFB1gVnCJ30rTG5jEkd9Qy9h/c6p/5Cf6FaR1EWy8EDufcw0lO4kQKh2H4UQjGxicmp6ZnirNz8wuLS6XlK2dSy3idGWnsTQyOS6F5HQVKfpNYDiqW/DruHPb160dunTD6ErsJbyhoa9ESDNBTV8nW2cvJ9t1SOayEg6GjoJqDMsnn/K5UOI6ahqWKa2QSnLuthgk2MrAomOS9YpQ6ngDrQJvfeqhBcdfIBtft0U3PNGnLWH800gH725GBcq6rYr+pAO/dsNYn/9VUKlFY8zSUj639RiZ0kiLX7Du+lUqKhvYboU1hOUPZ9QCYFf4FlN2DBYa+t2Ix0vyJGaVAN7MIbFvBcy+L+uEmySKraM5FUiiBrjdqEHrU4Lkfg6+/Olz2KKjvVA4q4cVuuXaa/8M0WScbZItUyR6pkSNyTuqEkQfySt7Ie+EzWAxWg7Xv1aCQe1bInwnoF1b0vVA=</latexit><latexit sha1_base64="cMYLn3i1AeIom9oOH+cAMLvYguU=">AAACZnicbVHLSgMxFE3Hd31WERdugkXQTZmKoO4KbgTFB1gVnCJ30rTG5jEkd9Qy9h/c6p/5Cf6FaR1EWy8EDufcw0lO4kQKh2H4UQjGxicmp6ZnirNz8wuLS6XlK2dSy3idGWnsTQyOS6F5HQVKfpNYDiqW/DruHPb160dunTD6ErsJbyhoa9ESDNBTV8nW2cvJ9t1SOayEg6GjoJqDMsnn/K5UOI6ahqWKa2QSnLuthgk2MrAomOS9YpQ6ngDrQJvfeqhBcdfIBtft0U3PNGnLWH800gH725GBcq6rYr+pAO/dsNYn/9VUKlFY8zSUj639RiZ0kiLX7Du+lUqKhvYboU1hOUPZ9QCYFf4FlN2DBYa+t2Ix0vyJGaVAN7MIbFvBcy+L+uEmySKraM5FUiiBrjdqEHrU4Lkfg6+/Olz2KKjvVA4q4cVuuXaa/8M0WScbZItUyR6pkSNyTuqEkQfySt7Ie+EzWAxWg7Xv1aCQe1bInwnoF1b0vVA=</latexit>
p(L|W )<latexit sha1_base64="qYK2KgpnsCiH8bB8m2PrFisTtYs=">AAACZnicbVHLSgMxFE3Hd31WERdugkXQTZmKoO4EN4IiCtYKTpE7aVpj8xiSO9Yy9h/c6p/5Cf6FaR1EWy8EDufcw0lO4kQKh2H4UQgmJqemZ2bnivMLi0vLK6XVG2dSy3iNGWnsbQyOS6F5DQVKfptYDiqWvB53TgZ6/YlbJ4y+xl7CGwraWrQEA/TUTbJz/lLfvV8ph5VwOHQcVHNQJvlc3pcKZ1HTsFRxjUyCc3fVMMFGBhYFk7xfjFLHE2AdaPM7DzUo7hrZ8Lp9uu2ZJm0Z649GOmR/OzJQzvVU7DcV4IMb1Qbkv5pKJQpruiP52DpsZEInKXLNvuNbqaRo6KAR2hSWM5Q9D4BZ4V9A2QNYYOh7KxYjzbvMKAW6mUVg2wqe+1k0CDdJFllFcy6SQgl0/XGD0OMGz/0YfP3V0bLHQW2vclQJr/bLxxf5P8ySTbJFdkiVHJBjckouSY0w8kheyRt5L3wGy8F6sPG9GhRyzxr5MwH9AmbevVg=</latexit><latexit sha1_base64="qYK2KgpnsCiH8bB8m2PrFisTtYs=">AAACZnicbVHLSgMxFE3Hd31WERdugkXQTZmKoO4EN4IiCtYKTpE7aVpj8xiSO9Yy9h/c6p/5Cf6FaR1EWy8EDufcw0lO4kQKh2H4UQgmJqemZ2bnivMLi0vLK6XVG2dSy3iNGWnsbQyOS6F5DQVKfptYDiqWvB53TgZ6/YlbJ4y+xl7CGwraWrQEA/TUTbJz/lLfvV8ph5VwOHQcVHNQJvlc3pcKZ1HTsFRxjUyCc3fVMMFGBhYFk7xfjFLHE2AdaPM7DzUo7hrZ8Lp9uu2ZJm0Z649GOmR/OzJQzvVU7DcV4IMb1Qbkv5pKJQpruiP52DpsZEInKXLNvuNbqaRo6KAR2hSWM5Q9D4BZ4V9A2QNYYOh7KxYjzbvMKAW6mUVg2wqe+1k0CDdJFllFcy6SQgl0/XGD0OMGz/0YfP3V0bLHQW2vclQJr/bLxxf5P8ySTbJFdkiVHJBjckouSY0w8kheyRt5L3wGy8F6sPG9GhRyzxr5MwH9AmbevVg=</latexit><latexit sha1_base64="qYK2KgpnsCiH8bB8m2PrFisTtYs=">AAACZnicbVHLSgMxFE3Hd31WERdugkXQTZmKoO4EN4IiCtYKTpE7aVpj8xiSO9Yy9h/c6p/5Cf6FaR1EWy8EDufcw0lO4kQKh2H4UQgmJqemZ2bnivMLi0vLK6XVG2dSy3iNGWnsbQyOS6F5DQVKfptYDiqWvB53TgZ6/YlbJ4y+xl7CGwraWrQEA/TUTbJz/lLfvV8ph5VwOHQcVHNQJvlc3pcKZ1HTsFRxjUyCc3fVMMFGBhYFk7xfjFLHE2AdaPM7DzUo7hrZ8Lp9uu2ZJm0Z649GOmR/OzJQzvVU7DcV4IMb1Qbkv5pKJQpruiP52DpsZEInKXLNvuNbqaRo6KAR2hSWM5Q9D4BZ4V9A2QNYYOh7KxYjzbvMKAW6mUVg2wqe+1k0CDdJFllFcy6SQgl0/XGD0OMGz/0YfP3V0bLHQW2vclQJr/bLxxf5P8ySTbJFdkiVHJBjckouSY0w8kheyRt5L3wGy8F6sPG9GhRyzxr5MwH9AmbevVg=</latexit>
p(W )<latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="UCVdQAUobHOL6CCgCuMDHI9qnb8=">AAACUnicbVHLSgMxFE3HVx2rtms3wSK4KlM36k5wI7ipYG2hU8qdTNqG5jEkd6xl6A+49evc+iWmD0RbLwQO5+RwknOTTAqHUfRZCnZ29/YPyofhUSU8PjmtVl6cyS3jbWaksd0EHJdC8zYKlLybWQ4qkbyTTO4XeueVWyeMfsZZxvsKRloMBQP0VGtQrUeNaDl0GzTXoE7WM6iVHuPUsFxxjUyCc71mlGG/AIuCST4P49zxDNgERrznoQbFXb9YvnNOLzyT0qGx/mikS/a3owDl3Ewl/qYCHLtNbUH+q6lcorBmupGPw5t+IXSWI9dsFT/MJUVDF1XQVFjOUM48AGaF/wFlY7DA0BcWhrHmU2aUAp0WMdiRgrd5ES/CTVbEVtE1F0uhBLr5tkHobYPnfgy+/eZm19ugfdW4bURPESmTM3JOLkmTXJM78kBapE0YSck7+Sh9BeXgZLWkoLTeVo38maD6DWK+uXQ=</latexit><latexit sha1_base64="O3q582xEhxBOkT56OfOWdZ/scaU=">AAACWXicbVHLSgMxFL0d3/VVFVdugiLopkzdqDvBjeBGwVrBKXInTdtgHkNyRy1Df8Gt/ppf4G+YqUW09ULgcE4OJzk3zZT0FMcflWhmdm5+YXGpuryyurZe21i59TZ3XDS5VdbdpeiFkkY0SZISd5kTqFMlWunjeam3noTz0pobGmSirbFnZFdypJLKDlqHD7W9uB6Phk2DxhjswXiuHjYql0nH8lwLQ1yh9/eNOKN2gY4kV2JYTXIvMuSP2BP3ARrUwreL0WOHbD8wHda1LhxDbMT+dhSovR/oNNzUSH0/qZXkv5rOFUlnnyfyqXvSLqTJchKGf8d3c8XIsrIP1pFOcFKDAJA7GX7AeB8dcgqtVauJEc/cao2mUyToehpfhkVShtusSJxmYy5RUkvyw2mDNNOGwP0YQv2NybKnQfOoflqPr2NYhB3YhQNowDGcwQVcQRM49OEV3uC98hmtRlvfe4oq44Vtwp+Jtr8AZ0S7oA==</latexit><latexit sha1_base64="O3q582xEhxBOkT56OfOWdZ/scaU=">AAACWXicbVHLSgMxFL0d3/VVFVdugiLopkzdqDvBjeBGwVrBKXInTdtgHkNyRy1Df8Gt/ppf4G+YqUW09ULgcE4OJzk3zZT0FMcflWhmdm5+YXGpuryyurZe21i59TZ3XDS5VdbdpeiFkkY0SZISd5kTqFMlWunjeam3noTz0pobGmSirbFnZFdypJLKDlqHD7W9uB6Phk2DxhjswXiuHjYql0nH8lwLQ1yh9/eNOKN2gY4kV2JYTXIvMuSP2BP3ARrUwreL0WOHbD8wHda1LhxDbMT+dhSovR/oNNzUSH0/qZXkv5rOFUlnnyfyqXvSLqTJchKGf8d3c8XIsrIP1pFOcFKDAJA7GX7AeB8dcgqtVauJEc/cao2mUyToehpfhkVShtusSJxmYy5RUkvyw2mDNNOGwP0YQv2NybKnQfOoflqPr2NYhB3YhQNowDGcwQVcQRM49OEV3uC98hmtRlvfe4oq44Vtwp+Jtr8AZ0S7oA==</latexit><latexit sha1_base64="9aYdZ0Xohvj92Q9L2Vaw21RnMm4=">AAACZHicbVHLSgMxFE3Hd7U+cSVIsAh1U6Zu1J3gRhBEwVrBKeVOmraheQzJHbUM/QW3+mt+gb9hpg6irRcCh3Pu4SQncSKFwzD8KAVz8wuLS8sr5dW1yvrG5tb2vTOpZbzJjDT2IQbHpdC8iQIlf0gsBxVL3oqHF7neeuLWCaPvcJTwtoK+Fj3BAHMqqbWOOpvVsB5Ohs6CRgGqpJibzlbpKuoaliqukUlw7rERJtjOwKJgko/LUep4AmwIff7ooQbFXTubXHZMDz3TpT1j/dFIJ+xvRwbKuZGK/aYCHLhpLSf/1VQqUVjzPJWPvdN2JnSSItfsO76XSoqG5n3QrrCcoRx5AMwK/wLKBmCBoW+tXI40f2ZGKdDdLALbV/AyzqI83CRZZBUtuEgKJdCNZw1Czxo892Pw9Temy54FzeP6WT28Davn18U/LJM9ckBqpEFOyDm5JDekSRgZkFfyRt5Ln0El2Al2v1eDUuHZIX8m2P8CigS8eA==</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit><latexit sha1_base64="TXBIpHjV3+jB6mD/iHvO6m+VfPA=">AAACZHicbVHbSgMxFEzXW623qvgkSLAI+lK2IqhvBV8EQRSsFdwiZ9O0DeayJGfVsvQXfNVf8wv8DbN1EW09EBhmzjDJJE6kcBiGH6VgZnZufqG8WFlaXlldq65v3DqTWsZbzEhj72JwXArNWyhQ8rvEclCx5O348SzX20/cOmH0DQ4T3lHQ16InGGBOJfvtg4dqLayH46HToFGAGinm6mG9dBF1DUsV18gkOHffCBPsZGBRMMlHlSh1PAH2CH1+76EGxV0nG192RPc806U9Y/3RSMfsb0cGyrmhiv2mAhy4SS0n/9VUKlFY8zyRj72TTiZ0kiLX7Du+l0qKhuZ90K6wnKEcegDMCv8CygZggaFvrVKJNH9mRinQ3SwC21fwMsqiPNwkWWQVLbhICiXQjaYNQk8bPPdj8PU3JsueBq3D+mk9vD6qNS+LfyiTbbJL9kmDHJMmOSdXpEUYGZBX8kbeS5/BSrAZbH2vBqXCs0n+TLDzBYtEvHw=</latexit>
![Page 4: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/4.jpg)
Feed-forward neural network for acoustic model• Basic problem 𝑝(𝑠$|𝐨$)
• 𝑠$: HMM state or phoneme• 𝐨$: speech feature vector• 𝑡: data sample
• Configurations• Input features
• Context expansion• Output class
• Softmax function• Training criterion
• Number of layers• Number of hidden states• Type of non-linear activations
4
Output HMM state
Log mel filterbank + 11 context frames
~7hi
dden
laye
rs,
2048
uni
ts
30 ~ 10,000 units
・・・・・・Input speech features
a i u w N・・・
![Page 5: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/5.jpg)
Input feature
• GMM/HMM formulation• Lot of conditional independence assumption and Markov assumption• Many of our trials are how to break these assumptions
• In GMM, we always have to care about the correlation• Delta, linear discriminant analysis, semi-tied covariance
• In DNN, we don’t have to care J• We can simply concatenate
the left and right contexts,and just throw it!
5
![Page 6: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/6.jpg)
Output
• Phoneme or HMM state ID is used• We need to have a pair data of output and input data at frame 𝑡
• First use the Viterbi alignment to obtain the state sequence
• Then, we get the input and output pair
• Make acoustic model as a multiclass classification problem by predicting the all HMM state ID given the observation
• Not consider any constraint in this stage (e.g., left to right, which is handled by an HMM during recognition)
6
![Page 7: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/7.jpg)
Feed-forward neural networks
• Affine transformation and non-linear activation function (sigmoid function)
• Apply the above transformation L times
• Softmax operation to get the probability distribution
7
![Page 8: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/8.jpg)
Linear operation
• Transforms 𝐷(*+,)-dimensional input to 𝐷(*) output𝑓(𝐡(*+,)) = 𝐖(*)𝐡(*+,) + 𝐛(*)
• 𝐖(*) ∈ ℝ5(6)×5(689): Linear transformation matrix• 𝐛(*) ∈ ℝ5(6): bias vector
• Derivatives•: ∑< =><?<@A>
:A>B= 𝛿(𝑖, 𝑖′)
•:(∑< =><?<@A>)
:=>B<B= 𝛿 𝑖, 𝑖G ℎIG
8
![Page 9: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/9.jpg)
Sigmoid function
• Sigmoid function
• Convert the domain from ℝ to [0, 1]• Elementwise sigmoid function:
• No trainable parameter in general• Derivative
• :N(O):O
= 𝜎(𝑥)(1 − 𝜎(𝑥))
9
![Page 10: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/10.jpg)
Softmax function
• Softmax function
• Convert the domain from ℝS to 0, 1 S (make a multinomial dist. → classification)• Satisfy the sum to one condition, i.e., ∑IT, 𝑝 𝑗 𝐡 = 1• 𝐽 = 2: sigmoid function
• Derivative• For 𝑖 = 𝑗: :X(I|𝐡)
:?>= 𝑝(𝑗|𝐡)(1 − 𝑝 𝑗 𝐡 )
• For 𝑖 ≠ 𝑗: :X(I|𝐡):?>
= −𝑝(𝑖|𝐡) 𝑝 𝑗 𝐡
• Or we can write as :X(I|𝐡):?>
= 𝑝(𝑗|𝐡)(𝛿(𝑖, 𝑗) − 𝑝 𝑖 𝐡 ) :𝛿(𝑖, 𝑗): Kronecker’s delta
10
![Page 11: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/11.jpg)
What functions/operations we can use and cannot use?• Most of elementary functions
• +,−,×,÷, log , exp , sin , cos , tan()• The function/operations that we cannot take a derivative, including
some discrete operation• argmax=𝑝(𝑊|𝑂): Basic ASR operation, but we cannot take a derivative….• Discretization
11
![Page 12: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/12.jpg)
Objective function design
• We usually use the cross entropy as an objective function
• Since the Viterbi sequence is a hard assignment, the summation over states is simplified
12
![Page 13: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/13.jpg)
Other objective functions
• Square error𝐡klm − 𝐡 n
• We could also use p norm, e.g., L1 norm
• Binary cross entropy
• Again this is a special case of the cross entropy when the number of classes is two
13
![Page 14: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/14.jpg)
Building blocks
14
Input: 𝐨$ ∈ ℝ5
Output: 𝑠$ ∈ {1, … , 𝐽}
Linear transformation
Sigmoid activation
Softmax activation
+, −, exp , log , etc.
![Page 15: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/15.jpg)
Building blocks
15
Input: 𝐨$ ∈ ℝ5
Output: 𝑠$ ∈ {1, … , 𝐽}
Linear transformation
Softmax activation
![Page 16: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/16.jpg)
Building blocks
16
Input: 𝐨$ ∈ ℝ5
Output: 𝑠$ ∈ {1, … , 𝐽}
Linear transformation
Sigmoid activation
Softmax activation
Linear transformation
![Page 17: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/17.jpg)
Building blocks
17
Input: 𝐨$ ∈ ℝ5
Output: 𝑠$ ∈ {1, … , 𝐽}
Linear transformation
Sigmoid activation
Softmax activation
+, −, exp , log , etc.
![Page 18: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/18.jpg)
Building blocks
18
Input: 𝐨$ ∈ ℝ5
Output: 𝑠$ ∈ {1, … , 𝐽}
Linear transformation
Sigmoid activation
Softmax activation
Linear transformation
![Page 19: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/19.jpg)
How to optimize?Gradient decent and their variants• Take a derivative and update parameters with this derivative
• Chain rule
• Learning rate ρ
19
![Page 20: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/20.jpg)
Deep neural network: nested function
• Chain rule to get a derivative recursively• Each transformation (Affine, sigmoid, and softmax) has analytical derivatives
and we just combine these derivatives• We can obtain the derivative from the back propagation algorithm
21
Softmax activationLinear transformation Sigmoid activation Linear
transformation
![Page 21: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/21.jpg)
Minibatch processing
• Batch processing• Slow convergence• Effective computation
• Online processing• Fast convergence• Very inefficient computation
• Minibatch processing• Something between batch and online
processing
22
where
Whole data(batch)
minibatch
minibatch
minibatch
Split the whole data into minibatch
Θuvw Θxly,
Θuvw ΘxlyzΘxly, Θxlyn
![Page 22: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/22.jpg)
How to set 𝜌?
Θ(|@,) = Θ(|) − 𝜌 } Δ�k�w(|)
• Stochastic Gradient Decent (SGD)• Use a constant value (hyper-parameter)• Can have some heuristic tuning (e.g., 𝜌 ← 0.5 × 𝜌 when the validation loss started to
be degraded. Then the decay factor becomes another a hyperparameter)• Adam, AdaDelta, RMSProp, etc.
• Usecurrentorpreviousgradientinformationadaptivelyupdate𝜌(Δ�k�w
| , Δ�k�w|+, , … )
• Still has hyperparameters to make a balance between current and previous gradient information
• Choice of an appropriate optimizer and its hyperparameters is critical
23
![Page 23: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/23.jpg)
Today’s agenda
• Basics of (deep) neural network• How to integrate DNN with HMM• Recurrent neural network
24
![Page 24: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/24.jpg)
How to integrate DNN with HMM
• Bottleneck feature• DNN/HMM hybrid
25
![Page 25: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/25.jpg)
Bottleneck feature
• Train DNN, but one layer having a narrow layer
• Use a hidden state vector for GMM/HMM
• Nonlinear feature extraction with discriminative abilities
• Can combine with existing GMM/HMM
26
Output HMM state
Log mel filterbank + 11 context frames
1,000 ~ 10,000 units
・・・・・・Input speech features
a i u w N・・・
Bottleneck layer
![Page 26: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/26.jpg)
DNN/HMM hybrid
• How to make it fit to the HMM framework?• Use the Bayes rule to convert the posterior to the likelihood
• is obtained by the maximum likelihood (unigram count)
• Need a modification in the Viterbi algorithm during recognition
27
![Page 27: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/27.jpg)
Today’s agenda
• Basics of (deep) neural network• How to integrate DNN with HMM• Recurrent neural network
28
![Page 28: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/28.jpg)
Recurrent neural network
• Basic problem• HMM state (or phoneme) or speech feature is a sequence
𝑠,, 𝑠n, … , s�𝐨,, 𝐨n, … , 𝐨$
• It’s better to consider context (e.g., previous input) to predict the probability of 𝑠$
𝑝 𝑠$ 𝐨$ → 𝑝(𝑠$|𝐨,, 𝐨n, … , 𝐨$)
• Recurrent neural network (RNN) can handle such problems
29
![Page 29: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/29.jpg)
Recurrent neural network (Elman type)
• Vanilla RNN: We ignore the bias term for simplicity
30
xt
yt
yt-1𝐲$ = 𝜎 𝐖
𝐲$+,𝐱�
![Page 30: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/30.jpg)
Recurrent neural network (Elman type)
• Vanilla RNN
31
xt
yt
yt-1
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y2y1 y3 y4 y5
![Page 31: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/31.jpg)
Recurrent neural network (Elman type)
• Vanilla RNN
32
xt
yt
yt-1
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y2y1 y3 y4 y5
Possibly consider long-range effect (but longer weaker)no future context
![Page 32: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/32.jpg)
Recurrent neural network (Elman type)
• Vanilla RNN
33
xt
yt
yt-1
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y2y1 y3 y4 y5
We can compute the posterior distribution 𝑝(𝑠$|𝐨,, 𝐨n, … , 𝐨$)
Softmax activation
𝑝(𝑠�|𝑥,, 𝑥n, … , 𝑥�)
![Page 33: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/33.jpg)
Bidirectional RNN
34
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y3y2 y4 y5 y6
We can compute the posterior distribution 𝑝(𝑠$|𝐨$, 𝐨$@,, … , )
![Page 34: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/34.jpg)
Bidirectional RNN
35
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y3y2 y4 y5 y6
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y2y1 y3 y4 y5
We can compute the posterior distribution 𝑝(𝑠$|𝐨,, 𝐨n, … , 𝐨$, 𝐨$@,, … , )
𝐲$ = 𝜎 𝐖 𝐲$+,𝐱�
𝐲$ = 𝜎 𝐖 𝐲$+,𝐱�
![Page 35: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/35.jpg)
Long short-term memory RNN
• Keep two states• Normal recurrent state: yt
• Memory cell: ct
37
xt
yt
LSTMblock
ctct-1
ytyt-1
![Page 36: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/36.jpg)
LSTM block
38
σ
𝐱$
𝐲$+,
𝐜$+,
𝐲$
𝐜$
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
Operations w/otrainable parameters
Operations w/trainable parameters
![Page 37: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/37.jpg)
LSTM block
• Cell state keeps the history information
1. It will be forgotten2. New information from xt will be
added3. The cell information will be
outputted as yt
• ”will be” function is implemented by a gate function [0, 1] through the sigmoid activation
39
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 38: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/38.jpg)
tanh and sigmoid activations
• sigmoid• Convert the domain from to • Used as a gating (weight the state vector
(information))
• tanh• Convert the domain from to • Allow negative and positive values
40
σ
𝐡$
𝒛$
![Page 39: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/39.jpg)
LSTM block
• Cell state keeps the history information
1. It will be forgotten2. New information from xt will be
added3. The cell information will be
outputted as yt
41
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 40: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/40.jpg)
LSTM block
• Cell state keeps the history information
1. It will be forgotten2. New information from xt will be
added3. The cell information will be
outputted as yt
42
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 41: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/41.jpg)
LSTM block
• Cell state keeps the history information
1. It will be forgotten2. New information from xt will be
added3. The cell information will be
outputted as yt
43
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 42: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/42.jpg)
LSTM block
• Cell state keeps the history information
1. It will be forgotten2. New information from xt will be
added3. The cell information will be
outputted as yt
44
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 43: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/43.jpg)
LSTM block
• Cell state keeps the history information
1. It will be forgotten2. New information from xt will be
added3. The cell information will be
outputted as yt
45
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 44: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/44.jpg)
LSTM block summary
• 3 gating functions
• Cell update
• Hidden state update
47
σ
"#
$#%&
'#%&
$#
'#
tanhσ
concat
σ
tanh
LinearLinearLinearLinear
![Page 45: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/45.jpg)
LSTM RNN
• LSTM
48
xt
yt
yt-1
x1
y1
x2
y2
x3
y3
x4
y4
x5
y5
x6
y6
y2y1 y3 y4 y5
Possibly consider to keep the above initial information dependency by 𝐠muk�l� = 𝟏 and 𝐠�x��� = 𝟎 at 𝑡 > 1
![Page 46: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/46.jpg)
RNN can be used for both acoustic and language models • HMM/DNN Acoustic model
𝑝(𝑠$|𝐨,, 𝐨n, … , 𝐨$)• Language model𝑝(𝑤�|𝑤,, 𝑤n, … ,𝑤�+,)
49
w0
w1
w1
w2
w2
w3
w3
w4
w4
w5
w5
w6
y2y1 y3 y4 y5
”I” “want” “to” “go” “to” “my
“<s>” ”I” “want” “to” “go” “to”
o1
s1
o2
s2
o3
s3
o4
s4
o5
s5
o6
s6
y2y1 y3 y4 y5
![Page 47: EN. 601.467/667 Introductionto Human Language Technology ... · •In DNN, we don’t have to care J •We can simply concatenate the left and right contexts, and just throw it! 5.](https://reader034.fdocuments.us/reader034/viewer/2022050501/5f9390bdb42dc363806c48e4/html5/thumbnails/47.jpg)
Summary of today’s talk
• Basics of deep neural network• Input, output, function block, back propagation, optimization
• Recurrent neural network• Now we can handle a sequence (𝑠,, 𝑠n, … ), (𝑤,, 𝑤n,… ), (𝐨,, 𝐨n, … )
• Integrate deep neural network for speech recognition• GMM/HMM à DNN/HMM or RNN/HMM• RNN language model
• Next lecture will extend this sequence handling function to directly model whole speech recognition in an end-to-end manner
50