Introduction to Information Retrieval
Lecture 14: Learning to Rank
cs6714/19t3/lect/Lx-L2R.pdf (41 slides)

Transcript of the lecture slides.

Page 1:

Introduction to Information Retrieval
Lecture 14: Learning to Rank

Page 2:

Machine learning for IR ranking?

§ Many heuristic or unsupervised methods for ranking documents in IR
  § Cosine similarity, inverse document frequency, proximity, pivoted document length normalization, PageRank, …
§ Supervised machine learning classifiers can be used to classify documents
  § Naïve Bayes, Rocchio, kNN, SVMs
§ What about using machine learning to rank the documents displayed in search results?
  § Sounds like a good idea
  § A.k.a. “machine-learned relevance” or “learning to rank”

Sec. 15.4

Page 3:

Practical Usage

https://en.wikipedia.org/wiki/Learning_to_rank#Practical_usage_by_search_engines

Page 4:

Overview of Supervised Learning

• Given a new object o, map it to a feature vector x = (x1, x2, …, xd)⊤
• Predict the output (class label) y
• Binary classification: y ∈ {0, 1} (sometimes {−1, +1})
• Multi-class classification: y ∈ {1, 2, …, K}
• Learn a classification function: y = f(x)
• Regression: y ∈ ℝ
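A minimal sketch of this setup in Python (the data and the choice of classifier below are illustrative, not from the lecture): map each object to a feature vector, fit a classifier, and predict the label of a new object.

```python
# Minimal sketch of the supervised setup: feature vectors x, labels y,
# and a learned classification function f. The data here is made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 2.0, 0.0],    # each row: a feature vector x = (x1, ..., xd)
              [0.0, 0.5, 3.0],
              [2.0, 1.5, 0.0],
              [0.0, 1.0, 2.5]])
y = np.array([1, 0, 1, 0])        # binary class labels in {0, 1}

clf = LogisticRegression().fit(X, y)     # learn f
print(clf.predict([[1.5, 1.0, 0.2]]))    # predict the label of a new object
```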

Page 5:

Example: Text categorization

§ Input object: a document = a sequence of words
§ Input features: x = word frequencies
  § freq(democrats) = 2, freq(basketball) = 0, …
  § x = [1, 2, 0, …]⊤  ← looks familiar?
§ Class label: ‘Politics’ vs ‘Sport’

Doc: Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, …

Topic: Politics

Page 6:

How to Learn?

• How do we find f()?
• Typically, simplify the problem to finding a parameterized function f(x; θ) within a chosen function family
  • e.g., linear functions: f(x; θ) = φ(θ⊤x)
• Which one to choose?
  • The one that performs the “best” on a set of training examples Dtrain = {(xi, yi)}, i = 1…n
  • Use a loss function to define the ‘badness’ of a function:
    E_{(x,y)∈Dtrain}[ℓ(ŷ, y)], where ŷ = f(x; θ)   (a.k.a. the risk function)
  • This is a function of θ ⇒ find the θ* that minimizes its value (a sketch follows below)
• Other terms can be added to the loss function (e.g., regularization, prior, etc.)
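A small sketch of this recipe under illustrative assumptions: the function family is linear with a logistic φ, the loss is the log loss, and θ* is found by plain gradient descent (just one of many ways to minimize the risk).

```python
# Sketch: empirical risk minimization for f(x; theta) = phi(theta^T x).
import numpy as np

def phi(z):
    return 1.0 / (1.0 + np.exp(-z))        # logistic squashing function

def risk(theta, X, y):
    p = phi(X @ theta)                     # predictions y_hat on D_train
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # mean log loss

def fit(X, y, lr=0.5, steps=1000):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (phi(X @ theta) - y) / len(y)   # gradient of the risk
        theta -= lr * grad                           # move downhill
    return theta

X = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 1.0], [0.2, 0.8]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta_star = fit(X, y)                     # theta* approximately minimizing the risk
print(theta_star, risk(theta_star, X, y))
```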

Page 7:

How to Evaluate?

• Evaluate accuracy on another set of test examples: Dtest = {(xi, yi)}, i = 1…m
• Use the 0-1 loss: 1(ŷ, y) = 0 if ŷ = y, and 1 otherwise
• Testing error: the expectation of the 0-1 loss on Dtest:
  E = E_{(x,y)∈Dtest}[1(ŷ, y)], where ŷ = f(x; θ*)
• Training error:
  ε̂ = E_{(x,y)∈Dtrain}[1(ŷ, y)], where ŷ = f(x; θ*)
• PAC learning theory assumes that both training and testing data are drawn i.i.d. (independently and identically distributed)
• Then, with probability at least 1 − δ, the testing error exceeds the training error by at most some slack t: Pr[E ≤ ε̂ + t] ≥ 1 − δ
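A sketch of this evaluation protocol on synthetic data (the classifier and the split are illustrative): fit on Dtrain only, then report the empirical 0-1 loss on both Dtrain and Dtest.

```python
# Sketch: training vs. testing error under the 0-1 loss.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # synthetic labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = LinearSVC().fit(X_tr, y_tr)                # theta* fit on D_train only

train_err = np.mean(clf.predict(X_tr) != y_tr)   # empirical 0-1 loss on D_train
test_err = np.mean(clf.predict(X_te) != y_te)    # empirical 0-1 loss on D_test
print(train_err, test_err)
```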

Page 8:

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
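In the spirit of that linked comparison, a compressed sketch that fits a few classifiers on the same toy dataset and reports test accuracy (the dataset and the choice of models are illustrative):

```python
# Sketch: fit several classifiers on the same toy data, compare accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (KNeighborsClassifier(3), GaussianNB(), SVC(kernel="rbf")):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))   # test accuracy
```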

Page 9:

Machine Learning Terminologies

§ Supervised learning takes labelled input data
  § An #instances × #attributes matrix/table
  § #attributes = #features + 1
    § The extra attribute (usually the last) is the class attribute
§ Labelled data is split into 2 or 3 disjoint subsets
  § Training data ⇒ build a model
    § Training error = 1.0 − #correctly_classified / #training_instances
  § Validation/development data ⇒ select/refine the model
  § Testing data ⇒ evaluate the model
    § Testing/generalization error = 1.0 − #correctly_classified / #testing_instances
§ We mainly discuss binary classification here
  § i.e., #labels = 2

Page 10:

Machine learning for IR ranking

§ This “good idea” has been actively researched – and actively deployed by major web search engines – in the last 7 or so years
§ Why didn’t it happen earlier?
  § Modern supervised ML has been around for about 20 years…
  § Naïve Bayes has been around for about 50 years…

Page 11:

Machine learning for IR ranking

§ There’s some truth to the fact that the IR community wasn’t very connected to the ML community
§ But there were a whole bunch of precursors:

§ Wong, S.K. et al. 1988. Linear structure in information retrieval. SIGIR 1988.

§ Fuhr, N. 1992. Probabilistic methods in information retrieval. Computer Journal.

§ Gey, F. C. 1994. Inferring probability of relevance using the method of logistic regression. SIGIR 1994.

§ Herbrich, R. et al. 2000. Large Margin Rank Boundaries for Ordinal Regression. Advances in Large Margin Classifiers.

Page 12:

Why weren’t early attempts very successful/influential?

§ Sometimes an idea just takes time to be appreciated…
§ Limited training data
  § Especially for real-world use (as opposed to writing academic papers), it was very hard to gather test collection queries and relevance judgments that are representative of real user needs and judgments on documents returned
  § This has changed, both in academia and industry
§ Poor machine learning techniques
§ Insufficient customization to the IR problem
§ Not enough features for ML to show value

Page 13:

Why wasn’t ML much needed?

§ Traditional ranking functions in IR used a very small number of features, e.g.,
  § Term frequency
  § Inverse document frequency
  § Document length
§ It was easy to tune weighting coefficients by hand
  § And people did
  § e.g., BM25 with manual setting of the hyper-parameters

Page 14:

Why is ML needed now?

§ Modern systems – especially on the Web – use a great number of features:
  § Arbitrary useful features – not a single unified model
  § Log frequency of query word in anchor text?
  § Query word in color on page?
  § # of images on page?
  § # of (out) links on page?
  § PageRank of page?
  § URL length?
  § URL contains “~”?
  § Page edit recency?
  § Page length?
§ The New York Times (2008-06-03) quoted Amit Singhal as saying Google was using over 200 such features.

Page 15:

Simple example: Using classification for ad hoc IR

§ Collect a training corpus of (q, d, r) triples
  § Relevance r is here binary (but may be multiclass, with 3–7 values)
§ Each document is represented by a feature vector
  § x = (α, ω)⊤, where α is cosine similarity and ω is minimum query window size
    § ω is the size of the shortest text span that includes all query words
    § Query term proximity is a very important new weighting factor
§ Train a machine learning model to predict the class r of a document-query pair

Sec. 15.4.1

Page 16:

Simple example: Using classification for ad hoc IR

§ A linear score function is then Score(d, q) = Score(α, ω) = aα + bω + c

§ And the linear classifier is: decide relevant if Score(d, q) > τ

§ Denote these parameters collectively as θ

Sec. 15.4.1

• We can reduce the number of parameters by 1: θ ⇒ w in the later slides (a sketch of this scorer follows below)
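A sketch of this scorer in code; the weights a, b, c and the threshold τ below are placeholder values, not learned ones (b is set negative on the assumption that a smaller query window should raise the score):

```python
# Sketch of the two-feature linear scorer and thresholded classifier.
def score(alpha, omega, a=5.0, b=-0.5, c=0.1):
    """Score(d, q) = a*alpha + b*omega + c, where alpha is cosine
    similarity and omega is the minimum query window size."""
    return a * alpha + b * omega + c

def is_relevant(alpha, omega, tau=0.0):
    return score(alpha, omega) > tau      # decide relevant if Score(d, q) > tau

print(is_relevant(alpha=0.04, omega=3))   # -> False with these placeholder weights
```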

Page 17:

Simple example: Using classification for ad hoc IR

[Figure: training examples plotted by cosine score α (vertical axis, 0.025–0.05) and term proximity ω (horizontal axis, 2–5); relevant (R) and nonrelevant (N) documents are separated by a linear decision surface.]

Sec. 15.4.1

Page 18:

More complex example of using classification for search ranking [Nallapati 2004]

§ We can generalize this to classifier functions over more features

§ We can use methods we have seen previously for learning the linear classifier weights

Page 19:

An SVM classifier for information retrieval [Nallapati 2004]

§ Let g(r|d,q) = w·f(d,q) + b
§ SVM training: want g(r|d,q) ≤ −1 for nonrelevant documents and g(r|d,q) ≥ 1 for relevant documents
§ SVM testing: decide relevant iff g(r|d,q) ≥ 0
§ Features are not word presence features (how would you deal with query words not in your training data?) but scores like the summed (log) tf of all query terms
§ Unbalanced data (which can result in trivial always-say-nonrelevant classifiers) is dealt with by undersampling nonrelevant documents during training (just take some at random; there are other ways of doing this – cf. Cao et al. later). A sketch follows below.
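A sketch of that recipe with random stand-ins for the per-(d, q) feature scores (everything here is illustrative): undersample the nonrelevant class down to the size of the relevant class, then fit a linear SVM.

```python
# Sketch: undersampling nonrelevant documents, then training a linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(1000, 6)                        # stand-ins for tf/idf-style scores
y = (rng.rand(1000) < 0.05).astype(int)       # unbalanced: few relevant docs

rel = np.where(y == 1)[0]
nonrel = rng.choice(np.where(y == 0)[0], size=len(rel), replace=False)
keep = np.concatenate([rel, nonrel])          # balanced training subsample

clf = LinearSVC().fit(X[keep], y[keep])       # g(r|d,q) = w . f(d,q) + b
scores = clf.decision_function(X)             # rank documents by g at test time
print(scores[:5])
```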

Page 20:

An SVM classifier for information retrieval [Nallapati 2004]

§ Experiments:
  § 4 TREC data sets
  § Comparisons with Lemur, a state-of-the-art open source IR engine (Language Model (LM)-based – see IIR ch. 12)
  § Linear kernel normally best or almost as good as quadratic kernel, and so used in reported results
  § 6 features, all variants of tf, idf, and tf.idf scores

Page 21:

An SVM classifier for information retrieval [Nallapati 2004]

Train \ Test          Disk 3    Disk 4-5   WT10G (web)
Disk 3      LM        0.1785    0.2503     0.2666
            SVM       0.1728    0.2432     0.2750
Disk 4-5    LM        0.1773    0.2516     0.2656
            SVM       0.1646    0.2355     0.2675

§ At best the results are about equal to LM
  § Actually a little bit below
§ Paper’s advertisement: easy to add more features
§ This is illustrated on a homepage-finding task on WT10G:
  § Baseline LM: 52% success@10; baseline SVM: 58%
  § SVM with URL-depth and in-link features: 78% S@10

Page 22:

“Learning to rank”

§ Classification probably isn’t the right way to think about approaching ad hoc IR:
  § Classification problems: map to an unordered set of classes
  § Regression problems: map to a real value
  § Ordinal regression problems: map to an ordered set of classes
    § A fairly obscure sub-branch of statistics, but what we want here
§ This formulation gives extra power:
  § Relations between relevance levels are modeled
  § Documents are good versus other documents for a query, given the collection; not on an absolute scale of goodness

Sec. 15.4.2

Page 23:

“Learning to rank”

§ Assume a number of categories C of relevance exist
  § These are totally ordered: c1 < c2 < … < cJ
  § This is the ordinal regression setup
§ Assume training data is available consisting of document-query pairs represented as feature vectors ψi with relevance rankings ci
§ We could do point-wise learning, where we try to map items of a certain relevance rank to a subinterval (e.g., Crammer et al. 2002 PRank)
§ But most work does pair-wise learning, where the input is a pair of results for a query, and the class is the relevance ordering relationship between them

Page 24:

Pairwise learning: The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]

§ Aim is to classify instance pairs as correctly ranked or incorrectly ranked
  § This turns an ordinal regression problem back into a binary classification problem
§ We want a ranking function f such that ci > ck iff f(ψi) > f(ψk)
  § … or at least one that tries to do this with minimal error
§ Suppose that f is a linear function: f(ψi) = w·ψi

Sec. 15.4.2

Page 25:

The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]

§ Ranking Model: f(ψi)


Sec. 15.4.2

Page 26:

The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]

§ Then (combining the two equations on the last slide): ci > ck iff w·(ψi − ψk) > 0
§ Let us then create a new instance space from such pairs:
  Φu = Φ(di, dj, q) = ψi − ψk
  zu = +1, 0, −1 as ci >, =, < ck
§ We can build the model over just the cases for which zu = −1
§ From training data S = {Φu}, we train an SVM

Sec. 15.4.2

Q: How do we obtain the final total order once a model is learned? (See the sketch below.)
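A sketch of the pairwise construction on toy data (illustrative throughout). The slide keeps only the zu = −1 cases; this sketch keeps both signs, which is equivalent since the ordering is antisymmetric. It also answers the question above: once w is learned, the final total order comes from sorting documents by f(ψ) = w·ψ.

```python
# Sketch: build pairwise difference vectors, train a linear SVM, then
# recover a total order by sorting on f(psi) = w . psi.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def pairwise_instances(psi, c):
    """psi: (n, d) feature vectors for one query; c: relevance grades."""
    Phi, z = [], []
    for i, k in combinations(range(len(c)), 2):
        if c[i] == c[k]:
            continue                        # z_u = 0 pairs carry no ordering info
        Phi.append(psi[i] - psi[k])         # Phi_u = psi_i - psi_k
        z.append(1 if c[i] > c[k] else -1)  # z_u from the relevance ordering
    return np.array(Phi), np.array(z)

rng = np.random.RandomState(0)
psi = rng.randn(20, 4)                      # toy feature vectors for one query
c = rng.randint(0, 3, size=20)              # toy relevance grades
Phi, z = pairwise_instances(psi, c)

svm = LinearSVC().fit(Phi, z)               # learn w so sign(w . Phi_u) matches z_u
w = svm.coef_.ravel()
order = np.argsort(-(psi @ w))              # final ranking: descending f(psi)
print(order)
```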

Page 27:

Two queries in the original space

Page 28:

Two queries in the pairwise space

Page 29:

The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]

§ The SVM learning task is then like other examples that we saw before
§ Find w and ξu ≥ 0 such that
  § ½ wTw + C Σ ξu is minimized, and
  § for all Φu such that zu < 0, w·Φu ≥ 1 − ξu
§ We can use just the negative zu, as the ordering is antisymmetric
§ You can again use SVMlight (or other good SVM libraries) to train your model

Sec. 15.4.2

Page 30:

Aside: The SVM loss function

§ The minimization
  minw ½ wTw + C Σ ξu
  subject to: for all Φu such that zu < 0, w·Φu ≥ 1 − ξu
§ can be rewritten as
  minw (1/2C) wTw + Σ ξu
  subject to: for all Φu such that zu < 0, ξu ≥ 1 − w·Φu
§ Now, taking λ = 1/2C, we can reformulate this as
  minw Σ [1 − w·Φu]+ + λ wTw
§ where [·]+ is the positive part (0 if the term is negative); a sketch follows below
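That reformulated objective, transcribed into code as a sketch (λ and the toy inputs are illustrative):

```python
# Sketch: hinge loss plus L2 regularizer, sum([1 - w.Phi_u]_+) + lam * w^T w.
import numpy as np

def svm_objective(w, Phi, lam):
    margins = Phi @ w                               # w . Phi_u for each pair u
    hinge = np.maximum(0.0, 1.0 - margins).sum()    # [x]_+ = max(0, x)
    return hinge + lam * (w @ w)                    # add the weight regularizer

w = np.array([0.5, -0.2])
Phi = np.array([[1.0, 0.0], [0.2, 0.4], [-0.1, 0.3]])
print(svm_objective(w, Phi, lam=0.01))
```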

Page 31:

Aside: The SVM loss function

§ The reformulation
  minw Σ [1 − w·Φu]+ + λ wTw
§ shows that an SVM can be thought of as having an empirical “hinge” loss combined with a weight regularizer

[Figure: the hinge loss as a function of w·Φu, decreasing linearly to 0 at w·Φu = 1, alongside a regularizer of ‖w‖.]

Page 32:

Adapting the Ranking SVM for (successful) Information Retrieval
[Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon, SIGIR 2006]

§ A Ranking SVM model already works well
  § Using things like vector space model scores as features
  § As we shall see, it outperforms them in evaluations
§ But it does not model important aspects of practical IR well
§ This paper addresses two customizations of the Ranking SVM to fit an IR utility model

Page 33:

Problems with the naïve Ranking SVM approach

1. Correctly ordering the most relevant documents is crucial to the success of an IR system, while misordering less relevant results matters little
   § The Ranking SVM considers all ordering violations as the same
2. Some queries have many (somewhat) relevant documents, and other queries few. If we treat all pairs of results for a query equally, queries with many results will dominate the learning
   § But actually queries with few relevant results are at least as important to do well on

Both can be solved by weighting the loss function appropriately (a sketch follows below)
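One way to read that fix (the weighting scheme below is illustrative, not the paper’s exact formulation): give each pair u a weight τu, e.g. larger for violations among top-ranked documents and normalized per query, and minimize the weighted hinge loss.

```python
# Sketch: per-pair weights tau_u in the hinge loss, so that violations at
# top ranks, and pairs from queries with few results, count for more.
import numpy as np

def weighted_hinge(w, Phi, tau):
    margins = Phi @ w                                   # w . Phi_u
    return (tau * np.maximum(0.0, 1.0 - margins)).sum()

Phi = np.array([[1.0, 0.2], [0.1, 0.5], [0.3, -0.2]])
tau = np.array([2.0, 1.0, 0.5])     # illustrative weights per pair
w = np.array([0.4, 0.1])
print(weighted_hinge(w, Phi, tau))
```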

Page 34:

MSN Search [now Bing]

§ Second experiment with MSN Search
§ Collection of 2198 queries
§ 6 relevance levels rated (with counts):
  § Definitive: 8990
  § Excellent: 4403
  § Good: 3735
  § Fair: 20463
  § Bad: 36375
  § Detrimental: 310

Page 35:

Experimental Results (MSN search)

Page 36:

Alternative: Optimizing Rank-Based Measures [Yue et al. SIGIR 2007]

§ If we think that NDCG is a good approximation of the user’s utility function from a result ranking
§ Then let’s directly optimize this measure
  § As opposed to some proxy (weighted pairwise preferences)
§ But there are problems…
  § The objective function no longer decomposes
    § Pairwise preferences decomposed into each pair
  § The objective function is flat or discontinuous

Page 37:

Discontinuity Example

§ NDCG is computed using rank positions
§ Ranking is via retrieval scores
§ Slight changes to model parameters
  § Slight changes to retrieval scores
  § No change to ranking
  § No change to NDCG

                  d1     d2     d3
Retrieval score   0.9    0.6    0.3
Rank              1      2      3
Relevance         0      1      0

NDCG = 0.63

NDCG is discontinuous w.r.t. the model parameters! (See the sketch below.)
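A sketch that reproduces the 0.63 value for this table and illustrates the flatness: nudging the retrieval scores leaves the ranking, and therefore NDCG, unchanged.

```python
# Sketch: NDCG is a function of the ranking only, so small score changes
# that preserve the order leave it exactly flat.
import numpy as np

def ndcg(scores, rel):
    rel = np.asarray(rel, dtype=float)
    order = np.argsort(-np.asarray(scores))            # rank by retrieval score
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = ((2.0 ** rel[order] - 1) * discounts).sum()
    idcg = (np.sort(2.0 ** rel - 1)[::-1] * discounts).sum()  # ideal ordering
    return dcg / idcg

rel = [0, 1, 0]
print(ndcg([0.9, 0.6, 0.3], rel))     # 0.6309... as in the table
print(ndcg([0.89, 0.61, 0.31], rel))  # identical: the ranking did not change
```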

Page 38:

Other machine learning methods for learning to rank

§ Of course!
§ I’ve only presented the use of SVMs for machine-learned relevance, but other machine learning methods have also been used successfully:
  § Boosting: RankBoost
  § Ordinal regression loglinear models
  § Neural nets: RankNet
  § (Gradient-boosted) decision trees

Page 39:

The Limitations of Machine Learning

§ Everything that we have looked at (and most work in this area) produces linear models of features by weighting different base features
§ This contrasts with most of the clever ideas of traditional IR, which are nonlinear scalings and combinations of basic measurements
  § log term frequency, idf, pivoted length normalization
§ At present, ML is good at weighting features, but not at coming up with nonlinear scalings
  § Designing the basic features that give good signals for ranking remains the domain of human creativity

Page 40:

Summary

§ The idea of learning ranking functions has been around for about 20 years
§ But only recently have ML knowledge, availability of training datasets, a rich space of features, and massive computation come together to make this a hot research area
§ It’s too early to give a definitive statement on what methods are best in this area… it’s still advancing rapidly
§ But machine-learned ranking over many features now easily beats traditional hand-designed ranking functions in comparative evaluations [in part by using the hand-designed functions as features!]
§ There is every reason to think that the importance of machine learning in IR will grow in the future.

Page 41:

Resources

§ IIR secs 6.1.2–3 and 15.4
§ LETOR benchmark datasets
  § Website with data, links to papers, benchmarks, etc.
  § http://research.microsoft.com/users/LETOR/
  § Everything you need to start research in this area!
§ Nallapati, R. Discriminative models for information retrieval. SIGIR 2004.
§ Cao, Y., Xu, J., Liu, T.-Y., Li, H., Huang, Y. and Hon, H.-W. Adapting Ranking SVM to Document Retrieval. SIGIR 2006.
§ Yue, Y., Finley, T., Radlinski, F. and Joachims, T. A Support Vector Method for Optimizing Average Precision. SIGIR 2007.