CBMM: the Science and Engineering of Intelligence


Transcript of "CBMM: the Science and Engineering of Intelligence"


The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF Science and Technology Center dedicated to the study of intelligence: how the brain produces intelligent behavior, and how we may be able to replicate intelligence in machines.

Publications: ~500

Research Institutions: ~4

Faculty (CS + BCS + …): ~23

Researchers: ~100

Educational Institutions: 12

Funding 2013-2023: ~$50M

Science + Engineering: Machine Learning, Computer Science, Cognitive Science, Computational Neuroscience

We aim to make progress in understanding the greatest of all problems in science: the problem of intelligence. This means understanding how the brain makes the mind, how the brain works, and how to build intelligent machines. We believe that the science of intelligence will enable better engineering of intelligence in the long term.

CBMM's focus is the Science and the Engineering of Intelligence.

Key recent advances in the engineering of intelligence have their roots in basic research on the brain.

The CBMM bet (different from DeepMind):

understand how the brain works, (then) make intelligent machines

The problem of intelligence is the greatest problem in science.

EAC - May 2020

CBMM Organizational Chart (future)

Director: Tomaso Poggio
EAC
Managing Director: Kathleen Sullivan (MIT)
Education Coordinator: Ellen Hildreth (WC)
Education Evaluation: Lizanne DeStefano (GT)
KT Coordinator: Boris Katz (MIT)
Diversity Coordinator: Mandana Sassanfar (MIT)
Deputy Director: Gabriel Kreiman (HU)
Associate Director & Trainee Coordinator: Matt Wilson (MIT)
Research Director: Kenneth Blum (HU)
Administrative Assistant
Technology Director

Module 1, VISUAL STREAM: Tomaso Poggio, Shimon Ullman (MIT)
Module 2, BRAIN OS: Gabriel Kreiman (HU)
Module 3, COGNITIVE CORE: Nancy Kanwisher, Joshua Tenenbaum (MIT)
Module 4, TOWARDS SYMBOLS: Boris Katz, Shimon Ullman (MIT)
Jim DiCarlo

CBMM Participants

[Chart: participant counts (0-150) by role (Faculty, Research Scientists, Postdocs, Grad Students, Undergrads, Staff/Other, Total) for Year 1 through (Year 7)]

EAC

Demis Hassabis, DeepMind

Charles Isbell Jr., Georgia Tech

Christof Koch, Allen Institute

Fei-Fei Li, Stanford

Lore McGovern, MIBR, MIT

Joel Oppenheim, NYU

Pietro Perona, Caltech

Marc Raibert, Boston Dynamics

Judith Richter, Medinol

Kobi Richter, Medinol

Amnon Shashua, Mobileye

David Siegel, Two Sigma

Susan Whitehead, MIT Corporation

Jim Pallotta, The Raptor Group

Research, Education & Diversity Partners

MIT: Boyden, Desimone, DiCarlo, Kaelbling, Kanwisher, Katz, McDermott, Oliva, Poggio, Roy, Sassanfar, Saxe, Schulz, Tegmark, Tenenbaum, Torralba, Ullman, Wilson

Harvard: Blum, Gershman, Kreiman, Livingstone, Sompolinsky, Spelke

Howard U: Chouika, Manaye, Rwebangira, Salmani

Hunter College: Chodorow, Epstein, Sakas, Zeigler

Johns Hopkins U: Isik

Queens College: Brumberg

Rockefeller U: Freiwald

Stanford U: Goodman

Universidad Central del Caribe (UCC): Jorquera

University of Central Florida: McNair Program

UMass Boston: Blaser, Ciaramitaro, Pomplun, Shukla

UPR-Mayagüez and UPR-Río Piedras: Santiago, Vega-Riveros, Garcia-Arraras, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga

Wellesley College: Hildreth, Wiest, Wilmer

Harvard Medical School: Kreiman, Livingstone

Florida International U: Finlayson

Boston Children's Hospital: Kreiman

Museum of Science, Boston

Google

DeepMind

International and Corporate Partners

IIT: Cingolani

A*STAR: Chuan Poh Lim

Hebrew U: Weiss

MPI: Bülthoff

Genoa U: Verri, Rosasco

Weizmann: Ullman

Sangwan Lee

IBM, Honda, Microsoft, Boston Dynamics, Orcam, NVIDIA, Siemens, Schlumberger, Mobileye, Intel, Fujitsu, GE, KAIST

Videos: ~950 (May 2014 - April 2020)

(YouTube subscribers account for only 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program


Code, Software and Datasets

"There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task." Thomas Miconi, Laura Groomes, and Gabriel Kreiman. Cerebral Cortex, 2016.

See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.

A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands: people are extremely good at this task, while networks are at near chance-level performance.

Summer Course at Woods Hole: our flagship initiative

Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.

A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.

Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu.

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne DeStefano

Kenny Blum

Kathleen Sullivan

Kris Brewer


CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence.

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs, and peers on their own thinking and research development.

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tübingen, MPI für Biologische Kybernetik (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI für Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels:

• Fixation and tracking behavior of the fly

• Motion, algorithms and circuits: the beetle (and the fly); relative motion, algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for "Visual control of orientation behaviour in the fly," Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels:

• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)

• Motion, algorithms and circuits: the beetle (and the fly); relative motion, algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus, and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list).

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
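The delay-and-multiply scheme these bullets describe fits in a few lines. Below is a minimal sketch of an opponent Hassenstein-Reichardt correlator, not code from any of the papers cited here; the two receptor signals, the low-pass time constant tau standing in for the delay, and the drifting grating are all illustrative assumptions.

```python
# Minimal opponent Reichardt correlator: each arm low-pass filters one
# receptor and multiplies it with the neighbouring receptor; the two
# mirror-symmetric arms are subtracted, so the sign encodes direction.
import numpy as np

def reichardt_detector(left, right, dt=1e-3, tau=20e-3):
    alpha = dt / (tau + dt)              # first-order low-pass as the delay
    delayed_left = np.zeros_like(left)
    delayed_right = np.zeros_like(right)
    for t in range(1, len(left)):        # filter both receptor signals
        delayed_left[t] = delayed_left[t-1] + alpha * (left[t] - delayed_left[t-1])
        delayed_right[t] = delayed_right[t-1] + alpha * (right[t] - delayed_right[t-1])
    # correlate each receptor with the delayed signal of its neighbour
    return delayed_left * right - delayed_right * left

# A sinusoidal grating drifting rightward past two nearby receptors
t = np.arange(0, 1, 1e-3)
phase_lag = 0.5                          # spatial offset between receptors (rad)
left = np.sin(2 * np.pi * 4 * t)
right = np.sin(2 * np.pi * 4 * t - phase_lag)
print(reichardt_detector(left, right).mean())   # > 0 for rightward motion
```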

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons …

Work at 3 levels:

• Fixation and tracking behavior of the fly

• Motion, algorithms and circuits: the beetle (and the fly); relative motion, algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA; Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
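The decomposition in the preceding paragraph can be written compactly; a standard rendering of the aperture problem (notation ours, not verbatim from the paper) is:

```latex
% Velocity at a contour point, split along the unit normal n and tangent t.
% Brightness constancy yields only the normal component; the tangential
% component is invisible to purely local measurements.
\mathbf{V} = v^{\perp}\,\mathbf{n} + v^{\parallel}\,\mathbf{t},
\qquad
v^{\perp} = -\,\frac{I_t}{\lVert \nabla I \rVert}
```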

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms.

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing …

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977 …

• … part of which comes from Reichardt and Poggio, 1976 (Quart. Rev. Biophysics, Part I) …

• … which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \mu \, \| f \|_K^2 \right]
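As a concrete illustration of this functional with the square loss V(y_i, f(x_i)) = (f(x_i) - y_i)^2 and a kernel hypothesis space, here is a minimal regularized least-squares sketch; the Gaussian kernel, its bandwidth, and the value of mu are illustrative choices, not parameters from the slide.

```python
# Regularized least squares in an RKHS: by the representer theorem the
# minimizer is f(x) = sum_i c_i K(x, x_i), and for square loss the
# coefficients solve the linear system (K + mu*n*I) c = y.
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_rls(X, y, mu=1e-2):
    n = len(X)
    K = gaussian_kernel(X, X)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
f = train_rls(X, y)
print(f(np.array([[0.5]])))   # prediction near sin(1.5)
```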

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In


General conditions for predictivity in learning theory

Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi

Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA; Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA; Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA; Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, \lim_{n\to\infty} |X_n - X| = 0 in probability) if and only if for every \epsilon > 0, \lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution \mu(x, y) on the product space Z = X \times Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : \bigcup_{n \ge 1} Z^n \to H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)^2.

Expected error. The expected error of a function f is defined as

I[f] = \int_Z V(f, z) \, d\mu(z)

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution \mu.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution \mu,

\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution \mu and any \epsilon > 0,

\lim_{n\to\infty} P\left( I[f_S] \ge \inf_{f \in H} I[f] + \epsilon \right) = 0
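A small numerical illustration of these definitions (our construction, not from the paper): when the distribution mu is known we can estimate the expected error I[f_S] by Monte Carlo and watch the generalization gap |I[f_S] - I_S[f_S]| shrink as n grows. The target function, noise level, polynomial hypothesis space and ridge penalty are all illustrative assumptions.

```python
# Empirical vs expected error for a regularized ERM algorithm.
import numpy as np

rng = np.random.default_rng(1)

def draw(n):                       # samples z = (x, y) from mu
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)
    return x, y

def ridge_fit(x, y, mu=1e-2, deg=5):
    """ERM over polynomials of fixed degree, plus a ridge penalty."""
    P = np.vander(x, deg + 1)
    w = np.linalg.solve(P.T @ P + mu * len(x) * np.eye(deg + 1), P.T @ y)
    return lambda xn: np.vander(xn, deg + 1) @ w

for n in (10, 100, 1000):
    x, y = draw(n)
    f = ridge_fit(x, y)
    I_S = np.mean((f(x) - y) ** 2)         # empirical error on S
    xt, yt = draw(100_000)                 # fresh samples ~ mu
    I = np.mean((f(xt) - yt) ** 2)         # Monte-Carlo expected error
    print(n, round(abs(I - I_S), 4))       # the gap shrinks as n grows
```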

Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research (face detection), on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
  - 10^10-10^11 neurons (~1 million flies)
  - 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
  - ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
  - ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994
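One way to see how such a hierarchy can produce larger receptive fields and more invariance is the alternation of template matching and max pooling used in HMAX-style models. The sketch below is a conceptual toy (random filters, illustrative layer sizes), not the published model.

```python
# Template matching ("simple" cells) followed by local max pooling
# ("complex" cells): each pooling stage enlarges receptive fields and
# adds tolerance to small position shifts.
import numpy as np

def simple_layer(image, templates):
    """Tuning: correlate the image with each template (valid positions)."""
    h, w = templates.shape[1:]
    H, W = image.shape
    out = np.empty((len(templates), H - h + 1, W - w + 1))
    for k, tpl in enumerate(templates):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[k, i, j] = (image[i:i+h, j:j+w] * tpl).sum()
    return out

def complex_layer(maps, pool=2):
    """Invariance: max over a local spatial neighbourhood."""
    k, H, W = maps.shape
    return maps.reshape(k, H // pool, pool, W // pool, pool).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
templates = rng.standard_normal((4, 3, 3))   # 4 oriented-edge-like filters
s1 = simple_layer(image, templates)          # 4 x 14 x 14 feature maps
c1 = complex_layer(s1)                       # 4 x 7 x 7, larger RFs
print(s1.shape, c1.shape)
```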

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio

Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?

Image: 20 ms; image-mask interval (ISI): 30 ms; mask (1/f noise): 80 ms

Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio & DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
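The "matrix-like read-out" idea can be sketched as follows, with simulated IT-like population responses standing in for the recordings of Hung et al. (2005); the population model, gain factor and noise level are invented for illustration. A linear classifier trained at one position/scale transfers above chance to another, which is the sense in which category can be read out invariantly.

```python
# Linear readout of object category from a (simulated) neural population.
import numpy as np

rng = np.random.default_rng(2)
n_units, n_trials = 128, 200
signal = rng.standard_normal((2, n_units))       # class-selective patterns

def population_response(label, condition):
    """Tuned response + condition-dependent gain + trial noise; the gain
    models (imperfect) invariance to position/scale changes."""
    gain = 1.0 if condition == "center" else 0.7
    return gain * signal[label] + 0.8 * rng.standard_normal(n_units)

def make_set(condition):
    labels = rng.integers(0, 2, n_trials)
    X = np.stack([population_response(l, condition) for l in labels])
    return X, labels

Xtr, ytr = make_set("center")                    # train at one condition
w = np.linalg.lstsq(Xtr, 2 * ytr - 1, rcond=None)[0]   # linear readout
Xte, yte = make_set("shifted")                   # test at another
print(((Xte @ w > 0) == yte).mean())             # above-chance transfer
```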

… in 2013 …

Page 2: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

We aim to make progress in understanding the greatest of all problems in science mdash the problem of intelligence This means

understanding how the brain makes the mind how the brain works and how to build intelligent machines We believe that the science of intelligence will enable better engineering of

intelligence in the long term

CBMMrsquos focus is the Science and the Engineering of Intelligence

Key recent advances in the engineering of intelligence have their roots in basic research on the brain

The CBMM bet (different from Deep Mind)

understand how the brain works (then) make intelligent machines

The problem of intelligence is the greatest problem in science

EAC- May 2020

CBMM Organizational Chart (future)

DirectorTomaso Poggio

EAC

Managing DirectorKathleen Sullivan

(MIT)

Education CoordinatorEllen Hildreth

(WC)

Education Evaluation

Lizanne DeStefano

(GT)

KT Coordinator

Boris Katz(MIT)

Diversity Coordinator

Mandana Sassanfar

(MIT)

Deputy Director

Gabriel Kreiman

(HU)

Associate Director amp Trainee

CoordinatorMatt Wilson

(MIT)

Research DirectorKenneth

Blum(HU)

AdministrativeAssistant

Technology Director

Module 1VISUAL

STREAMTomaso PoggioShimon Ullman

(MIT)

Module 2BRAIN OS

Gabriel Kreiman(HU)

Module 4TOWARDS SYMBOLSBoris Katz

Shimon Ullman(MIT)

Module 3COGNITIVE

CORENancy Kanwisher

Joshua Tenenbaum(MIT)

Jim DiCarlo

102030405060708090

100110120130140150

Facu

lty

Resea

rch Scie

ntist

Postdoc

s

Grad Stud

ents

Underg

rads

StaffO

ther

Total

Year 1Year 2Year 3Year 4Year 5Year 6(Year 7)

CBMM Participants

EAC

Demis Hassabis DeepMind

Charles Isbell Jr Georgia Tech

Christof Koch Allen Institute

Fei-Fei Li Stanford

Lore McGovern MIBR MIT

Joel Oppenheim NYU

Pietro Perona Caltech

Marc Raibert Boston DynamicsJudith Richter MedinolKobi Richter Medinol

Amnon Shashua Mobileye

David Siegel Two Sigma

Susan Whitehead MIT Corporation

Jim Pallotta The Raptor group

Research Education amp Diversity Partners

Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba

Blum Gershman Kreiman Livingstone Sompolinsky Spelke

MIT Harvard

Chouika Manaye Rwebangira Salmani

Howard U

Hunter College

Isik

Johns Hopkins U

BrumbergQueens College

Chodorow Epstein Sakas Zeigler Freiwald

Rockefeller U

Stanford UJorquera

Universidad Central Del Caribe (UCC)

McNair Program

University of Central Florida

Goodman

Blaser Ciaramitaro Pomplun Shukla

UMass Boston UPR - Mayaguumlez UPR ndash Riacuteo Piedras

Hildreth Wiest WilmerWellesley College

Santiago Vega-Riveros Garcia-Arraras Maldonado-Vlaar Megret Ordoacutentildeez Ortiz-Zuazaga

Kreiman Livingstone

Harvard Medical School

FinlaysonFlorida International U

Kreiman

Boston Childrenrsquos Hospital

Museum of Science Boston

Google

DeepMind

International and Corporate Partners

IITCingolani

ASTARChuan Poh Lim

Hebrew UWeiss

MPIBuumllthoff

Genoa UVerri Rosasco

WeizmannUllman

Sangwan Lee

IBM HondaMicrosoft

Boston Dynamics

Orcam NVIDIASiemens

Schlumberger Mobileye Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of Youtube subscribers only - 18 of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

Therersquos Waldo A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task Thomas Miconi Laura Groomes and Gabriel

Kreiman

Cerebral Cortex 2016

- See more at httpklabtchharvardeduresources

miconietal_visualsearch_2016htmlsthashKmHoBP

skxwHtrTkJdpuf

ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects

Partially Occluded Hands B Myanganbayar C Mata G Dekel B Katz G Ben-Yosef A Barbu

A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance

Summer Course at Woods Hole Our flagship initiative

Brains, Minds & Machines Summer Course: An intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence.

A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.

Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne DeStefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence.

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs, and peers on their own thinking and research development.

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (sketched in code below).

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
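To make the delay-and-multiply scheme concrete, here is a minimal numerical sketch of a Hassenstein-Reichardt correlation detector (my illustration, not code from the original papers; a simple delay stands in for the slow low-pass channel, and all names and parameters are illustrative):

```python
import numpy as np

def reichardt_detector(left, right, delay=1):
    """Correlation-type (Hassenstein-Reichardt) motion detector.

    left, right: luminance signals over time from two adjacent
    photoreceptors. Returns an opponent output: positive for motion
    from the left receptor toward the right one, negative otherwise.
    """
    # Delayed copies stand in for the slow (low-pass) channel.
    left_delayed = np.roll(left, delay)
    right_delayed = np.roll(right, delay)
    # Each subunit multiplies the delayed signal from one receptor with
    # the direct signal from its neighbor (the nonlinearity Hassenstein
    # & Reichardt showed must be multiplicative).
    rightward = left_delayed * right
    leftward = right_delayed * left
    # Opponent stage: subtract the two mirror-symmetric subunits.
    return float(np.mean(rightward - leftward))

# A sinusoidal pattern that reaches the right receptor 3 samples late,
# i.e. it drifts from left to right:
t = np.arange(200)
s = np.sin(2 * np.pi * t / 40)
print(reichardt_detector(s, np.roll(s, 3)) > 0)  # True: rightward motion
```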

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
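To contrast the two schemes in code, here is a hypothetical sketch (mine, not from the paper): the Barlow-Levick subunit is an AND-NOT veto, and a shunting ("silent") inhibitory synapse, of the kind this paper proposes, approximates that veto divisively rather than subtractively.

```python
def veto_subunit(excite, inhibit_delayed):
    # Barlow & Levick: AND-NOT logic. Respond only when excitation is
    # present and the asymmetrically delayed inhibition is absent.
    # Inputs are 0/1 activity levels.
    return excite * (1 - inhibit_delayed)

def shunting_subunit(g_e, g_i, E_e=60.0, alpha=20.0):
    # Shunting (silent) inhibition on a membrane patch: the inhibitory
    # reversal potential sits at rest, so g_i adds nothing to the
    # numerator and only increases the denominator:
    #   V = g_e * E_e / (g_leak + g_e + alpha * g_i),  with g_leak = 1.
    # Inhibition divides the response, approximating a veto.
    return g_e * E_e / (1.0 + g_e + alpha * g_i)

print(shunting_subunit(g_e=0.5, g_i=0.0))  # ~20: strong response
print(shunting_subunit(g_e=0.5, g_i=1.0))  # ~1.4: effectively vetoed
```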


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
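As a toy numerical illustration of the regularization idea (my sketch under simple assumptions, not code from the paper): recovering a 1-D "surface" from sparse, noisy samples is ill-posed, but adding a smoothness penalty makes the solution unique and stable.

```python
import numpy as np

n = 100
x = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(0)
idx = np.sort(rng.choice(n, size=15, replace=False))  # only 15 observed points
d = np.sin(2 * np.pi * x[idx]) + 0.05 * rng.standard_normal(15)

# Sampling operator A (picks observed points) and first-difference
# operator P (the stabilizing functional penalizing roughness).
A = np.zeros((15, n))
A[np.arange(15), idx] = 1.0
P = np.diff(np.eye(n), axis=0)

# Regularized solution of min ||A f - d||^2 + lam ||P f||^2 via the
# normal equations; without the P term the system would be singular.
lam = 1e-2
f = np.linalg.solve(A.T @ A + lam * (P.T @ P), A.T @ d)
# f now interpolates smoothly between samples, approximating sin(2*pi*x).
```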

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
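In coordinates, this aperture problem is one line of algebra; the small worked example below (added here for illustration, not from the paper) shows that a local measurement recovers only the projection of the true velocity onto the contour normal.

```python
import numpy as np

V = np.array([2.0, 1.0])                           # true image velocity
theta = np.radians(60.0)                           # contour orientation
nhat = np.array([np.sin(theta), -np.cos(theta)])   # unit normal to the contour
v_normal = (V @ nhat) * nhat                       # all a local detector can measure
print(v_normal)  # every V with the same normal projection is indistinguishable
```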

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
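For the square loss this minimization has a closed-form solution: the minimizer is a kernel expansion over the training points whose coefficients solve a linear system. A minimal sketch (my illustration; the Gaussian kernel and the values of mu and sigma are arbitrary choices, not taken from the slides):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rls_fit(X, y, mu=1e-3, sigma=1.0):
    # Regularized least squares in an RKHS: with V(y, f(x)) = (y - f(x))^2
    # the minimizer is f(x) = sum_j c_j k(x_j, x), c = (K + mu*n*I)^(-1) y.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = rls_fit(X, y, mu=1e-3, sigma=0.7)
print(float(f(np.array([[0.5]]))))  # should be close to sin(0.5) ~ 0.48
```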

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001.

2000 Mathematics Subject Classification: Primary 68T05, 68P30.

This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
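This leave-one-out stability can also be probed empirically. A hypothetical sketch (the ridge-regression example and all names are mine, not from the paper): measure how much the learned predictor changes, at some probe points, when a single training example is deleted.

```python
import numpy as np

def ridge_fit(X, y, mu=0.1):
    # A stable learning map: regularized linear least squares,
    # w = (X^T X + mu*n*I)^(-1) X^T y.
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + mu * n * np.eye(d), X.T @ y)
    return lambda Xnew: Xnew @ w

def loo_stability(fit, X, y, Xprobe):
    # Average (over the deleted example i) of the max change in
    # predictions at probe points between the full fit and the
    # leave-one-out fit.
    f_full = fit(X, y)
    deltas = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        f_i = fit(X[mask], y[mask])
        deltas.append(np.max(np.abs(f_full(Xprobe) - f_i(Xprobe))))
    return float(np.mean(deltas))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(loo_stability(ridge_fit, X, y, X))  # small value: stable, hence predictive
```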

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (written $\lim_{n\to\infty} X_n = X$ in probability) if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} P\{|X_n - X| \geq \epsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\left\{ I[f_S] \geq \inf_{f \in \mathcal{H}} I[f] + \epsilon \right\} = 0$$

letters to nature

NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training Database: 1,000+ real and 3,000+ virtual face images; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
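These models alternate template matching ("S" units, building selectivity) with max pooling ("C" units, building invariance to position and scale). A much-simplified sketch of one S/C pair (my illustration; the actual models use Gabor filters, multiple scales and learned templates):

```python
import numpy as np

def s_layer(image, templates):
    # Simple-cell-like stage: response of each template at every position.
    K, th, tw = templates.shape
    H, W = image.shape
    out = np.zeros((K, H - th + 1, W - tw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = (image[i:i+th, j:j+tw] * templates[k]).sum()
    return out

def c_layer(s, pool=2):
    # Complex-cell-like stage: max over a local neighborhood of positions,
    # yielding larger receptive fields and position invariance.
    K, H, W = s.shape
    out = np.zeros((K, H // pool, W // pool))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = s[:, i*pool:(i+1)*pool, j*pool:(j+1)*pool].max(axis=(1, 2))
    return out

img = np.random.default_rng(0).random((16, 16))
templates = np.stack([np.eye(3), np.fliplr(np.eye(3))])  # two oriented "edges"
c1 = c_layer(s_layer(img, templates))
print(c1.shape)  # (2, 7, 7): fewer positions, larger effective receptive fields
```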

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not?

[Stimulus sequence: image 20 ms; image-mask interval (ISI) 30 ms; 1/f-noise mask 80 ms.]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
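"Matrix-like read-out" here means a linear classifier applied to a population response matrix (trials x neurons). A toy sketch with synthetic responses (all numbers and the regularized least-squares readout are illustrative assumptions of mine, not the study's data or exact method):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 128, 200
category = rng.integers(0, 2, n_trials)            # object category per trial
# Hypothetical IT-like population responses: category-dependent mean + noise.
means = rng.normal(size=(2, n_neurons))
R = means[category] + rng.normal(scale=1.0, size=(n_trials, n_neurons))

# Linear readout by regularized least squares on +/-1 labels.
w = np.linalg.solve(R.T @ R + 10.0 * np.eye(n_neurons), R.T @ (2 * category - 1))
accuracy = ((R @ w > 0).astype(int) == category).mean()
print(f"decoding accuracy: {accuracy:.2f}")  # well above chance on this toy data
```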

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …

Page 3: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

The CBMM bet (different from Deep Mind)

understand how the brain works (then) make intelligent machines

The problem of intelligence is the greatest problem in science

EAC- May 2020

CBMM Organizational Chart (future)

DirectorTomaso Poggio

EAC

Managing DirectorKathleen Sullivan

(MIT)

Education CoordinatorEllen Hildreth

(WC)

Education Evaluation

Lizanne DeStefano

(GT)

KT Coordinator

Boris Katz(MIT)

Diversity Coordinator

Mandana Sassanfar

(MIT)

Deputy Director

Gabriel Kreiman

(HU)

Associate Director amp Trainee

CoordinatorMatt Wilson

(MIT)

Research DirectorKenneth

Blum(HU)

AdministrativeAssistant

Technology Director

Module 1VISUAL

STREAMTomaso PoggioShimon Ullman

(MIT)

Module 2BRAIN OS

Gabriel Kreiman(HU)

Module 4TOWARDS SYMBOLSBoris Katz

Shimon Ullman(MIT)

Module 3COGNITIVE

CORENancy Kanwisher

Joshua Tenenbaum(MIT)

Jim DiCarlo

102030405060708090

100110120130140150

Facu

lty

Resea

rch Scie

ntist

Postdoc

s

Grad Stud

ents

Underg

rads

StaffO

ther

Total

Year 1Year 2Year 3Year 4Year 5Year 6(Year 7)

CBMM Participants

EAC

Demis Hassabis DeepMind

Charles Isbell Jr Georgia Tech

Christof Koch Allen Institute

Fei-Fei Li Stanford

Lore McGovern MIBR MIT

Joel Oppenheim NYU

Pietro Perona Caltech

Marc Raibert Boston DynamicsJudith Richter MedinolKobi Richter Medinol

Amnon Shashua Mobileye

David Siegel Two Sigma

Susan Whitehead MIT Corporation

Jim Pallotta The Raptor group

Research Education amp Diversity Partners

Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba

Blum Gershman Kreiman Livingstone Sompolinsky Spelke

MIT Harvard

Chouika Manaye Rwebangira Salmani

Howard U

Hunter College

Isik

Johns Hopkins U

BrumbergQueens College

Chodorow Epstein Sakas Zeigler Freiwald

Rockefeller U

Stanford UJorquera

Universidad Central Del Caribe (UCC)

McNair Program

University of Central Florida

Goodman

Blaser Ciaramitaro Pomplun Shukla

UMass Boston UPR - Mayaguumlez UPR ndash Riacuteo Piedras

Hildreth Wiest WilmerWellesley College

Santiago Vega-Riveros Garcia-Arraras Maldonado-Vlaar Megret Ordoacutentildeez Ortiz-Zuazaga

Kreiman Livingstone

Harvard Medical School

FinlaysonFlorida International U

Kreiman

Boston Childrenrsquos Hospital

Museum of Science Boston

Google

DeepMind

International and Corporate Partners

IITCingolani

ASTARChuan Poh Lim

Hebrew UWeiss

MPIBuumllthoff

Genoa UVerri Rosasco

WeizmannUllman

Sangwan Lee

IBM HondaMicrosoft

Boston Dynamics

Orcam NVIDIASiemens

Schlumberger Mobileye Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of Youtube subscribers only - 18 of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

Therersquos Waldo A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task Thomas Miconi Laura Groomes and Gabriel

Kreiman

Cerebral Cortex 2016

- See more at httpklabtchharvardeduresources

miconietal_visualsearch_2016htmlsthashKmHoBP

skxwHtrTkJdpuf

ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects

Partially Occluded Hands B Myanganbayar C Mata G Dekel B Katz G Ben-Yosef A Barbu

A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance

Summer Course at Woods Hole Our flagship initiative

Brains Minds amp Machines Summer Course An intensive three-week course gives advanced students a ldquodeeprdquo introduction to the problem of intelligence

A self-reproducing community of scholars is being formed ~gt300 applicants ~30 accepted

Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne Distefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer Schoolbull Signature CBMM (EducationKnowledge Transfer)activity aimed at creating an intergenerationalcommunity around the scienceandengineeringofintelligence

bull Students reported strong influence of lecturesworkingonprojectsandinteractionsamongfacultyTArsquosandpeersontheirownthinkingandresearchdevelopment

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minskyrsquos SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because as in the recent past it is likely that several of the next breakthroughs in ML and AI are likely to come from neuroscienceANDengineering

Vision for the BMM SummerSchool

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardtrsquos PhD

Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

most behavioral fly research was done with the Goumltz torque meter

in 1976 based on this recording technology Reichardt amp Poggio developed their theory for Visual control of orientation behaviour in the fly Part I +II Quart Rev Biophysics 9(3) 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip

Wehrhahn C T Poggio and H Buumllthoff Biological Cybernetics 45 123-130 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
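In circuit terms, the shunting ("silent") inhibition proposed in this line of work implements the veto as a division rather than a subtraction. A schematic steady-state form for a single patch of passive membrane (notation simplified here: $g_0$ is the resting conductance, $g_e$ an excitatory conductance with reversal potential $E_e$, and $g_i$ an inhibitory conductance with reversal potential at rest) is

$$V = \frac{g_e E_e}{g_0 + g_e + g_i},$$

so that for $g_i \gg g_0 + g_e$ the inhibition divides down, i.e. effectively vetoes, the excitatory signal, approximating the AND-NOT operation of the Barlow & Levick scheme.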


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
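The regularization recipe at the heart of this approach can be stated compactly. For an ill-posed inverse problem $Az = y$, with $y$ the image data and $z$ the physical quantity to be recovered, the standard (Tikhonov) method selects the $z$ minimizing

$$\|Az - y\|^2 + \lambda \|Pz\|^2,$$

where $P$ is a stabilizing operator (typically a derivative, enforcing smoothness) and the regularization parameter $\lambda > 0$ trades data fidelity against the plausibility of the solution under the chosen natural constraint.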

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
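The aperture problem described here can be made explicit with one line of algebra. Writing image brightness as $I(x, y, t)$ and assuming brightness is conserved along the motion,

$$I_x u + I_y v + I_t = 0,$$

a single linear equation in the two unknown velocity components $(u, v)$: only the component of velocity along the local intensity gradient (normal to the contour),

$$v_\perp = -\frac{I_t}{\sqrt{I_x^2 + I_y^2}},$$

is determined, while the tangential component is left unconstrained, which is why extra assumptions (regularization) are needed to recover the full flow field.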

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen

Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this figure points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we first (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of …

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
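A toy implementation conveys the flavor of the cooperative algorithm: excitation spreads within a disparity plane, inhibition acts across disparities, and the network settles on the disparity of a shifted patch in a random-dot stereogram. This is a simplified 1-D sketch; the neighborhoods, threshold and parameter values below are hand-picked illustrations, not the paper's exact 2-D specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D random-dot stereogram: a central patch of the right image is
# the left image shifted by true_d pixels.
n, d_max, true_d = 120, 4, 2
left = rng.integers(0, 2, n)
right = left.copy()
right[40:80] = left[40 - true_d:80 - true_d]

# Initial state C0[x, j] = 1 wherever left(x) matches right(x + d).
d_range = np.arange(-d_max, d_max + 1)
C0 = np.zeros((n, len(d_range)))
for j, d in enumerate(d_range):
    for x in range(n):
        if 0 <= x + d < n and left[x] == right[x + d]:
            C0[x, j] = 1.0
C = C0.copy()

# Cooperative iterations: excitation within a disparity plane,
# inhibition across disparities at the same position (a simplified
# stand-in for the line-of-sight neighborhoods); hand-picked values.
eps, theta = 0.5, 2.5
for _ in range(10):
    excit = sum(np.roll(C, k, axis=0) for k in (-2, -1, 1, 2))
    inhib = C.sum(axis=1, keepdims=True) - C
    C = ((excit - eps * inhib + C0) > theta).astype(float)

# Dominant disparity inside the shifted patch:
print(d_range[C[40:80].sum(axis=0).argmax()])   # ~ true_d
```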


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \|f\|_K^2 \right]$$
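Read as an algorithm, the formula above is regularized least squares in a reproducing kernel Hilbert space: by the representer theorem the minimizer is $f(x) = \sum_i c_i K(x, x_i)$ with $(K + \mu \ell I)c = y$ for the square loss. A minimal numpy sketch (the Gaussian kernel, toy data and parameter values here are illustrative assumptions, not CBMM code):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, mu=1e-2, sigma=0.5):
    # Representer theorem: f(x) = sum_i c_i K(x, x_i); for the square
    # loss the coefficients solve (K + mu * l * I) c = y.
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + mu * l * np.eye(l), y)

def rls_predict(X_train, c, X_new, sigma=0.5):
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy 1-D regression: noisy samples of sin(2x).
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(30, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(30)
c = rls_fit(X, y)
print(rls_predict(X, c, np.array([[1.0]])))   # ~ sin(2.0)
```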

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
– T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.


General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} X_n = X$ in probability) if and only if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\big(|X_n - X| \geq \varepsilon\big) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] \geq \inf_{f \in \mathcal{H}} I[f] + \varepsilon \Big) = 0.$$

Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419.
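The stability property is easy to probe numerically: retrain with one example deleted and measure how much the prediction at that point moves. The sketch below does this for plain ridge regression; it is an informal diagnostic in the spirit of the paper's leave-one-out stability, not its precise definition, and all data and parameters are invented for illustration:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Linear ridge regression: w = (X^T X + lam I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_stability(X, y, lam=1.0):
    """Largest change in the prediction at the deleted point when one
    example is removed from the training set."""
    w_full = ridge_fit(X, y, lam)
    worst = 0.0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w_i = ridge_fit(X[mask], y[mask], lam)
        worst = max(worst, abs(X[i] @ w_full - X[i] @ w_i))
    return worst

rng = np.random.default_rng(0)
for n in (20, 80, 320):   # stability should improve as n grows
    X = rng.standard_normal((n, 5))
    y = X @ np.array([1., -2., 0., 0.5, 3.]) + 0.1 * rng.standard_normal(n)
    print(n, loo_stability(X, y))
```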

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection, on the market (digital cameras) since 2006

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex

Van Essen & Anderson, 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka, 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
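The models cited above alternate two operations: template matching (tuning), which builds selectivity, and max pooling, which builds invariance. A bare-bones sketch of one S1/C1 stage in this spirit follows; the tiny hand-made edge filters stand in for Gabor units, and nothing here is the published HMAX code:

```python
import numpy as np

def s1_layer(image, filters):
    """S1: template matching - correlate the image with oriented
    filters (a stand-in for the Gabor tuning of simple cells)."""
    H, W = image.shape
    k = filters.shape[1]
    out = np.zeros((len(filters), H - k + 1, W - k + 1))
    for f, filt in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = (image[i:i+k, j:j+k] * filt).sum()
    return np.abs(out)

def c1_layer(s1, pool=4):
    """C1: local max pooling over position - complex-cell-like
    tolerance to translation, the key invariance operation."""
    F, H, W = s1.shape
    return np.array([[[s1[f, i:i+pool, j:j+pool].max()
                       for j in range(0, W - pool + 1, pool)]
                      for i in range(0, H - pool + 1, pool)]
                     for f in range(F)])

# Two 3x3 oriented edge filters (horizontal / vertical):
filters = np.array([[[1, 1, 1], [0, 0, 0], [-1, -1, -1]],
                    [[1, 0, -1], [1, 0, -1], [1, 0, -1]]], float)
img = np.zeros((16, 16))
img[8, :] = 1.0                              # a horizontal line
c1 = c1_layer(s1_layer(img, filters))
print(c1.shape, c1[0].max(), c1[1].max())    # horizontal unit wins
```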

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test the feedforward model): animal present or not?

Stimulus sequence: image (20 ms), image-mask interval (ISI, 30 ms), mask (1/f noise, 80 ms).

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
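The read-out itself is simple: a regularized linear classifier applied to vectors of population responses, trained on some trials and tested on held-out ones. Below is a schematic sketch with simulated data standing in for the recorded IT responses; the population model and all numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "population responses": n_units firing rates per trial,
# with a category-dependent mean pattern (a stand-in for IT data).
n_units, n_trials = 100, 400
mu_a, mu_b = rng.standard_normal((2, n_units))
labels = rng.integers(0, 2, n_trials)          # 0 = class A, 1 = class B
R = np.where(labels[:, None] == 0, mu_a, mu_b) \
    + rng.standard_normal((n_trials, n_units))

# Linear classifier trained by regularized least squares on +-1 labels:
y = 2.0 * labels - 1.0
train = np.arange(n_trials) < 300
w = np.linalg.solve(R[train].T @ R[train] + 10.0 * np.eye(n_units),
                    R[train].T @ y[train])
acc = ((R[~train] @ w > 0) == (labels[~train] == 1)).mean()
print(f"decoding accuracy on held-out trials: {acc:.2f}")
```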

… in 2013 …


Research, Education & Diversity Partners

Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba

Blum Gershman Kreiman Livingstone Sompolinsky Spelke

MIT Harvard

Chouika Manaye Rwebangira Salmani

Howard U

Hunter College

Isik

Johns Hopkins U

Brumberg - Queens College

Chodorow Epstein Sakas Zeigler Freiwald

Rockefeller U

Stanford U
Jorquera

Universidad Central Del Caribe (UCC)

McNair Program

University of Central Florida

Goodman

Blaser Ciaramitaro Pomplun Shukla

UMass Boston; UPR - Mayagüez; UPR - Río Piedras

Hildreth, Wiest, Wilmer - Wellesley College

Santiago Vega-Riveros, García-Arrarás, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga

Kreiman Livingstone

Harvard Medical School

Finlayson - Florida International U

Kreiman

Boston Children's Hospital

Museum of Science Boston

Google

DeepMind

International and Corporate Partners

IIT - Cingolani

A*STAR - Chuan Poh Lim

Hebrew U - Weiss

MPI - Bülthoff

Genoa U - Verri, Rosasco

Weizmann - Ullman

Sangwan Lee

IBM, Honda, Microsoft

Boston Dynamics

Orcam, NVIDIA, Siemens

Schlumberger Mobileye Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of YouTube subscribers only - 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code, Software and Datasets

There's Waldo: A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.

See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.

A dataset of RGB images of hands holding objects and interacting with objects. Measured human accuracy on reconstructing occluded portions of hands: people are extremely good at this task, while networks are at near chance-level performance.

Summer Course at Woods Hole: our flagship initiative

Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.

A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.

Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu.

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne DeStefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt).

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator

Fixation and tracking behavior

Poggio, T. and W. Reichardt, A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica, Kybernetik, 12, 185-203, 1972
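The structure of that theory can be sketched in one equation. With $\psi$ the angular position of a pattern, the closed-loop dynamics combine a deterministic turning response $D(\psi)$ with torque fluctuations $N(t)$ (notation simplified here):

$$k\,\dot{\psi}(t) = -D(\psi) + N(t).$$

Treating $N(t)$ as noise yields a Fokker-Planck equation whose stationary density peaks at the stable zeros of $D(\psi)$: the predicted fixation points, such as a dark stripe in front of the fly.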


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly (Part I + II, Quart. Rev. Biophysics 9(3), 311-375).

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

A cognitive theory of basic fly instincts predicts the trajectory of the chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics, 45, 123-130, 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman (ref. 6),

one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
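A hedged illustration of the aperture problem just described: under the brightness-constancy assumption, the local constraint Ix*u + Iy*v + It = 0 is one equation in two unknowns, so only the velocity component along the intensity gradient is determined. The function below is a sketch under that assumption (not code from the paper); Ix, Iy, It stand for precomputed image-derivative arrays.

```python
import numpy as np

# Per-pixel normal component of the velocity field (illustrative).
# From Ix*u + Iy*v + It = 0, only v_n = -It/|grad I| along the unit
# gradient direction is recoverable; the tangential part is invisible.

def normal_flow(Ix, Iy, It, eps=1e-8):
    grad_mag = np.sqrt(Ix**2 + Iy**2) + eps
    vn = -It / grad_mag                    # signed speed along the gradient
    nx, ny = Ix / grad_mag, Iy / grad_mag  # unit gradient direction
    return vn * nx, vn * ny                # tangential component stays unknown
```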

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the


Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
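As a toy contrast of the two interaction types this excerpt compares, the sketch below scores a stimulus sequence with a multiplicative (Hassenstein-Reichardt) nonlinearity and with a delayed inhibitory veto (Barlow-Levick, AND-NOT). It is illustrative only; the delay value and binary signals are assumptions, not the paper's biophysical model.

```python
import numpy as np

def directional_responses(x1, x2, delay=3):
    """Toy comparison of the two schemes (illustrative).
    x1, x2: binary (0/1) activity of two adjacent receptor regions over time."""
    x2_delayed = np.roll(x2, delay)        # asymmetric delay on one channel
    multiplicative = x1 * x2_delayed       # Hassenstein-Reichardt: conjunction
    veto = x1 * (1 - x2_delayed)           # Barlow-Levick: delayed inhibition 'vetoes'
    return multiplicative.sum(), veto.sum()

# A bar moving in the preferred versus the null direction activates the two
# receptors in a different temporal order, so each scheme assigns the two
# directions different scores.
```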


Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
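A rough sketch in the spirit of the cooperative algorithm described in this paper: cells vote for "disparity d at position x", with local excitation among matches supporting the same disparity and inhibition among rival disparities, iterated with a threshold. The neighborhood radius, eps and theta below are illustrative assumptions, and the line-of-sight inhibition is simplified; this is not the original implementation.

```python
import numpy as np

# Cooperative stereo update sketch (illustrative). C0[x, d] = 1 where the
# left and right images match at position x under disparity shift d.

def cooperative_stereo(C0, iters=10, eps=2.0, theta=3.0, radius=2):
    C = C0.copy()
    for _ in range(iters):
        excit = np.zeros_like(C)
        for dx in range(-radius, radius + 1):
            excit += np.roll(C, dx, axis=0)          # support at same disparity
        inhib = C.sum(axis=1, keepdims=True) - C     # rival disparities, same x
        C = ((excit - eps * inhib + C0) > theta).astype(float)
    return C.argmax(axis=1)                          # disparity estimate per x
```

The local excitation implements the continuity constraint (nearby points tend to have similar disparity) and the inhibition implements uniqueness (one disparity per position), which is how local constraints yield a global organization.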


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in H_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \mu \|f\|_K^2 \right]$$
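For the square loss, the minimizer of the functional above admits the classical finite form $f(x) = \sum_i c_i K(x, x_i)$ with $c = (K + \mu n I)^{-1} y$ (representer theorem). The sketch below solves it directly; the Gaussian kernel and the values of mu and sigma are illustrative choices, not prescribed by the slide.

```python
import numpy as np

# Regularized least squares in a reproducing kernel Hilbert space:
# minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over f in H_K.

def gaussian_kernel(A, B, sigma=1.0):
    # A: (n, d), B: (m, d) -> (n, m) kernel matrix (illustrative kernel choice)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, mu=0.1, sigma=1.0):
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)   # (K + mu*n*I) c = y
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c
```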

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory

Tomaso Poggio1, Ryan Rifkin1,4, Sayan Mukherjee1,3 & Partha Niyogi2

1Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. 2Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. 3Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. 4Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
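The stability property can be probed numerically. The sketch below is an illustration only, with ridge regression standing in for a generic learning map; the data, the weight lam and the probe points are assumptions. It trains on S and on S with one example deleted, and records how much the learned hypothesis changes.

```python
import numpy as np

# Illustrative check of leave-one-out stability: compare predictions of
# the hypothesis trained on S with those trained on S minus one example.

def ridge_fit(X, y, lam=1.0):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)

w_full = ridge_fit(X, y)
changes = []
for i in range(len(X)):
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    changes.append(abs(X[i] @ ridge_fit(Xi, yi) - X[i] @ w_full))
print("max change in prediction when deleting one example:", max(changes))
```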

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,

$$\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\!\left( I[f_S] \leq \inf_{f \in H} I[f] + \epsilon \right) = 1$$
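A small numerical companion to Box 1 (all details illustrative, not from the paper): ERM over a toy hypothesis space of threshold functions, with the empirical error computed on the training set and the expected error estimated on a large fresh sample.

```python
import numpy as np

# ERM sketch: select from a small hypothesis space the function
# minimizing the empirical error I_S[f]; estimate I[f] on fresh samples.

rng = np.random.default_rng(1)
def sample(n):                          # unknown distribution mu(x, y)
    x = rng.uniform(-1, 1, n)
    y = (x > 0.3).astype(float)         # noiseless target, for simplicity
    return x, y

hypotheses = [lambda x, t=t: (x > t).astype(float)
              for t in np.linspace(-1, 1, 41)]      # hypothesis space H
square_loss = lambda f, x, y: np.mean((f(x) - y) ** 2)

xs, ys = sample(20)                                  # training set S
f_S = min(hypotheses, key=lambda f: square_loss(f, xs, ys))   # ERM

xt, yt = sample(100000)                              # fresh samples
print("empirical error I_S[f_S]:", square_loss(f_S, xs, ys))
print("expected error I[f_S] (estimate):", square_loss(f_S, xt, yt))
```

With enough training examples the two printed numbers are close, which is exactly the generalization property defined in the box.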

letters to nature

NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work

• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Pattern

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

[Figure: trial sequence for the "animal present or not" task: Image (20 ms); Image-Mask Interval (30 ms ISI); Mask, 1/f noise (80 ms)]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
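A minimal sketch of the two operations that characterize this family of models, alternating template matching ("simple", S) and max pooling ("complex", C) stages. It is HMAX-flavored but not the published model; the tuning width sigma and the pool size are assumptions.

```python
import numpy as np

# HMAX-style building blocks (illustrative): an S layer measures
# similarity of local patches to stored templates, a C layer max-pools
# over position, trading position information for invariance while
# keeping selectivity.

def s_layer(patches, templates, sigma=0.5):
    # patches: (npos, d) local image patches; templates: (ntempl, d)
    d2 = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))          # (npos, ntempl) tuning curves

def c_layer(s_responses, pool=4):
    # max over each block of `pool` adjacent positions, per template
    npos, ntempl = s_responses.shape
    npool = npos // pool
    return s_responses[: npool * pool].reshape(npool, pool, ntempl).max(axis=1)
```

Stacking several such S/C pairs yields units with progressively larger receptive fields, more complex preferred stimuli and more invariance, mirroring the V1 to IT progression described above.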

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
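The "matrix-like read-out" can be pictured as a linear classifier applied to a trials-by-neurons response matrix. The sketch below uses a regularized least-squares classifier as a stand-in; R, labels and lam are placeholders, and this is an illustration of the idea rather than the study's exact classifier.

```python
import numpy as np

# Linear population readout sketch: rows of R are trials (population
# response vectors, e.g., spike counts from many IT sites); labels are
# binary category labels for each trial.

def train_readout(R, labels, lam=1e-2):
    R1 = np.hstack([R, np.ones((len(R), 1))])            # append bias term
    Y = 2.0 * labels - 1.0                               # {0,1} -> {-1,+1}
    W = np.linalg.solve(R1.T @ R1 + lam * np.eye(R1.shape[1]), R1.T @ Y)
    return W

def decode(R, W):
    R1 = np.hstack([R, np.ones((len(R), 1))])
    return (R1 @ W > 0).astype(int)                      # predicted category
```

Training on responses at one position/scale and testing at another probes exactly the invariance of the population code that the slide refers to.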

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …

Page 5: CBMM: the Science and Engineering of Intelligence

CBMM Participants

[Chart: participants by role (Faculty, Research Scientist, Postdocs, Grad Students, Undergrads, Staff/Other, Total) for Year 1 through (Year 7); vertical axis 10-150]

EAC

Demis Hassabis DeepMind

Charles Isbell Jr Georgia Tech

Christof Koch Allen Institute

Fei-Fei Li Stanford

Lore McGovern MIBR MIT

Joel Oppenheim NYU

Pietro Perona Caltech

Marc Raibert Boston Dynamics

Judith Richter Medinol

Kobi Richter Medinol

Amnon Shashua Mobileye

David Siegel Two Sigma

Susan Whitehead MIT Corporation

Jim Pallotta The Raptor group

Research Education amp Diversity Partners

Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba

Blum Gershman Kreiman Livingstone Sompolinsky Spelke

MIT Harvard

Chouika Manaye Rwebangira Salmani

Howard U

Hunter College

Isik

Johns Hopkins U

Brumberg
Queens College

Chodorow Epstein Sakas Zeigler

Freiwald

Rockefeller U

Stanford U

Jorquera

Universidad Central Del Caribe (UCC)

McNair Program

University of Central Florida

Goodman

Blaser Ciaramitaro Pomplun Shukla

UMass Boston UPR - Mayagüez UPR - Río Piedras

Hildreth Wiest Wilmer
Wellesley College

Santiago Vega-Riveros Garcia-Arraras Maldonado-Vlaar Megret Ordóñez Ortiz-Zuazaga

Kreiman Livingstone

Harvard Medical School

Finlayson
Florida International U

Kreiman

Boston Children's Hospital

Museum of Science Boston

Google

DeepMind

International and Corporate Partners

IIT
Cingolani

A*STAR
Chuan Poh Lim

Hebrew U
Weiss

MPI
Bülthoff

Genoa U
Verri Rosasco

Weizmann
Ullman

Sangwan Lee

IBM HondaMicrosoft

Boston Dynamics

Orcam NVIDIASiemens

Schlumberger Mobileye Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of YouTube subscribers only - 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

"There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task." Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.

See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu

A dataset of RGB images of hands holding objects and interacting with objects. Measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.

Summer Course at Woods Hole: Our flagship initiative

Brains, Minds & Machines Summer Course: An intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence

A self-reproducing community of scholars is being formed: ~>300 applicants, ~30 accepted

Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne Distefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, it is likely that several of the next breakthroughs in ML and AI will come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

most behavioral fly research was done with the Götz torque meter

in 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375

open question: how well does this theory describe fly behavior of natural flight?

in 1980 Wehrhahn started high-speed film recording of flies chasing each other

single frame analysis, 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a minimal sketch follows this list)

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
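As referenced in the list above, here is a minimal Hassenstein-Reichardt correlator sketch. The constants are illustrative assumptions, and a first-order low-pass filter stands in for the delay element.

```python
import numpy as np

# Reichardt correlator sketch (illustrative): each half-detector
# multiplies the low-pass-filtered ("delayed") signal of one receptor
# with the raw signal of its neighbor; subtracting the mirror-symmetric
# half gives an opponent, direction-selective output.

def lowpass(x, tau=5.0):
    # x: float array of receptor activity over time
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau   # first-order filter
    return y

def reichardt(r1, r2, tau=5.0):
    return lowpass(r1, tau) * r2 - lowpass(r2, tau) * r1

# A pattern drifting from receptor 1 toward receptor 2 yields a positive
# mean output; motion in the opposite direction flips the sign.
```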

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany


© Nature Publishing Group 1985


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.





D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z

Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2

=sup2

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu\, \|f\|_K^2 \right]$$
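This is the classical Tikhonov regularization functional: fit the training data under the loss V while penalizing the RKHS norm of f. As a concrete illustration (a minimal sketch, not taken from the slides; the Gaussian kernel, toy data and the value of mu are placeholders), the square-loss case reduces, by the representer theorem, to one linear system for the expansion coefficients c:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_rls(X, y, mu=0.1, sigma=1.0):
    # Square loss + mu * ||f||_K^2  =>  solve (K + mu * n * I) c = y.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + mu * n * np.eye(n), y)

# Toy usage: regression on noisy sine data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
c = fit_rls(X, y)
X_new = np.linspace(-3, 3, 5)[:, None]
y_hat = gaussian_kernel(X_new, X) @ c    # f(x) = sum_i c_i K(x, x_i)
print(y_hat)
```

The regularization parameter mu trades data fit against the smoothness of the learned function f.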

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
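As a rough numerical illustration of the perturbation just described (a simplified probe, not the paper's formal CVloo stability condition; the polynomial learner and the probe point are arbitrary stand-ins):

```python
import numpy as np

def loo_stability(train, X, y, x_probe):
    # Train on the full set, then on each leave-one-out set, and record
    # how much the learned hypothesis moves at a probe point.
    f_full = train(X, y)
    changes = []
    for i in range(len(X)):
        f_loo = train(np.delete(X, i, axis=0), np.delete(y, i))
        changes.append(abs(f_full(x_probe) - f_loo(x_probe)))
    return max(changes)   # small value <=> stable learning map

def train_poly(X, y):
    # Stand-in learner: degree-2 polynomial least squares.
    coeffs = np.polyfit(X[:, 0], y, deg=2)
    return lambda x: np.polyval(coeffs, x)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (100, 1))
y = X[:, 0] ** 2 + 0.05 * rng.standard_normal(100)
print(loo_stability(train_poly, X, y, x_probe=0.5))  # shrinks as n grows
```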

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty}|X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n\ge1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \ \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right\} = 0.$$
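A small Monte Carlo sketch of the Box 1 quantities, under an assumed toy distribution μ (all names and numbers below are illustrative): the expected error I[f] is approximated on fresh samples and compared with the empirical error I_S[f] on the training set.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    # Assumed toy distribution mu(x, y): y = x^3 + Gaussian noise.
    x = rng.uniform(-1, 1, n)
    return x, x ** 3 + 0.1 * rng.standard_normal(n)

def square_loss(f, x, y):
    return (f(x) - y) ** 2

x_tr, y_tr = sample(200)                   # training set S
coef = np.polyfit(x_tr, y_tr, deg=3)       # an ERM-like fit
f_S = lambda x: np.polyval(coef, x)

I_S = square_loss(f_S, x_tr, y_tr).mean()  # empirical error I_S[f_S]
x_new, y_new = sample(100_000)             # fresh samples from mu
I = square_loss(f_S, x_new, y_new).mean()  # Monte Carlo estimate of I[f_S]
print(I_S, I, abs(I - I_S))                # the gap must vanish as n grows
```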

letters to nature

NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work

• Training Database: 1,000+ Real, 3,000+ VIRTUAL; 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain:
  - 10^10-10^11 neurons (~1 million flies)
  - 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
  - ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
  - ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not? Image (20 ms) → interval image-mask, ISI (30 ms) → mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
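Such models alternate a tuning ("simple"-cell, template-match) stage with a max-pooling ("complex"-cell) stage that builds tolerance to position and scale. A minimal sketch in that spirit (the oriented-bar templates and pooling sizes below are placeholders, not the published HMAX parameters):

```python
import numpy as np

def s_layer(image, templates):
    # "Simple"-cell stage: valid cross-correlation with each template,
    # producing one response map per template.
    H, W = image.shape
    h, w = templates[0].shape
    maps = []
    for t in templates:
        out = np.empty((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = (image[i:i+h, j:j+w] * t).sum()
        maps.append(out)
    return np.stack(maps)

def c_layer(maps, pool=2):
    # "Complex"-cell stage: local max pooling builds tolerance to position
    # (pooling over scales as well would add scale tolerance).
    n, H, W = maps.shape
    Hp, Wp = H // pool, W // pool
    pooled = maps[:, :Hp*pool, :Wp*pool].reshape(n, Hp, pool, Wp, pool)
    return pooled.max(axis=(2, 4))

# Toy usage: two oriented "bar" templates over a random image.
rng = np.random.default_rng(3)
image = rng.standard_normal((32, 32))
templates = [np.eye(5), np.fliplr(np.eye(5))]   # two orientations
c1 = c_layer(s_layer(image, templates))
print(c1.shape)
```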

Decoding the neural code: matrix-like read-out from the brain

Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
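The read-out idea is that a simple, for instance linear, classifier trained on the population response can report category and identity. A sketch with synthetic stand-in "responses" (the actual result used IT recordings and proper cross-validation):

```python
import numpy as np

rng = np.random.default_rng(4)
n_neurons, n_trials = 128, 200

# Synthetic "population responses": category 1 trials carry a weak signal
# direction on top of noise; category 0 trials are pure noise.
signal = rng.standard_normal(n_neurons)

def responses(category, n):
    X = rng.standard_normal((n, n_neurons))
    return X + 0.3 * signal if category else X

X = np.vstack([responses(0, n_trials), responses(1, n_trials)])
y = np.r_[-np.ones(n_trials), np.ones(n_trials)]

# Linear least-squares readout (small ridge term for conditioning).
w = np.linalg.solve(X.T @ X + 1e-2 * np.eye(n_neurons), X.T @ y)

# Decode held-out trials with the sign of the linear response.
X_test = np.vstack([responses(0, n_trials), responses(1, n_trials)])
pred = np.sign(X_test @ w)
print("decoding accuracy:", (pred == y).mean())
```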

… in 2013 …


EAC

Demis Hassabis DeepMind

Charles Isbell Jr Georgia Tech

Christof Koch Allen Institute

Fei-Fei Li Stanford

Lore McGovern MIBR MIT

Joel Oppenheim NYU

Pietro Perona Caltech

Marc Raibert Boston DynamicsJudith Richter MedinolKobi Richter Medinol

Amnon Shashua Mobileye

David Siegel Two Sigma

Susan Whitehead MIT Corporation

Jim Pallotta The Raptor group

Research Education amp Diversity Partners

Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba

Blum Gershman Kreiman Livingstone Sompolinsky Spelke

MIT Harvard

Chouika Manaye Rwebangira Salmani

Howard U

Hunter College

Isik

Johns Hopkins U

Brumberg: Queens College

Chodorow Epstein Sakas Zeigler Freiwald

Rockefeller U

Stanford U: Jorquera

Universidad Central Del Caribe (UCC)

McNair Program

University of Central Florida

Goodman

Blaser Ciaramitaro Pomplun Shukla

UMass Boston, UPR-Mayagüez, UPR-Río Piedras

Hildreth, Wiest, Wilmer: Wellesley College

Santiago, Vega-Riveros, Garcia-Arraras, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga

Kreiman Livingstone

Harvard Medical School

Finlayson: Florida International U

Kreiman

Boston Childrenrsquos Hospital

Museum of Science Boston

Google

DeepMind

International and Corporate Partners

IIT: Cingolani

A*STAR: Chuan Poh Lim

Hebrew U: Weiss

MPI: Bülthoff

Genoa U: Verri, Rosasco

Weizmann: Ullman

Sangwan Lee

IBM, Honda, Microsoft

Boston Dynamics

Orcam, NVIDIA, Siemens

Schlumberger Mobileye Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of YouTube subscribers only - 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.

See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu

A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance

Summer Course at Woods Hole Our flagship initiative

Brains, Minds & Machines Summer Course: An intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence.

A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.

Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne DeStefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels:

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator

Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.

Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

most behavioral fly research was done with the Götz torque meter

in 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.

Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels:

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a sketch follows this list)

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
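A minimal sketch of the correlation-type (Reichardt) detector named above; the first-order low-pass filter stands in for the delay channel, and all constants and the grating stimulus are illustrative:

```python
import numpy as np

def lowpass(x, tau, dt=1.0):
    # First-order low-pass filter, a simple stand-in for the delay channel.
    y = np.zeros_like(x)
    a = dt / (tau + dt)
    for t in range(1, len(x)):
        y[t] = y[t-1] + a * (x[t] - y[t-1])
    return y

def reichardt(r1, r2, tau=8.0):
    # Opponent correlation: D(t) = LP[r1](t) * r2(t) - r1(t) * LP[r2](t).
    # Positive mean output for motion from receptor 1 toward receptor 2.
    return lowpass(r1, tau) * r2 - r1 * lowpass(r2, tau)

# Toy stimulus: a moving sinusoidal grating sampled at two neighbouring
# points; reversing the motion flips the sign of the mean response.
t = np.arange(1000)
for direction in (+1, -1):
    r1 = np.sin(0.1 * t)
    r2 = np.sin(0.1 * t - direction * 0.5)   # phase lag = motion direction
    print(direction, reichardt(r1, r2).mean())
```

The opponent subtraction of the two mirror-symmetric half-detectors is what makes the output signed with direction, the signature behaviour of the model.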

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983

Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels:

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation

Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
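A toy sketch of the veto idea (an illustrative shunting-inhibition model, not the paper's biophysical equations): a delayed signal from one receptor divisively suppresses the excitation from its neighbour, so motion in the null direction, where the delayed inhibition and the excitation coincide, is cancelled, approximating a nonlinear AND-NOT interaction.

```python
import numpy as np

def delayed(x, d):
    # Pure delay by d samples: stands in for the asymmetric delay channel.
    y = np.zeros_like(x)
    y[d:] = x[:-d]
    return y

def veto_unit(exc, inh, d=5, k=20.0):
    # Shunting ("silent") inhibition divides rather than subtracts, so a
    # coincident, appropriately delayed inhibitory signal vetoes excitation.
    return exc / (1.0 + k * delayed(inh, d))

def pulse(t0, n=60):
    # A receptor's response to a passing edge: a brief pulse at time t0.
    x = np.zeros(n)
    x[t0:t0 + 5] = 1.0
    return x

# Preferred direction: the excitatory receptor fires first (t=15); the
# neighbour's delayed inhibition (t=20+5) arrives too late to veto.
print("preferred:", veto_unit(pulse(15), pulse(20)).sum())   # ~5.0
# Null direction: the inhibitory receptor fires first (t=15); its delayed
# signal coincides with the excitation at t=20 and vetoes it.
print("null:     ", veto_unit(pulse(20), pulse(15)).sum())   # ~0.2
```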


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
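The regularization recipe turns such an underdetermined measurement into a well-posed minimization: a data term for the measured normal components plus a smoothness penalty on the velocity field along the contour. A minimal sketch (the discretization, lambda, and the rigid-translation test case are illustrative):

```python
import numpy as np

def regularized_flow(normals, b, lam=1.0):
    # Unknowns: velocity (u_i, v_i) at m contour points, stacked as [u; v].
    # Minimize sum_i (n_i . V_i - b_i)^2 + lam * sum_i |V_{i+1} - V_i|^2.
    m = len(b)
    A = np.zeros((m, 2 * m))
    A[np.arange(m), np.arange(m)] = normals[:, 0]        # n_x * u
    A[np.arange(m), m + np.arange(m)] = normals[:, 1]    # n_y * v
    D1 = -np.eye(m - 1, m) + np.eye(m - 1, m, k=1)       # first differences
    D = np.block([[D1, np.zeros_like(D1)], [np.zeros_like(D1), D1]])
    # Normal equations of the Tikhonov functional.
    V = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)
    return V[:m], V[m:]                                  # u, v

# Toy contour: points on a circle translating rigidly with velocity (1, 0);
# only the normal components b_i = n_i . (1, 0) are "measured".
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)
b = normals @ np.array([1.0, 0.0])
u, v = regularized_flow(normals, b, lam=0.1)
print(u.mean(), v.mean())   # close to the true translation (1, 0)
```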

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the


Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of 'wiring'. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of


Cooperative Computation of Stereo Disparity

D. Marr and T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing …

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Quart. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in H}\; \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) \;+\; \mu\, \|f\|_K^2$$
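As a concrete illustration of this regularization functional, here is a minimal sketch (mine, not CBMM code) of kernel regularized least squares, the special case where V is the square loss; the kernel width and mu are illustrative choices.

```python
import numpy as np

# Minimal sketch of kernel regularized least squares (Tikhonov
# regularization with square loss). By the representer theorem the
# minimizer is f(x) = sum_i c_i K(x, x_i), with c = (K + mu*n*I)^{-1} y.

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit(X, y, mu=1e-2, sigma=1.0):
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + mu * n * np.eye(n), y)  # coefficients c

def predict(c, X_train, X_new, sigma=1.0):
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy usage: smooth regression on noisy samples of a sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
c = fit(X, y)
print(predict(c, X, np.array([[0.0], [1.5]])))  # approx sin(0), sin(1.5)
```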

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In …

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory
Tomaso Poggio[1], Ryan Rifkin[1,4], Sayan Mukherjee[1,3] & Partha Niyogi[2]

[1] Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. [2] Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. [3] Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. [4] Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate …

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,

$$\lim_{n\to\infty} P\{|X_n - X| > \epsilon\} = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\left\{ I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \epsilon \right\} = 1$$
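The stability property described in the abstract can be probed numerically; the toy sketch below (my synthetic setup, not the paper's experiments) fits regularized least squares, deletes one training example at a time, and checks that the learned hypothesis barely moves.

```python
import numpy as np

# Toy leave-one-out stability check for regularized least squares.
# Stability here: deleting one training point changes the learned
# hypothesis only slightly, measured at a probe input.

rng = np.random.default_rng(1)
n = 100
X = rng.uniform(-1, 1, (n, 1))
y = X[:, 0] ** 2 + 0.05 * rng.standard_normal(n)

def fit_ridge(X, y, mu=1e-2):
    # Linear features phi(x) = (1, x):
    # w = argmin (1/n) sum (y_i - w.phi(x_i))^2 + mu ||w||^2
    Phi = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.solve(Phi.T @ Phi / len(X) + mu * np.eye(2),
                           Phi.T @ y / len(X))

w_full = fit_ridge(X, y)
probe = np.array([1.0, 0.5])           # phi(x) at x = 0.5
changes = []
for i in range(n):                      # delete one example at a time
    mask = np.arange(n) != i
    w_i = fit_ridge(X[mask], y[mask])
    changes.append(abs(probe @ (w_full - w_i)))
print("max leave-one-out change:", max(changes))  # small for stable maps
```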

letters to nature. NATURE, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
- 10^10–10^11 neurons (~1 million flies)
- 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
- ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
- ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during

view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not?

[Paradigm: Image 20 ms → Interval image-mask (ISI) 30 ms → Mask (1/f noise) 80 ms]

Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
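As a cartoon of the feedforward motif behind these models, the sketch below (a 1-D simplification of mine, not the published HMAX code) alternates template-matching "S" layers with max-pooling "C" layers: tuning builds selectivity, pooling builds invariance.

```python
import numpy as np

# Illustrative HMAX-flavoured pipeline: S layer = template matching at
# every position; C layer = local max pooling over position.

def s_layer(x, templates):
    """Tuning: correlate the input with each template at every position."""
    n, k = len(x), templates.shape[1]
    return np.stack([[t @ x[i:i + k] for i in range(n - k + 1)]
                     for t in templates])

def c_layer(s, pool=4):
    """Invariance: max-pool each feature map over local neighbourhoods."""
    n = s.shape[1] // pool * pool
    return s[:, :n].reshape(s.shape[0], -1, pool).max(axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)              # a 1-D "image"
templates = rng.standard_normal((8, 7))  # 8 random S1 templates
c1 = c_layer(s_layer(x, templates))
print(c1.shape)                          # 8 feature maps, position-pooled
```

Stacking more S/C pairs on top of `c1` yields units that prefer increasingly complex patterns while tolerating larger position and scale changes, mirroring the V1 → V2 → V4 → IT progression described above.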

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio & DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
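The "matrix-like read-out" idea reduces to training a linear classifier on a matrix of population responses; the sketch below uses synthetic IT-like data (the dimensions and signal model are my assumptions, not the recorded data).

```python
import numpy as np

# Toy linear readout from a pseudo-population of IT-like neurons:
# each trial is a row of the response matrix R; a least-squares linear
# classifier decodes object category from the population vector.

rng = np.random.default_rng(0)
n_neurons, n_trials = 128, 200
labels = rng.integers(0, 2, n_trials)        # two object categories
tuning = rng.standard_normal(n_neurons)      # per-neuron category preference
R = rng.standard_normal((n_trials, n_neurons)) \
    + np.outer(2 * labels - 1, tuning)       # noise + category signal

# Train on 150 trials via pseudoinverse, test on the remaining 50.
w = np.linalg.pinv(R[:150]) @ (2 * labels[:150] - 1)
pred = (R[150:] @ w > 0).astype(int)
print("decoding accuracy:", (pred == labels[150:]).mean())
```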

… in 2013 …


Research Education amp Diversity Partners

Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba

Blum Gershman Kreiman Livingstone Sompolinsky Spelke

MIT Harvard

Chouika Manaye Rwebangira Salmani

Howard U

Hunter College

Isik

Johns Hopkins U

Brumberg, Queens College

Chodorow Epstein Sakas Zeigler Freiwald

Rockefeller U

Stanford U: Jorquera

Universidad Central Del Caribe (UCC)

McNair Program

University of Central Florida

Goodman

Blaser Ciaramitaro Pomplun Shukla

UMass Boston, UPR – Mayagüez, UPR – Río Piedras

Hildreth Wiest Wilmer, Wellesley College

Santiago Vega-Riveros Garcia-Arraras Maldonado-Vlaar Megret Ordóñez Ortiz-Zuazaga

Kreiman Livingstone

Harvard Medical School

Finlayson, Florida International U

Kreiman

Boston Children's Hospital

Museum of Science Boston

Google

DeepMind

International and Corporate Partners

IIT: Cingolani

A*STAR: Chuan Poh Lim

Hebrew U: Weiss

MPI: Bülthoff

Genoa U: Verri, Rosasco

Weizmann: Ullman

Sangwan Lee

IBM, Honda, Microsoft

Boston Dynamics

Orcam, NVIDIA, Siemens

Schlumberger Mobileye Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of YouTube subscribers only – 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

There's Waldo: A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.

Cerebral Cortex, 2016

- See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.

A dataset of RGB images of hands holding objects and interacting with objects. Measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.

Summer Course at Woods Hole: our flagship initiative

Brains, Minds & Machines Summer Course: an intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence

A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted

Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne Distefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly (Part I + II, Quart. Rev. Biophysics 9(3), 311-375).

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch below).

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
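A minimal sketch of the Hassenstein-Reichardt correlation scheme referred to above (a crude discrete-time simplification of mine: a pure delay stands in for the low-pass stage of the real detector).

```python
import numpy as np

# Reichardt correlator: each half-detector multiplies the delayed signal
# from one photoreceptor with the undelayed signal from its neighbour;
# the difference of the two mirror-symmetric halves signals direction.

def reichardt(left, right, delay=3):
    """left, right: 1-D luminance signals from two nearby receptors."""
    l_d = np.roll(left, delay)    # delayed copy of the left signal
    r_d = np.roll(right, delay)   # delayed copy of the right signal
    return np.mean(l_d * right - r_d * left)  # >0: left-to-right motion

# A rightward-moving grating: the right receptor sees a delayed left.
t = np.linspace(0, 10 * np.pi, 1000)
left = np.sin(t)
right = np.sin(t - 0.3)           # phase lag = rightward motion
print(reichardt(left, right))     # positive
print(reichardt(right, left))     # negative (leftward)
```

Note that each receptor alone sees only a flicker; direction emerges solely from the nonlinear (multiplicative) comparison of the two channels, which is the point of the Hassenstein-Reichardt analysis.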

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).


© 1985 Nature Publishing Group

Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch, and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
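A toy instance of this regularization recipe (entirely my construction, not the paper's analog circuits): an underdetermined reconstruction is made well-posed by adding a smoothness stabilizer ||Pz||^2.

```python
import numpy as np

# Make an ill-posed problem well-posed: minimize
#   ||A z - data||^2 + lam * ||P z||^2
# where P is a first-difference (smoothness) operator. Here we recover
# a smooth 1-D profile from only 15 sparse, noisy point measurements.

rng = np.random.default_rng(0)
n = 100
true_v = np.sin(np.linspace(0, np.pi, n))      # unknown smooth field
idx = rng.choice(n, 15, replace=False)          # sparse sampling locations
A = np.zeros((15, n)); A[np.arange(15), idx] = 1.0
data = A @ true_v + 0.05 * rng.standard_normal(15)

P = np.diff(np.eye(n), axis=0)                  # first-difference stabilizer
lam = 0.5
z = np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ data)
print(float(np.abs(z - true_v).mean()))         # small reconstruction error
```

Without the stabilizer the normal equations are singular (15 measurements, 100 unknowns); the smoothness term plays the role of the "natural constraint" the text describes.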

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen



Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this figure points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of …
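In the spirit of the excerpt above, here is a compact sketch of a cooperative stereo update (a 1-D paraphrase with parameters of my choosing; the 1976 algorithm applies the uniqueness inhibition along both lines of sight in the full two-eye geometry).

```python
import numpy as np

# Cooperative disparity computation, toy version. State C[x, d] asserts
# "disparity d at position x". Each iteration sums excitatory support
# along the same disparity (continuity), subtracts inhibition from other
# disparities at the same position (uniqueness), adds the initial
# matches, and thresholds.

def cooperative_stereo(C0, iters=10, excit=2.0, inhib=1.0, theta=3.0):
    C = C0.copy()
    for _ in range(iters):
        E = np.zeros_like(C)               # excitatory neighbourhood
        for dx in (-2, -1, 1, 2):
            E += np.roll(C, dx, axis=0)    # same disparity, nearby x
        I = C.sum(axis=1, keepdims=True) - C   # competing disparities
        S = excit * E - inhib * I + C0     # recurrent input + initial data
        C = (S >= theta).astype(float)     # threshold nonlinearity
    return C

# Toy usage: random false matches plus a dense band at true disparity 2.
rng = np.random.default_rng(0)
C0 = (rng.random((60, 5)) < 0.2).astype(float)
C0[:, 2] = (rng.random(60) < 0.8).astype(float)
print(cooperative_stereo(C0).sum(axis=0))  # column d = 2 should dominate
```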




Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z) \, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right\} = 0$$

letters to nature

NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
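The stability property described in the abstract above can be probed empirically: retrain with one example deleted and measure how much the learned hypothesis changes at the deleted point. A minimal sketch, with plain ridge regression standing in for the learning map; the data, dimensions, and function names are illustrative assumptions, not from the paper.

```python
import numpy as np

def ridge_fit(X, y, mu=0.1):
    # Regularized least squares: w = (X^T X + mu I)^(-1) X^T y.
    return np.linalg.solve(X.T @ X + mu * np.eye(X.shape[1]), X.T @ y)

def loo_stability(X, y, mu=0.1):
    # Leave-one-out stability: change in the prediction at x_i when
    # example i is deleted from the training set S.
    n = len(X)
    w_full = ridge_fit(X, y, mu)
    deltas = []
    for i in range(n):
        keep = np.arange(n) != i
        w_i = ridge_fit(X[keep], y[keep], mu)
        deltas.append(abs(X[i] @ w_full - X[i] @ w_i))
    return max(deltas)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
for n in (50, 200, 800):
    X = rng.normal(size=(n, 5))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    print(n, loo_stability(X, y))  # the perturbation shrinks as n grows
```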

Why do hierarchical architectures work?

• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
– ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
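These models build invariant object representations by alternating a template-matching ("tuning") operation with a max-pooling ("invariance") operation up the hierarchy. A schematic sketch of one such S/C layer pair, loosely following the HMAX architecture; the array shapes, parameters, and names below are illustrative, not the published implementation.

```python
import numpy as np

def s_units(patches, templates, sigma=1.0):
    # Tuning stage (simple-cell-like): Gaussian radial response of each
    # input patch to each stored template.
    # patches: (n_positions, d), templates: (k, d) -> (n_positions, k)
    d2 = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def c_units(s_maps, pool=4):
    # Invariance stage (complex-cell-like): max over a neighborhood of
    # positions, yielding tolerance to translation.
    n = (len(s_maps) // pool) * pool
    return s_maps[:n].reshape(-1, pool, s_maps.shape[1]).max(axis=1)

# Toy usage: 64 patch positions, 8-dimensional patches, 5 templates.
rng = np.random.default_rng(0)
patches = rng.normal(size=(64, 8))
templates = rng.normal(size=(5, 8))
c1 = c_units(s_units(patches, templates))
print(c1.shape)  # (16, 5): coarser positions x template channels
```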

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not?

Trial sequence: Image (20 ms) → Interval Image-Mask (30 ms ISI) → Mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: Matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
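The "read-out" here is a linear classifier applied to the responses of a population of IT neurons. A toy sketch of such a decoder on synthetic population data; the real analysis in Hung et al. 2005 used recorded multi-unit activity, so everything below (dimensions, noise level, least-squares decoder) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 128, 400
labels = rng.integers(0, 2, n_trials)        # two object categories
patterns = rng.normal(size=(2, n_neurons))   # category-selective mean responses
R = patterns[labels] + 0.8 * rng.normal(size=(n_trials, n_neurons))

# Least-squares linear readout: one weight per neuron, sign gives the category.
W, *_ = np.linalg.lstsq(R, 2.0 * labels - 1.0, rcond=None)
accuracy = ((R @ W > 0) == (labels == 1)).mean()  # training-set accuracy
print(f"decoding accuracy: {accuracy:.2f}")
```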

... in 2013 ...


Google

DeepMind

International and Corporate Partners

IIT (Cingolani)

A*STAR (Chuan Poh Lim)

Hebrew U (Weiss)

MPI (Bülthoff)

Genoa U (Verri, Rosasco)

Weizmann (Ullman)

Sangwan Lee

IBM, Honda, Microsoft

Boston Dynamics

Orcam, NVIDIA, Siemens

Schlumberger, Mobileye, Intel

Fujitsu

GE

Kaist

Videos - ~950 (May 2014 - April 2020)

(of YouTube subscribers only - 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code, Software and Datasets

There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex 2016.
See more at http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.

A dataset of RGB images of hands holding objects and interacting with objects. Measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.

Summer Course at Woods Hole: Our flagship initiative

Brains, Minds & Machines Summer Course: An intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence

A self-reproducing community of scholars is being formed: ~>300 applicants, ~30 accepted

Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne DeStefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly ...

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly... similar to Bayesian approach to cognition in humans... no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list).

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
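A minimal sketch of the Hassenstein-Reichardt correlation detector named in the list above: two photoreceptor signals, a delayed copy of each (a pure delay standing in for the model's low-pass filter), a multiplicative interaction, and opponent subtraction. The delay value and the test stimulus are illustrative assumptions.

```python
import numpy as np

def reichardt_detector(left, right, delay=2):
    # Each half-detector multiplies one receptor's signal with a delayed
    # copy of its neighbor's signal; subtracting the two half-detectors
    # gives a signed, direction-selective output.
    half_lr = np.roll(left, delay) * right   # correlates left-then-right motion
    half_rl = np.roll(right, delay) * left   # correlates right-then-left motion
    return half_lr - half_rl                 # sign of the mean = direction

# A pattern moving left-to-right reaches the right receptor later in time.
t = np.arange(400)
left = np.sin(0.2 * t)
right = np.sin(0.2 * (t - 3))                # same signal, delayed by 3 samples
out = reichardt_detector(left, right)
print(out.mean() > 0)                        # True: net left-to-right signal
```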

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons...

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany.

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).

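The 'veto' scheme of Barlow & Levick, which this paper proposes to implement with shunting (divisive) synaptic inhibition, can be sketched in a few lines. The gain constant, delay, and bar stimulus below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def veto_detector(a, b, delay=2, g=10.0):
    # Direction selectivity by inhibitory veto: the delayed signal from the
    # neighboring receptor b divisively (shunting-like) suppresses the
    # excitation from receptor a. Null-direction motion makes the delayed
    # inhibition coincide with the excitation and cancels the response.
    inhibition = np.roll(b, delay)
    return a / (1.0 + g * inhibition)

# A bright bar sweeping across two receptors: a first, then b (preferred)
n = 40
a = np.zeros(n); a[10] = 1.0              # receptor a sees the bar at t=10
b_pref = np.zeros(n); b_pref[12] = 1.0    # preferred direction: b sees it later
b_null = np.zeros(n); b_null[8] = 1.0     # null direction: b sees it first

print(veto_detector(a, b_pref).sum())  # ~1.0: response survives
print(veto_detector(a, b_null).sum())  # ~0.09: delayed inhibition vetoes it
```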

© 1985 Nature Publishing Group


Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch, and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
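In this framework, each ill-posed early vision problem is regularized by minimizing a data term plus a "stabilizer" that enforces a natural constraint such as smoothness. A minimal sketch for one of the listed processes, surface reconstruction from sparse noisy samples; the discrete operators, the weight lam, and the toy data are illustrative assumptions.

```python
import numpy as np

# Regularized 1-D surface reconstruction: minimize
#   ||A f - y||^2 + lam * ||D f||^2
# where A samples the unknown surface f at the observed positions and the
# first-difference operator D is the stabilizer enforcing smoothness.
rng = np.random.default_rng(2)
n = 100
true_surface = np.sin(np.linspace(0, 3, n))
obs = np.sort(rng.choice(n, size=15, replace=False))
y = true_surface[obs] + 0.05 * rng.normal(size=len(obs))

A = np.zeros((len(obs), n))
A[np.arange(len(obs)), obs] = 1.0            # sampling operator
D = np.diff(np.eye(n), axis=0)               # discrete derivative (stabilizer)
lam = 1.0
f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)
print(np.abs(f - true_surface).max())        # error of the regularized estimate
```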



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

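The algorithm in this paper is cooperative: a network of binary nodes C[x, d] over image position x and disparity d, iteratively updated with excitation from neighboring nodes at the same disparity (the continuity constraint) and inhibition among nodes sharing a line of sight (the uniqueness constraint). A simplified 1-D sketch under illustrative constants, with inhibition applied only along one line of sight rather than both as in the paper.

```python
import numpy as np

def cooperative_stereo(left, right, max_d=4, iters=8):
    # C[x, d] = 1 encodes the hypothesis "position x has disparity d".
    n = len(left)
    C = np.zeros((n, max_d))
    for d in range(max_d):
        C[d:, d] = (left[d:] == right[:n - d]).astype(float)  # raw matches
    init = C.copy()
    kernel = np.ones(7); kernel[3] = 0.0  # excitatory neighborhood, self excluded
    for _ in range(iters):
        exc = np.stack([np.convolve(C[:, d], kernel, mode="same")
                        for d in range(max_d)], axis=1)   # continuity support
        inh = C.sum(axis=1, keepdims=True) - C            # rival disparities at x
        C = ((init + exc - 2.0 * inh) >= 3.0).astype(float)
    # Positions with no surviving match default to disparity 0.
    return C.argmax(axis=1)

# 1-D random-dot stereogram: right image = left image shifted by 2 pixels.
rng = np.random.default_rng(3)
left = rng.integers(0, 2, 300)
right = np.roll(left, -2)
disp = cooperative_stereo(left, right)
print(np.bincount(disp, minlength=4))  # disparity 2 should dominate
```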

Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman, Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...

• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...

• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not?

Trial timeline: Image (20 ms) → Interval Image-Mask (30 ms ISI) → Mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward Models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: Matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
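As an illustration of what such a matrix-like linear readout amounts to computationally, here is a minimal numpy sketch on synthetic data (the population model, sizes and regularizer are invented for this example; it is not the recorded data or analysis code of Hung et al. 2005):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a population readout experiment: n_trials
# responses of n_neurons, each trial showing one of n_cats objects.
n_neurons, n_cats, n_trials = 128, 8, 800
prototypes = rng.normal(size=(n_cats, n_neurons))    # mean response per category
labels = rng.integers(0, n_cats, size=n_trials)
responses = prototypes[labels] + rng.normal(size=(n_trials, n_neurons))

train = np.arange(n_trials) < 600                    # simple train/test split
test = ~train

# One-vs-rest linear readout by regularized least squares:
# W = (X^T X + lam I)^(-1) X^T Y, with Y the one-hot category labels.
X, Y = responses[train], np.eye(n_cats)[labels[train]]
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_neurons), X.T @ Y)

pred = np.argmax(responses[test] @ W, axis=1)
print("readout accuracy:", np.mean(pred == labels[test]))

The point of the sketch is only that category (and, with suitable labels, identity) can be read out from a population response matrix with nothing more than a regularized linear map.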

… in 2013 …


Videos - ~950 (May 2014 - April 2020)

(of YouTube subscribers only - 18% of viewers)

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

There's Waldo: A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.

See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html

ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.

Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu

A dataset of RGB images of hands holding objects and interacting with objects. Measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.

Summer Course at Woods Hole: our flagship initiative

Brains, Minds & Machines Summer Course: an intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence

A self-reproducing community of scholars is being formed: ~>300 applicants, ~30 accepted

Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne DeStefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, it is likely that several of the next breakthroughs in ML and AI will come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion: algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly: Part I+II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch below).

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Buelthoff, Little and Poggio, Nature 1989).
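The sketch below is a toy simulation of the correlation scheme (my construction for illustration; the time constant, stimulus and sampling step are arbitrary choices): each half-detector multiplies the low-pass-delayed signal of one receptor with the undelayed signal of its neighbor, and the two halves are subtracted in opponency.

import numpy as np

def low_pass(signal, tau, dt=1.0):
    # First-order low-pass filter (discrete leaky integrator):
    # the "delay" arm of each half-detector.
    out = np.zeros_like(signal)
    alpha = dt / (tau + dt)
    for t in range(1, len(signal)):
        out[t] = out[t - 1] + alpha * (signal[t] - out[t - 1])
    return out

def reichardt(left, right, tau=10.0):
    # Multiply the delayed channel with the neighboring direct channel;
    # opponent subtraction yields a direction-selective output.
    return low_pass(left, tau) * right - left * low_pass(right, tau)

# Drifting sinusoidal grating sampled at two nearby points: the right
# receptor sees the same pattern with a spatial phase lag.
t = np.arange(2000)
stim_l = np.sin(0.05 * t)
stim_r = np.sin(0.05 * t - 0.3)

print("preferred direction, mean output:", reichardt(stim_l, stim_r).mean())   # > 0
print("null direction, mean output:", reichardt(stim_r, stim_l).mean())        # < 0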

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons…

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
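To make the contrast between the two schemes concrete, here is a toy numerical sketch (my illustration, not the paper's model or parameters): a multiplicative correlation unit detects the conjunction of excitation from the two channels, while a veto unit divisively shunts the direct excitation with an appropriately delayed inhibition, so that the response in one direction is suppressed.

import numpy as np

def delay(x, d):
    # Shift a signal by d samples: a crude stand-in for a slow channel.
    return np.concatenate([np.zeros(d), x[:-d]])

def correlation_unit(x1, x2, d=5):
    # Hassenstein-Reichardt style: multiply the delayed channel 1
    # with the direct channel 2 (conjunction of excitation).
    return delay(x1, d) * x2

def veto_unit(x1, x2, d=5, k=10.0):
    # Shunting-inhibition style: excitation from channel 2 is
    # 'vetoed' by the delayed inhibition from channel 1.
    return x2 / (1.0 + k * delay(x1, d))

# A bright bar sweeping across two adjacent receptors, 5 samples apart.
x = np.zeros(200); x[60:80] = 1.0    # receptor 1
y = np.zeros(200); y[65:85] = 1.0    # receptor 2 (sees the bar later)

for name, unit in [("correlation", correlation_unit), ("veto", veto_unit)]:
    out_12 = unit(x, y).sum()        # bar moving from receptor 1 to 2
    out_21 = unit(y, x).sum()        # bar moving from receptor 2 to 1
    print(f"{name}: 1->2 {out_12:.2f}   2->1 {out_21:.2f}")

The correlation unit responds most when delayed and direct signals coincide (motion from 1 to 2), whereas the veto unit is silenced in exactly that case: two different circuit realizations of direction selectivity, matching the distinction drawn in the passage above.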


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes

Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
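In symbols (a standard formulation added here for clarity, not text from the original article): brightness constancy along the contour gives the single constraint

$$\nabla I \cdot \mathbf{V} + \frac{\partial I}{\partial t} = 0$$

and, writing $\mathbf{V} = v^{\perp}\mathbf{n} + v^{\top}\mathbf{t}$ with $\mathbf{n} = \nabla I / \|\nabla I\|$ the unit normal to the contour, only

$$v^{\perp} = -\,\frac{\partial I / \partial t}{\|\nabla I\|}$$

is determined; the tangential component $v^{\top}$ drops out of the constraint and is invisible to local measurements.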

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of 'wiring'. In a digital computer the ratio of connections to components is about …, whereas for the mammalian cortex it lies between … and ….

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of …

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
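A schematic rendering of the cooperative scheme, reduced here to a 1-D scanline for brevity (the neighborhood, inhibition constant, threshold and iteration count are illustrative choices of this sketch, not the paper's 2-D parameters):

import numpy as np

rng = np.random.default_rng(1)

# 1-D random-dot stereogram: a central patch of the right image is the
# left image shifted by true_disp; the background has zero disparity.
n, true_disp, max_disp = 120, 3, 6
left = rng.integers(0, 2, size=n)
right = left.copy()
right[40:80] = left[37:77]            # shifted region

# Initial state C0[x, d] = 1 wherever the two images match at disparity d.
C0 = np.zeros((n, max_disp + 1))
for d in range(max_disp + 1):
    C0[: n - d, d] = (left[: n - d] == right[d:]).astype(float)

# Cooperative iteration: excitation from same-disparity spatial neighbors,
# inhibition from competing disparities (simplified to one line of sight).
C, eps, theta = C0.copy(), 0.5, 3.0
for _ in range(8):
    excit = sum(np.roll(C, dx, axis=0) for dx in (-2, -1, 1, 2))
    inhib = C.sum(axis=1, keepdims=True) - C
    C = ((excit - eps * inhib + C0) >= theta).astype(float)

# Inside the shifted patch the surviving cells should sit at d = true_disp.
print(np.bincount(C[45:75].argmax(axis=1), minlength=max_disp + 1))

Spurious matches, which start out dense, lose the competition because they lack same-disparity neighbors; the correct disparity plane supports itself and survives.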


Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2 \right]$$

Predictive regularization algorithms

Theorems on foundations of learning
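Read with the square loss, the minimization above is Tikhonov regularization in a reproducing kernel Hilbert space, and by the representer theorem its minimizer has a closed form. A minimal numerical sketch follows (the Gaussian kernel and the values of sigma and mu are arbitrary choices for illustration):

import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=30)    # noisy samples

# With f(x) = sum_i c_i K(x, x_i), minimizing
# (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2  gives
# c = (K + mu * n * I)^(-1) y.
n, mu = len(x), 1e-3
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + mu * n * np.eye(n), y)

x_test = np.linspace(-1, 1, 5).reshape(-1, 1)
f_test = gaussian_kernel(x_test, x) @ c
print(np.c_[x_test[:, 0], f_test, np.sin(3 * x_test[:, 0])])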

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{z_i = (x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate

Box 1: Formal definitions in supervised learning

Convergence in probability: A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$.

Training data: The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms: A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions: We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error: The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error: The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency: An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$$\lim_{n\to\infty} P\Big( I[f_S] - \inf_{f \in H} I[f] > \varepsilon \Big) = 0$$

Letters to Nature. NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature © 2004 Nature Publishing Group
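To make the Box 1 quantities concrete, the following small sketch (with an invented distribution mu and hypothesis space H, purely for illustration) computes the empirical error I_S[f_S] of an ERM solution and a Monte-Carlo estimate of its expected error I[f_S]; the shrinking gap as n grows is the generalization property defined above.

import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Draw n i.i.d. pairs z = (x, y) from an (arbitrary) distribution mu.
    x = rng.uniform(-1, 1, n)
    return x, np.sin(2 * x) + 0.2 * rng.normal(size=n)

def erm_fit(x, y, degree=3):
    # ERM over H = polynomials of fixed degree, with the square loss.
    return np.polyfit(x, y, degree)

for n in (10, 100, 1000, 10000):
    x, y = sample(n)
    f = erm_fit(x, y)
    I_S = np.mean((np.polyval(f, x) - y) ** 2)     # empirical error I_S[f_S]
    xt, yt = sample(200000)                        # fresh samples from mu
    I = np.mean((np.polyval(f, xt) - yt) ** 2)     # Monte-Carlo estimate of I[f_S]
    print(f"n={n:5d}  I_S={I_S:.4f}  I={I:.4f}  gap={abs(I - I_S):.4f}")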

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 10: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Ellen Hildreth

Mandana Sassanfar

Diversity Program

EAC- May 2020

Code Software and Datasets

Therersquos Waldo A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task Thomas Miconi Laura Groomes and Gabriel

Kreiman

Cerebral Cortex 2016

- See more at httpklabtchharvardeduresources

miconietal_visualsearch_2016htmlsthashKmHoBP

skxwHtrTkJdpuf

ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects

Partially Occluded Hands B Myanganbayar C Mata G Dekel B Katz G Ben-Yosef A Barbu

A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance

Summer Course at Woods Hole Our flagship initiative

Brains Minds amp Machines Summer Course An intensive three-week course gives advanced students a ldquodeeprdquo introduction to the problem of intelligence

A self-reproducing community of scholars is being formed ~gt300 applicants ~30 accepted

Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne Distefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer Schoolbull Signature CBMM (EducationKnowledge Transfer)activity aimed at creating an intergenerationalcommunity around the scienceandengineeringofintelligence

bull Students reported strong influence of lecturesworkingonprojectsandinteractionsamongfacultyTArsquosandpeersontheirownthinkingandresearchdevelopment

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minskyrsquos SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because as in the recent past it is likely that several of the next breakthroughs in ML and AI are likely to come from neuroscienceANDengineering

Vision for the BMM SummerSchool

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardtrsquos PhD

Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector (sketched below)

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
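As a concrete illustration of the correlation scheme in the bullets above, here is a minimal sketch (my own, not Reichardt's original formulation; the time constant, sampling step and drifting stimulus are assumptions): each arm low-pass filters ("delays") one receptor signal and multiplies it with its neighbor's signal; subtracting the mirror-symmetric arm gives a direction-selective opponent output.

```python
import numpy as np

def lowpass(x, tau=0.05, dt=0.001):
    """First-order low-pass filter: the delay/'memory' arm of the detector."""
    y = np.zeros_like(x)
    a = dt / (tau + dt)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + a * (x[t] - y[t - 1])
    return y

def reichardt(left, right):
    """Opponent correlator: positive output for left-to-right motion."""
    return lowpass(left) * right - lowpass(right) * left

t = np.arange(0.0, 1.0, 0.001)
lag = 0.05                                # right receptor sees the pattern 50 ms later
left = np.sin(2 * np.pi * 5 * t)
right = np.sin(2 * np.pi * 5 * (t - lag))
print(reichardt(left, right).mean())      # > 0: rightward motion
print(reichardt(right, left).mean())      # < 0: leftward motion
```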

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).

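To make the contrast between the two schemes of figure 1 concrete, here is a toy sketch (my illustration, not the paper's model; the conductance gain g and the pulse timing are assumptions). The Reichardt scheme detects the conjunction of two excitations multiplicatively; the Barlow-Levick veto is rendered here as shunting (divisive) inhibition, in the spirit of the synaptic mechanism the paper proposes.

```python
import numpy as np

def multiplicative(a, b):
    """Hassenstein-Reichardt: respond to the conjunction (product) of excitations."""
    return a * b

def shunting_veto(exc, inh, g=50.0):
    """Barlow-Levick-style veto: a large inhibitory conductance divides
    (shunts) the excitatory signal rather than subtracting from it."""
    return exc / (1.0 + g * inh)

a  = np.array([0.0, 1.0, 0.0])   # signal from receptor 1, arriving at time step 1
b1 = np.array([0.0, 1.0, 0.0])   # receptor 2 channel output, coincident with a
b2 = np.array([0.0, 0.0, 1.0])   # receptor 2 channel output, arriving one step later
print(multiplicative(a, b1).sum())   # 1.0: coincidence detected (preferred direction)
print(shunting_veto(a, b1).sum())    # ~0.02: coincident inhibition vetoes the response
print(shunting_veto(a, b2).sum())    # 1.0: late inhibition misses; response passes
```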

© Nature Publishing Group 1985


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
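A small numerical sketch of this regularization idea (mine, in the spirit of the smoothness constraint discussed here; the contour, data and λ are assumed): recover the full velocity vectors V_i along a closed contour from their normal components v_i alone by minimizing the regularized functional Σ_i (n_i · V_i − v_i)² + λ Σ_i |V_{i+1} − V_i|², a linear least-squares problem.

```python
import numpy as np

N, lam = 100, 10.0
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # outward normals of a circle
V_true = np.tile([1.0, 0.0], (N, 1))                        # rigid translation to the right
v_normal = np.einsum('ij,ij->i', normals, V_true)           # the only locally measurable data

# Least-squares system: data rows (n_i . V_i = v_i) plus smoothness rows.
A_data = np.zeros((N, 2 * N))
for i in range(N):
    A_data[i, 2 * i:2 * i + 2] = normals[i]
A_smooth = np.zeros((2 * N, 2 * N))
for i in range(N):
    j = (i + 1) % N
    A_smooth[2 * i:2 * i + 2, 2 * i:2 * i + 2] = np.eye(2)
    A_smooth[2 * i:2 * i + 2, 2 * j:2 * j + 2] = -np.eye(2)
A = np.vstack([A_data, np.sqrt(lam) * A_smooth])
b = np.concatenate([v_normal, np.zeros(2 * N)])
V = np.linalg.lstsq(A, b, rcond=None)[0].reshape(N, 2)
print(np.abs(V - V_true).max())  # tiny: the smoothness prior resolves the aperture ambiguity
```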

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen



Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term cooperative refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we first analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; second, we describe a cooperative algorithm that implements this computation; and third, we exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

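The cooperative algorithm can be sketched in a toy, one-scanline form (this is my simplification, not the authors' code; the neighborhood size, weights and threshold are invented). Binary nodes C[x, d] vote for disparity d at position x; nodes supporting the same disparity at nearby positions excite each other (continuity), while nodes proposing rival disparities at the same position inhibit each other (uniqueness), iterated to a stable global assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
W, D, T = 64, 5, 8                  # scanline width, disparity range, iterations
true_d = 2
left = rng.integers(0, 2, W)        # random-dot scanline
right = np.roll(left, true_d)       # right image = left shifted by true_d

# Initial matches: every (x, d) where the two dots agree (many are spurious).
C0 = np.array([[left[x] == right[(x + d) % W] for d in range(D)]
               for x in range(W)], float)
C = C0.copy()
for _ in range(T):
    excite = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))    # same-d neighbors
    inhibit = C.sum(axis=1, keepdims=True) - C                     # rival disparities
    C = ((excite - 2.0 * inhibit + 4.0 * C0) >= 3.5).astype(float) # threshold update
print(C.sum(axis=0).argmax())       # dominant surviving disparity == true_d
```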

Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977 …

• … part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I) …

• … which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
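This functional is Tikhonov regularization in a reproducing kernel Hilbert space. As a concrete illustration (my sketch, not CBMM code; the Gaussian kernel, the data and μ are assumptions): with the square loss V(y, f(x)) = (y − f(x))², the representer theorem gives f(x) = Σ_i c_i K(x, x_i), and the coefficients solve the linear system (K + μnI)c = y.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n = 50
X = rng.uniform(-1, 1, (n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)

mu = 1e-3
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * n * np.eye(n), y)   # minimizer of the regularized functional

X_test = np.linspace(-1, 1, 5)[:, None]
print(gaussian_kernel(X_test, X) @ c)            # predictions of the learned f
```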

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.


General conditions for predictivity in learning theory

Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²

¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = (x_i, y_i)_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $R^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$S = \big(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\big)$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$I[f] = \int_Z V(f, z)\, d\mu(z)$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$I_S[f] = \frac{1}{n} \sum_{i=1}^n V(f, z_i)$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0$ in probability.

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$\lim_{n\to\infty} P\Big(I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon\Big) = 0$
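A small numerical sketch of the paper's stability notion (my illustration, reusing the regularized least-squares setup sketched earlier, with made-up data): delete one training example, retrain, and check that the loss at the deleted point barely changes (CV-loo stability).

```python
import numpy as np

def rls_fit(X, y, mu=1e-3, sigma=0.5):
    """Kernel regularized least squares; returns the learned function f_S."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    c = np.linalg.solve(K + mu * len(y) * np.eye(len(y)), y)
    def f(Z):
        d2z = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2z / (2 * sigma ** 2)) @ c
    return f

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(-1, 1, (n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)

f_S = rls_fit(X, y)
betas = []
for i in range(0, n, 20):                      # a few leave-one-out folds
    Xi, yi = np.delete(X, i, 0), np.delete(y, i)
    f_Si = rls_fit(Xi, yi)
    loss_full = (y[i] - f_S(X[i:i + 1])[0]) ** 2
    loss_loo = (y[i] - f_Si(X[i:i + 1])[0]) ** 2
    betas.append(abs(loss_full - loss_loo))
print(max(betas))   # small, and shrinking with n: the stability that implies predictivity
```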


Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

Since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
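A toy sketch of the two operations these hierarchical models alternate (my illustration, not the released model code; the templates, image and pooling size are invented): S units compute template matches, increasing selectivity, and C units take a local max, increasing invariance to position.

```python
import numpy as np

def s_layer(image, templates):
    """Template match at every location (valid correlation), one map per template."""
    k = templates.shape[1]
    H, W = image.shape[0] - k + 1, image.shape[1] - k + 1
    maps = np.empty((len(templates), H, W))
    for t, T in enumerate(templates):
        for i in range(H):
            for j in range(W):
                maps[t, i, j] = (image[i:i + k, j:j + k] * T).sum()
    return maps

def c_layer(maps, pool=4):
    """Max pooling over local neighborhoods: invariance to small translations."""
    H, W = maps.shape[1] // pool, maps.shape[2] // pool
    return maps[:, :H * pool, :W * pool].reshape(len(maps), H, pool, W, pool).max(axis=(2, 4))

bar_v = np.zeros((5, 5)); bar_v[:, 2] = 1      # vertical-bar template
bar_h = bar_v.T                                # horizontal-bar template
img = np.zeros((32, 32)); img[8:20, 15] = 1    # a vertical edge in the image
c = c_layer(s_layer(img, np.stack([bar_v, bar_h])))
print(c[0].max(), c[1].max())                  # the vertical unit responds much more
```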

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

[Trial timeline: image 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms]

Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
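In the spirit of that read-out (a sketch with simulated data, not the Hung et al. recordings; all response statistics here are invented): train a regularized linear classifier on population response vectors and decode object category on held-out trials.

```python
import numpy as np

rng = np.random.default_rng(0)
units, trials = 128, 200
means = rng.normal(0, 1, (2, units))               # two 'categories' of objects
X = np.vstack([rng.normal(means[c], 2.0, (trials, units)) for c in (0, 1)])
y = np.repeat([-1.0, 1.0], trials)

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]
lam = 1.0                                          # ridge penalty (assumed)
w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(units),
                    X[train].T @ y[train])         # regularized linear read-out
acc = np.mean(np.sign(X[test] @ w) == y[test])
print(f"decoding accuracy: {acc:.2f}")             # well above the 0.5 chance level
```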

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …

Page 11: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

EAC- May 2020

Code Software and Datasets

Therersquos Waldo A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task Thomas Miconi Laura Groomes and Gabriel

Kreiman

Cerebral Cortex 2016

- See more at httpklabtchharvardeduresources

miconietal_visualsearch_2016htmlsthashKmHoBP

skxwHtrTkJdpuf

ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz

Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects

Partially Occluded Hands B Myanganbayar C Mata G Dekel B Katz G Ben-Yosef A Barbu

A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance

Summer Course at Woods Hole Our flagship initiative

Brains Minds amp Machines Summer Course An intensive three-week course gives advanced students a ldquodeeprdquo introduction to the problem of intelligence

A self-reproducing community of scholars is being formed ~gt300 applicants ~30 accepted

Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne Distefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer Schoolbull Signature CBMM (EducationKnowledge Transfer)activity aimed at creating an intergenerationalcommunity around the scienceandengineeringofintelligence

bull Students reported strong influence of lecturesworkingonprojectsandinteractionsamongfacultyTArsquosandpeersontheirownthinkingandresearchdevelopment

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minskyrsquos SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because as in the recent past it is likely that several of the next breakthroughs in ML and AI are likely to come from neuroscienceANDengineering

Vision for the BMM SummerSchool

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardtrsquos PhD

Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

most behavioral fly research was done with the Goumltz torque meter

in 1976 based on this recording technology Reichardt amp Poggio developed their theory for Visual control of orientation behaviour in the fly Part I +II Quart Rev Biophysics 9(3) 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip

Wehrhahn C T Poggio and H Buumllthoff Biological Cybernetics 45 123-130 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
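A numerical illustration of this ambiguity, under our own simplifying assumptions (brightness constancy and a synthetic drifting grating; all names are ours, not the paper's):

```python
# Sketch of the aperture problem: a grating I(x, y, t) = sin(k*(x - u*t))
# translates with true velocity (u, v), but has no structure along y,
# so only the normal (x) component of motion is recoverable locally.
import numpy as np

k, u, v = 2 * np.pi, 1.0, 0.7
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))

def I(t):
    return np.sin(k * (x - u * t))

dt = 1e-3
Ix = np.gradient(I(0.0), x[0], axis=1)      # spatial gradients
Iy = np.gradient(I(0.0), y[:, 0], axis=0)
It = (I(dt) - I(0.0)) / dt                  # temporal derivative

# Brightness constancy Ix*u + Iy*v + It = 0 determines only the flow
# component along the intensity gradient (the normal component).
g2 = Ix**2 + Iy**2
mask = g2 > 1e-6
vn_x = (-It[mask] * Ix[mask] / g2[mask]).mean()
vn_y = (-It[mask] * Iy[mask] / g2[mask]).mean()
print("recovered normal flow:", (round(vn_x, 3), round(vn_y, 3)),
      "  true velocity:", (u, v))            # only (u, 0) is recoverable
```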

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…


Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Universita di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
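The two schemes are easy to contrast in simulation. The following is a minimal sketch (our construction, with illustrative filters and stimuli, not the paper's model) of a multiplicative Reichardt-type correlator next to a Barlow-Levick-type veto:

```python
# Reichardt-type correlator (delay-and-multiply) vs Barlow-Levick-type veto
# (delayed inhibition gating excitation); all parameters are illustrative.
import numpy as np

def low_pass(signal, tau, dt=1.0):
    """First-order low-pass filter, acting as the delayed channel."""
    out, state = [], 0.0
    for s in signal:
        state += (dt / tau) * (s - state)
        out.append(state)
    return np.array(out)

def stimulus(direction, n=400):
    """Two adjacent receptors see the same moving edge with a time shift."""
    t = np.arange(n)
    r1 = (np.sin(2 * np.pi * t / 80) > 0).astype(float)
    r2 = np.roll(r1, 10 if direction == "preferred" else -10)
    return r1, r2

for direction in ("preferred", "null"):
    r1, r2 = stimulus(direction)
    # Reichardt: two mirror-symmetric multiplicative subunits, then subtraction.
    reichardt = np.mean(low_pass(r1, 15) * r2 - r1 * low_pass(r2, 15))
    # Barlow-Levick: excitation from receptor 1 vetoed by delayed inhibition
    # from receptor 2 (an AND-NOT gate).
    veto = np.mean(r1 * (1.0 - low_pass(r2, 15)))
    print(f"{direction:9s}  Reichardt: {reichardt:+.3f}   veto: {veto:.3f}")
```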


Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000 (1).

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints.


The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics (2, 3), and it has been proposed that they may play an important role in biological systems as well (4). One of the earliest suggestions along these lines was made by Julesz (5), who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing (6). For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
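A toy one-dimensional version of such a cooperative algorithm (our simplification; neighbourhood size, inhibition strength and threshold are illustrative): excitation within a disparity layer, inhibition across layers at the same position, and iterated thresholding seeded by the initial matches.

```python
# Toy 1-D cooperative stereo sketch on a random-dot line pair.
import numpy as np

rng = np.random.default_rng(1)
n, true_d, max_d = 120, 3, 6
left = rng.integers(0, 2, n).astype(float)   # random-dot line
right = np.roll(left, true_d)                # shifted copy: disparity true_d

# Initial matches C0[x, d] = 1 where left(x) agrees with right(x + d).
C0 = np.stack([(left == np.roll(right, -d)).astype(float)
               for d in range(max_d + 1)], axis=1)

C, eps, theta = C0.copy(), 1.0, 3.0
kernel = np.ones(7)                          # excitatory neighbourhood in a layer
for _ in range(8):
    excit = np.stack([np.convolve(C[:, d], kernel, mode="same")
                      for d in range(max_d + 1)], axis=1)
    inhib = C.sum(axis=1, keepdims=True) - C # rival disparities, same position
    C = ((excit - eps * inhib + C0) > theta).astype(float)

estimate = np.bincount(C.argmax(axis=1)).argmax()
print("estimated disparity:", estimate, "  true disparity:", true_d)
```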

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman, Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\bigl(y_i, f(x_i)\bigr) + \mu \,\|f\|_K^2 \right]$$
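With the square loss, the minimizer of this functional admits a closed form. A small illustrative instance (our toy data and kernel choice, not from the slides): $f(x) = \sum_i c_i K(x, x_i)$ with $c = (K + \mu \ell I)^{-1} y$, i.e. regularized least squares.

```python
# Regularized least squares with a Gaussian kernel (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
l = 40                                   # number of training examples
X = np.sort(rng.uniform(-3, 3, l))
y = np.sin(X) + 0.2 * rng.normal(size=l)

def K(a, b, sigma=0.7):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))

mu = 1e-2                                # regularization parameter
c = np.linalg.solve(K(X, X) + mu * l * np.eye(l), y)

X_test = np.linspace(-3, 3, 200)
f_test = K(X_test, X) @ c
print("test rms error vs sin(x):",
      round(float(np.sqrt(np.mean((f_test - np.sin(X_test)) ** 2))), 3))
```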

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49, S 0273-0979(01)00923-5. Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…


General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1-5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
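A minimal sketch of this leave-one-out stability notion, assuming ridge regression as the learning map (our illustrative choice, not the paper's construction): delete one training example, retrain, and measure how much the learned hypothesis changes on probe inputs.

```python
# Empirical leave-one-out stability of a (stable) learning algorithm.
import numpy as np

rng = np.random.default_rng(0)
n, lam = 60, 1.0
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_full = ridge_fit(X, y, lam)
X_probe = rng.normal(size=(500, 3))      # inputs on which hypotheses are compared

changes = []
for i in range(n):                       # perturb S by deleting example i
    keep = np.arange(n) != i
    w_i = ridge_fit(X[keep], y[keep], lam)
    changes.append(np.max(np.abs(X_probe @ (w_full - w_i))))
print("max change in f_S after deleting one example:", round(max(changes), 4))
```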

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability: A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| \geq \epsilon) = 0$.

Training data: The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$

Learning algorithms: A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions: We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error: The expected error of a function $f$ is defined as

$I[f] = \int_Z V(f, z)\, d\mu(z)$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error: The following quantity, called empirical error, can be computed given the training data $S$:

$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$

Generalization and consistency: An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0$ in probability.

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$\lim_{n\to\infty} P\left( I[f_S] > \inf_{f \in H} I[f] + \epsilon \right) = 0$
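A toy numerical illustration of the Box 1 quantities (our example; the distribution is known here by construction, which is the only reason $I[f]$ can be approximated at all):

```python
# Empirical error I_S[f] on a training set vs Monte Carlo estimate of the
# expected error I[f] under a known distribution mu.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):                        # z = (x, y) drawn from mu
    x = rng.uniform(-1, 1, n)
    return x, x**2 + 0.1 * rng.normal(size=n)

f = lambda x: 0.9 * x**2 + 0.02       # a fixed hypothesis f
V = lambda x, y: (f(x) - y) ** 2      # square loss

x_S, y_S = sample(50)                 # training set S, |S| = 50
I_S = np.mean(V(x_S, y_S))            # empirical error I_S[f]
x_big, y_big = sample(1_000_000)      # large sample approximates the integral
I = np.mean(V(x_big, y_big))          # expected error I[f]
print(f"I_S[f] = {I_S:.4f}   I[f] ~ {I:.4f}   gap = {abs(I_S - I):.4f}")
```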

letters to nature, NATURE, Vol. 428, 25 March 2004, p. 419

Why do hierarchical architectures work

• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)
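A toy sketch (our features and parameters are made up) of the model's basic element: a Gaussian radial-basis unit centred on a stored view, whose response declines gradually as the object rotates away from that view, as in the tuning curves described above.

```python
# View-tuned unit as a radial basis function over a view-dependent feature vector.
import numpy as np

def view_features(angle_deg):
    """Stand-in feature vector for the image of an object at a given pose."""
    a = np.deg2rad(angle_deg)
    return np.array([np.cos(a), np.sin(a), np.cos(2 * a), np.sin(2 * a)])

def view_tuned_unit(x, preferred_deg, sigma=0.6):
    c = view_features(preferred_deg)          # stored (preferred) view
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma**2))

for angle in range(0, 121, 20):               # rotate away from the stored view
    r = view_tuned_unit(view_features(angle), preferred_deg=0.0)
    print(f"rotation {angle:3d} deg -> response {r:.3f}")
```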

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Stimulus sequence: image (20 ms); image-mask interval (30 ms ISI); 1/f-noise mask (80 ms). Task: animal present or not?

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
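As a minimal sketch of what "hierarchical feedforward" means here (our toy code; real models such as HMAX use Gabor or learned templates and more layers): alternate template matching ('S' layers, increasing selectivity) with local max pooling ('C' layers, increasing invariance), so units higher up prefer more complex patterns over larger regions.

```python
# Toy S/C alternation in the spirit of hierarchical feedforward models.
import numpy as np
from scipy.signal import correlate2d

def s_layer(image, templates):
    """Correlate with each template: tuning/selectivity."""
    return np.stack([correlate2d(image, t, mode="same") for t in templates])

def c_layer(maps, pool=2):
    """Local max pooling over space: tolerance to position and scale."""
    k, H, W = maps.shape
    h, w = H // pool, W // pool
    m = maps[:, :h * pool, :w * pool].reshape(k, h, pool, w, pool)
    return m.max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))
s1_templates = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]  # toy edges

c1 = c_layer(s_layer(image, s1_templates))                 # S1 -> C1
s2_templates = [rng.normal(size=(3, 3)) for _ in range(4)]
c2 = np.stack([c_layer(s_layer(c1[i], s2_templates)) for i in range(c1.shape[0])])
print("C1 maps:", c1.shape, "  C2 maps:", c2.shape)
```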

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
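The readout idea can be sketched as follows (a simulation of ours; the actual study trained classifiers on recorded IT population responses): a linear classifier applied to a population response vector reports object category, provided the information is present in the code.

```python
# Linear ('matrix-like') readout of category from a simulated neural population.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 128, 400
tuning = rng.normal(size=(n_neurons, 2))       # each neuron's category tuning

labels = rng.integers(0, 2, n_trials)          # two object categories
responses = tuning[:, labels].T + 1.5 * rng.normal(size=(n_trials, n_neurons))

train, test = slice(0, 300), slice(300, None)
# Least-squares linear readout (a linear SVM would serve equally well).
w, *_ = np.linalg.lstsq(responses[train], 2.0 * labels[train] - 1.0, rcond=None)
pred = (responses[test] @ w > 0).astype(int)
print("readout accuracy on held-out trials:", np.mean(pred == labels[test]))
```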

… in 2013 …


Summer Course at Woods Hole: our flagship initiative

Brains, Minds & Machines Summer Course: an intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence

A self-reproducing community of scholars is being formed: ~>300 applicants, ~30 accepted

Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu

Ellen Hildreth

Boris Katz

Gabriel Kreiman

Directors

Lizanne Distefano

Kenny Blum

Kathleen Sullivan

Kris Brewer

EAC May 2020

CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly), relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

most behavioral fly research was done with the Götz torque meter

in 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375

open question: how well does this theory describe fly behavior in natural flight?

in 1980, Wehrhahn started high-speed film recording of flies chasing each other

single-frame analysis, 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly…

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly), relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly), relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Torre, V. and T. Poggio. A synaptic mechanism possibly underlying directional selectivity to motion. Proc. R. Soc. Lond. B 202, 409-416 (1978)

Poggio, T., V. Torre and C. Koch. Computational vision and regularization theory. Nature 317, 314-319 (1985)


Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen


Cooperative Computation of Stereo Disparity

D. Marr and T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1
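The cooperative algorithm the paper describes can be sketched on a one-dimensional random-dot stereogram (a loose sketch in the spirit of the algorithm, with invented constants, not the paper's implementation): matches support each other within a disparity layer and inhibit rival matches competing for the same image location.

import numpy as np

rng = np.random.default_rng(6)
N, D, steps = 120, 5, 8
true_d = np.full(N, 1)
true_d[40:80] = 3                      # a central surface at disparity 3 on a background at 1
xs = np.arange(N)
left = rng.integers(0, 2, N + D)       # 1-D random-dot pattern
right = left[xs + true_d]              # right eye sees the dots shifted by the true disparity

# initial state: every geometrically possible match is 'on'
C = np.stack([(right == left[xs + d]).astype(float) for d in range(D)])

for _ in range(steps):
    # cooperative update: support from same-disparity neighbours,
    # inhibition from competing matches for the same image location
    support = np.stack([np.convolve(C[d], np.ones(5), mode="same") for d in range(D)])
    rivals = C.sum(axis=0, keepdims=True) - C
    C = (support - rivals > 1.5).astype(float)

print("recovered:", C.argmax(axis=0)[35:85])   # matches true_d up to a few boundary pixels
print("true:     ", true_d[35:85])

After a few iterations the false matches die out and the surviving layer of matches traces the disparity map of the two surfaces.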


Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing effort to understand them.

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation - algorithms - biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\left(y_i, f(x_i)\right) + \mu \, \|f\|_K^2 \right]$$
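With the square loss, the minimizer of this functional over a reproducing-kernel Hilbert space has the classical closed form $f(x) = \sum_i c_i K(x, x_i)$ with $c = (K + \mu n I)^{-1} y$. A minimal sketch of that recipe (Gaussian kernel; the data, sigma and mu below are illustrative choices, not values from the slides):

import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rls_fit(X, y, mu=1e-2):
    # minimizer of (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 in the RKHS:
    # f(x) = sum_i c_i K(x, x_i) with c = (K + mu * n * I)^{-1} y
    n = len(X)
    K = gaussian_kernel(X, X)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
f = rls_fit(X, y)
Xtest = np.linspace(-3, 3, 7)[:, None]
print(np.round(f(Xtest), 2))             # roughly tracks sin at the test points
print(np.round(np.sin(Xtest[:, 0]), 2))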

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
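ERM itself is easy to state in code. A toy sketch (the hypothesis space, data and noise level below are invented for illustration): H is a family of threshold classifiers, and the algorithm simply picks the threshold with the lowest training error.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
y = (x > 0.6).astype(int)            # the true rule, unknown to the learner
y ^= rng.random(200) < 0.1           # 10% label noise

# hypothesis space H: threshold classifiers f_t(x) = [x > t]
thresholds = np.linspace(0, 1, 101)
empirical_error = [np.mean(((x > t).astype(int)) != y) for t in thresholds]
t_hat = thresholds[int(np.argmin(empirical_error))]
print(f"ERM picks t = {t_hat:.2f}, training error = {min(empirical_error):.3f}")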

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \ \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} P\left( I[f_S] \leq \inf_{f \in H} I[f] + \epsilon \right) = 1$$

letters to nature

NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | p. 419. © 2004 Nature Publishing Group
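The stability property in the abstract can be probed numerically. The sketch below is illustrative only: it uses plain regularized linear least squares rather than the paper's general setting, and measures the change in prediction at the deleted point rather than the paper's precise leave-one-out criterion. More regularization should yield a more stable learning map.

import numpy as np

def train(X, y, mu):
    # regularized linear least squares: w = (X^T X + mu * n * I)^{-1} X^T y
    n, d = X.shape
    return np.linalg.solve(X.T @ X + mu * n * np.eye(d), X.T @ y)

def loo_stability(X, y, mu):
    # largest change in the learned predictor when one example is deleted,
    # measured at the held-out point (a numerical probe, not a bound)
    w_full = train(X, y, mu)
    changes = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w_i = train(X[mask], y[mask], mu)
        changes.append(abs(X[i] @ w_full - X[i] @ w_i))
    return max(changes)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)
for mu in (1e-3, 1e-1, 1e1):
    print(f"mu = {mu:g}   max leave-one-out prediction change = {loo_stability(X, y, mu):.4f}")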

Why do hierarchical architectures work?

• Training Database • 1000+ Real, 3000+ VIRTUAL • 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio, 1995; Logothetis, Pauls, 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
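At the heart of these models is an alternation of a tuning operation (template matching, 'S' units) and an invariance operation (max pooling, 'C' units). A toy sketch of one such stage (random templates on a tiny image; this is not the Serre et al. implementation): the pooled outputs are unchanged when the image feature moves.

import numpy as np

def s_layer(image, templates):
    # tuning: correlate each template with every image patch (valid positions)
    H, W = image.shape
    h, w = templates.shape[1:]
    out = np.zeros((len(templates), H - h + 1, W - w + 1))
    for k, tpl in enumerate(templates):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[k, i, j] = (image[i:i + h, j:j + w] * tpl).sum()
    return out

def c_layer(s):
    # invariance: max-pool each feature map over all positions
    return s.reshape(len(s), -1).max(axis=1)

rng = np.random.default_rng(5)
templates = rng.standard_normal((4, 3, 3))       # random 'simple cell' templates
img = np.zeros((12, 12))
img[2:5, 2:5] = rng.standard_normal((3, 3))      # one small feature
shifted = np.roll(img, (5, 5), axis=(0, 1))      # same feature at another position
print(np.round(c_layer(s_layer(img, templates)), 3))
print(np.round(c_layer(s_layer(shifted, templates)), 3))   # identical: position invariance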

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?

Image (20 ms), image-mask interval (30 ms ISI), mask of 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data; reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio & DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
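The read-out idea can be illustrated on synthetic 'population' data (everything below is invented for illustration and is not the recorded IT data): a linear classifier trained on responses at two positions transfers to a third, untrained position, because category tuning is preserved while position only modulates gain.

import numpy as np

rng = np.random.default_rng(4)
n_neurons, n_trials = 120, 600
category = rng.integers(0, 2, n_trials)          # the signal to decode
position = rng.integers(0, 3, n_trials)          # nuisance variable: 3 retinal positions
tuning = rng.standard_normal(n_neurons)          # signed category preference per neuron
gain = rng.uniform(0.5, 1.5, (n_neurons, 3))     # position changes response gain only
R = (gain[:, position] * tuning[:, None] * (2 * category - 1)).T
R += rng.standard_normal(R.shape)                # trial-by-trial noise

# train a linear ('matrix-like') read-out at positions 0 and 1, test at unseen position 2
train, test = position < 2, position == 2
w, *_ = np.linalg.lstsq(R[train], 2.0 * category[train] - 1.0, rcond=None)
accuracy = ((R[test] @ w > 0).astype(int) == category[test]).mean()
print(f"decoding accuracy at an untrained position: {accuracy:.2f}")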

… in 2013 …

Page 13: CBMM: the Science and Engineering of Intelligence

EAC May 2020

CBMM Summer School

• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence

• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development

understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

most behavioral fly research was done with the Götz torque meter

in 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly (Parts I + II, Quart. Rev. Biophysics 9(3), 311-375)

open question: how well does this theory describe fly behavior in natural flight?

in 1980 Wehrhahn started high-speed film recording of flies chasing each other

single-frame analysis, 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
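A minimal simulation of the full opponent correlator (illustrative constants, not fitted to any data): two mirror-symmetric delay-and-multiply subunits are subtracted, so the sign of the time-averaged output reports the direction of a drifting sinusoid.

import numpy as np

def lowpass(x, alpha=0.1):
    # first-order low-pass filter acting as the delay channel
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        y[i] = y[i - 1] + alpha * (x[i] - y[i - 1])
    return y

def reichardt(direction, n=4000):
    t = np.arange(n) * 0.01
    phase = 0.8 if direction == "rightward" else -0.8   # spatial phase offset between receptors
    r1 = np.sin(2 * np.pi * t)
    r2 = np.sin(2 * np.pi * t - phase)
    # opponent stage: subtract the two mirror-symmetric subunits
    return (lowpass(r1) * r2 - r1 * lowpass(r2)).mean()

for d in ("rightward", "leftward"):
    print(f"{d:9s}: mean detector output = {reichardt(d):+.3f}")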

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection • Spatio-temporal interpolation and approximation • Computation of optical flow • Computation of lightness and albedo • Shape from contours • Shape from texture • Shape from shading • Binocular stereo matching • Structure from motion • Structure from stereo • Surface reconstruction • Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
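The regularization recipe the article develops, roughly, find f minimizing ||f - data||^2 + lambda * ||P f||^2 for a stabilizing operator P, can be sketched on the surface-reconstruction entry of the box above. A 1-D toy with a second-difference smoothness stabilizer (all values illustrative):

import numpy as np

n = 100
rng = np.random.default_rng(3)
x = np.linspace(0, 1, n)
f_true = np.sin(2 * np.pi * x)
data = f_true + 0.3 * rng.standard_normal(n)     # noisy depth samples

# stabilizer ||P f||^2 with P the second-difference operator (smoothness prior)
P = np.diff(np.eye(n), 2, axis=0)
for lam in (0.0, 1.0, 100.0):
    # closed-form minimizer of ||f - data||^2 + lam * ||P f||^2
    f_hat = np.linalg.solve(np.eye(n) + lam * P.T @ P, data)
    rms = np.linalg.norm(f_hat - f_true) / np.sqrt(n)
    print(f"lambda = {lam:6.1f}   RMS reconstruction error = {rms:.3f}")

The parameter lambda trades fidelity to the data against smoothness of the recovered surface, which is exactly the role of the regularizer in turning an ill-posed inversion into a well-posed one.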

Aring(aringdegouml()igraveaeligԛ Egraveccedil)sup3divide+ԛETH ԛiacutecurrenԙplusmnEacuteNtildeOgraveshyyacuteԛszligicirccopyacutemicroAgraveԛ EcircAacuteAumlregEumlyenԛ Oacuteagrave+Ocircaacutethornԛ

ƏԛbrvbarIgraveiumlsectԛ ordfampacircԛ paraampAcircOtildemacrmiddot$ԛ Iacute OumlatildeAEligethɳtimesIcircԛ UcircOslashegraveeacutecedilIumlԛ

oslashsup2ntildeԛiquestumlaumlԛUgrave$ԛ

cԛ ˻ԛԛxyUumlzԛ YacuteyumlQ0ϐԛ

$3)135 51052+5 4amp-5 5 (5

13

UacuteѱKŏUdԛ ԛ

ecircԄĀSЙƫКeԛ˼˽ŐϑāƬЛԛĂƐԛϒѲϓԚМ˾ltѳѴƭϔԛR˔ăԅԛĄԛ ˿ΰ13Жąѵԛϕ˕Ʈԛ =ԛ αϖşƯНОɴHȸԛ ɵȲϗĆӊˈԛɶԛ Ѷɇưԛ ƱϘӧӐПԛ РԆСѷƲEԛ ugraveƳԛ ƑLԛ Ѹԛ ˑͰӲfԛ ɈͱӳƴӨƵgԛ Ӵɉćѹԛ őɷͲβɊԇТɸŠĈ˖ԛ F1šɋĉɹУФԛ Ċϙ2ԛŢϚɺѺţċ˗ԛ Ȩͳϛԛ ƶϜȢԛ ŤLγѻčѼɻʹХhԛ ӵɌĎѽԛ Ʒ˙ƸƹѾďϝԈԛ ɼIȳΫϞĐѿɽ͵ԛ δВάťƺЦЧɾ7ԛ13εƻГōҀɿͶШԛ ҁɍ2ԉԛ=ζ˚Ƽiԛ đԛ ӶɎʀŦɏԛ ЩƽЪͷϟԊԛ Ϡԛ ͺԛ ŧͻηӑĒgtMЫԛ ɐƾԋԛ Ӓƒƿϡ˛ʁǀ|ԛ sup1ԛ ɑʂЬԛ θēιǁϢjԛ ͼӓӋ˭ˉǂԛ Ĕԛ ĕκλͽĖŨɒԛ Xԛ ҈ɓǃЭDŽԛ μϣŒDDžЮԛ uacutedžԛ ϤLJөʃLjӷԛ ėԛ Ӕœljϥԛ Ϳȩԛ ƓʄȶȷNJϦNj҉ԛ ŔʅνɔԌЯʆũCԛnjŪɕĘʇабkԛ вӕūɖԛ ęгԛ дԍĚξҊʈŬԛ ʉҋǍϧěŭҌʊеԛ ŕǎҍӸǏǐԛ ǑԂŮʋҎĜҏʌԛ ĝƔԛ ʍɗʎŖʏYʐlԛ ƕǒIƖϨʑҐʒůԛжοʓǓзmԛ ΄vAπӖ^иǔԛ ȹǕǖϩĞґʔȺԛ GǗŗϪğ3ԛ΅JǘĠϫҒʕǙйԛ ġƗԛ ғϬĢкʖҔҕǚwϭ18˜ģҖǛƘԛ ӪΆBҗĤȻ4ԛŰɘĥǜ˝лԛ laquoUԛ ǝĦűəԛ Ǟnԛ ӹǟԛ ƙʗмŲӗноԛ ҘɚǠԛ ʘȪΈϮħҙʙΉԛ ρέųǡпрʚȼԛ Ίς3ДŎҚʛсԛ YɛĨқԛ FĩԎԛ ŘǢԛʜσ˞ǣǤҜǥƚ~ԛ ˮԛ Όȫԛ ҝɜǦтǧԛ ǨŴɝĪʝуфԛ īŵҞԛ 5ʞҟɞǩϯԛ ӺʟҠɟʠԛ Ĭԛ ȬǪӻԛ Gʡ˯˰ˊхǫŶƛVoԛ цӘɠԛ ĭчԛ ҡ9ǬԛĮŷҢʢΎԛ QΏңǭҤʣį˟ԛ Oϰԛ шԏHİτZʤŸԛ ҥϱıщEʥъыʦΐpԛ Αԛ 13ӫǮϲԛ ьӬǯϳIJˠԛ ɡәƜϴǰƝԛ ʧ˱˲эDZŹΒƞюԛ Γϵԛ DzӭdzԛWǴźΔƟяqԛ ΕƠӚˡijҦʨȽԛ ѐΖǵԛ υ϶ΗSǶϷZԐԛ Θȭԛ ҧɢǷԛ Żʩϸżӛ]ԛ ordmJԛ ёǸԛ ŽĴђǹѓԛ ӼǺԛ ӽʪ˳˴ԛ єӜȾȿǻѕҨԛ іφǼˋȴˌԛǽԃĵTˢǾїԛӾɣϹǿԛԛřʫΙχɤԑјʬžĶˣԛſķʭљԛơϺ˵ˍȀVԛ-ԛɀӮȁԛƀΚψӝҩĸҪgt13ԛ raquoԛRĹЗʮƁ_ĺϻrԛӿȂԛԀˎ˶CԛƢʯњƂӞћќԛ ҫɥȃԛ ӟϼΛĻˤԛ T5ϽļӌˏPѝsԛ ԛ ҬɦȄϾԛ ʰω˥[ĽӍAKtԛ ӠȅϿ˦ԒʱɁԛ ƣʲЀȆƃҭʳΜԛ ўȇ˧ȈƄҮʴӯʵXԘԛʶԛ үɧȉԛ ӰҰȊŚЁȋԛ Ђ6ұʷľԛ

CcedilĿџѠŀƅ9ӡѡȌӎӏѢԛ frac14W[ҲӢҳԛNȮԛ ograve6ƆɨB7ԓuԛ ԛ

oacuteʸѣԛ ЃȍϊΝ]ԛ ƤȎѤƇЄʹśȏѥԛ ЅȐѦȑІƈɩԛ ƥΞȒԛ ԁʺҴɪʻԛ ҵɫȓԛ ҶʼȯʽƉ˨ԛ ҷȔ˩˷ːɂƊ0ԛ AtildeŜΟЇŁҸΠЈԔԛ Ʀԛ ҹɬ4ԛ ȕҺȖЉԛ ȵήЊԛ ʾΡ˪ɃʿƋł˫ԛ frac12ȰΣЋŃһˀΤԛ THORNЕίƌȗѧѨˁɄԛ aucircɭ˂Ҽń˒ȘЌԛ iexclΥD˸ș8Țbԛ Ņҽԛ Ҿțԛ ņѩѪ-ƍɮӣѫȜҿѬԛfrac34ѭӀ˃ӁӤӂȝԛ Φȱԛ ocircȞƎɯΧˬΨɅԕԛ otildeȟԛ centȠӃȡЍ`Ѯԛ ѯӥϋόΩЎӄԛ ˄Ѱԛ ύЏNӱltƧȢƨԛ ˅ԛ ώŇИԛ ŝԖԛ Ӆɰȣԛ eumlOňԛ notΪӦƩʼnӆPԛŊƪԛ ˆԛ ϏŋАӇԛ Şԗԛ ӈɱȤԛuumlɲˇӉŌ˓ȥБԛpoundM˹˺ȦɆȧԛ

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z

Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2

=sup2

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in H_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2 \right]$$

Predictive regularization algorithms

Theorems on foundations of learning
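To make the regularization functional above concrete: with the square loss, minimizing it over a reproducing kernel Hilbert space gives regularized least squares (RLS), whose solution has a closed form by the representer theorem. A minimal sketch, not CBMM code, with an assumed Gaussian kernel and made-up data:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu=1e-2, sigma=1.0):
    # Minimize (1/n) sum V(y_i, f(x_i)) + mu ||f||_K^2 with square loss:
    # the representer theorem gives f(x) = sum_i c_i k(x, x_i),
    # with coefficients c = (K + mu * n * I)^(-1) y.
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Tiny usage example: learn sin from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = rls_fit(X, y, mu=1e-3, sigma=0.7)
print(f(np.array([[0.0], [1.5]])))  # close to sin(0), sin(1.5)
```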

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

Received by the editors April 2000 and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society

General conditions for predictivity in learning theory
Tomaso Poggio¹, Ryan Rifkin¹˒⁴, Sayan Mukherjee¹˒³ & Partha Niyogi²

¹ Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
² Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
³ Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
⁴ Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| \geq \epsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $R^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\left( I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \epsilon \right) = 1$$
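The stability property at the heart of the paper can be probed numerically: retrain with one example deleted and measure how much the learned hypothesis moves on a probe set. A toy sketch of that measurement (ridge regression as a stand-in learner; data, probe grid and parameters are invented for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    # Plain ridge regression as a stand-in learner: w = (X^T X + lam I)^(-1) X^T y.
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return lambda Xnew: Xnew @ w

def loo_stability(X, y, fit, probe):
    # Leave-one-out stability: how much the learned hypothesis changes
    # (sup over a probe set) when one training example is deleted.
    f_all = fit(X, y)
    n = len(y)
    changes = []
    for i in range(n):
        mask = np.arange(n) != i
        f_i = fit(X[mask], y[mask])
        changes.append(np.max(np.abs(f_all(probe) - f_i(probe))))
    return float(np.mean(changes))

rng = np.random.default_rng(0)
probe = np.linspace(-1, 1, 21)[:, None]
for n in (20, 80, 320):
    X = rng.uniform(-1, 1, (n, 1))
    y = X[:, 0] + 0.3 * rng.standard_normal(n)
    print(n, round(loo_stability(X, y, ridge_fit, probe), 4))
```

For a stable algorithm such as regularized least squares the measured change shrinks as n grows, which is the property the paper connects to generalization.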

Letters to Nature. NATURE, Vol. 428, 25 March 2004, p. 419. www.nature.com/nature. © 2004 Nature Publishing Group.

Why do hierarchical architectures work

• Training database: 1,000+ real and 3,000+ virtual face patterns, 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
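The prediction can be stated in a few lines of code: in view-based models of the Poggio-Edelman kind, a view-tuned unit responds with a Gaussian radial basis function centered on a stored view, and a more view-invariant unit pools several such units. A minimal sketch with illustrative parameters, not the published model's fit:

```python
import numpy as np

def view_tuned_response(view_deg, preferred_deg, sigma_deg=30.0):
    # Gaussian (RBF) tuning around a stored preferred view.
    d = (view_deg - preferred_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return np.exp(-d**2 / (2 * sigma_deg**2))

def view_invariant_response(view_deg, stored_views_deg):
    # A view-invariant unit pools (here: max) over several view-tuned units.
    return max(view_tuned_response(view_deg, v) for v in stored_views_deg)

# Response falls off as the object rotates away from the trained view:
print([round(view_tuned_response(v, 0.0), 2) for v in (0, 30, 60, 90)])
# Pooling over a few stored views flattens the tuning curve:
print(round(view_invariant_response(75.0, [0.0, 90.0, 180.0, 270.0]), 2))
```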

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?

Trial sequence: image (20 ms) → blank interval (30 ms ISI) → mask (1/f noise, 80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
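The models cited here alternate template matching with pooling. As a rough sketch of that idea, simplified far below the actual Serre et al. architecture and using made-up filter sizes: tuning layers ("S" units) correlate the input with stored templates, and invariance layers ("C" units) take a local max over position.

```python
import numpy as np

def s_layer(image, templates):
    # "S" (simple/tuning) units: correlate each template at every position.
    H, W = image.shape
    k = templates.shape[-1]
    out = np.zeros((len(templates), H - k + 1, W - k + 1))
    for t, tpl in enumerate(templates):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[t, i, j] = np.sum(image[i:i+k, j:j+k] * tpl)
    return out

def c_layer(maps, pool=2):
    # "C" (complex/invariance) units: local max pooling over position.
    T, H, W = maps.shape
    H2, W2 = H // pool * pool, W // pool * pool
    return maps[:, :H2, :W2].reshape(T, H2 // pool, pool,
                                     W2 // pool, pool).max(axis=(2, 4))

rng = np.random.default_rng(0)
img = rng.random((16, 16))
templates = rng.random((4, 3, 3))        # 4 made-up 3x3 templates
c1 = c_layer(s_layer(img, templates))    # one S -> C stage of the hierarchy
print(c1.shape)                          # (4, 7, 7)
```

Stacking such S/C stages, with templates of growing size, reproduces the gradual increase in receptive field size and invariance described for the ventral stream above.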

Decoding the neural code: Matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
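The read-out in Hung et al. is, at its core, a linear classifier trained on population response vectors. A minimal stand-in, with synthetic data and a regularized least-squares readout in place of the classifiers actually used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 100, 200
labels = rng.integers(0, 2, n_trials)        # two object categories
signal = rng.standard_normal(n_neurons)      # category axis in population space
# Each row: one trial's population response (noise + category-dependent shift).
R = rng.standard_normal((n_trials, n_neurons)) + np.outer(2*labels - 1, signal)

# Regularized least-squares readout: w = (R^T R + lam I)^(-1) R^T y
lam = 1.0
y = 2*labels - 1.0
w = np.linalg.solve(R.T @ R + lam * np.eye(n_neurons), R.T @ y)
pred = (R @ w > 0).astype(int)
print("training accuracy:", (pred == labels).mean())
```

The same weight vector, applied to responses recorded at new positions and scales, is the kind of test that showed position- and scale-invariant information in IT.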

… in 2013 …


understand how the brain works (then) make intelligent machines

WHY

Our vision and mission

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly…

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a sketch follows this list)

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Buelthoff, Little and Poggio, Nature 1989)
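A correlation-type (Hassenstein-Reichardt) detector is easy to state in code. A minimal sketch, discrete-time, with a first-order low-pass filter as the "delay" arm; all parameters are illustrative:

```python
import numpy as np

def lowpass(x, alpha=0.1):
    # First-order low-pass filter, acting as the delay arm of the detector.
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t-1] + alpha * (x[t] - y[t-1])
    return y

def reichardt(left, right, alpha=0.1):
    # Opponent correlation detector: delayed-left x right, minus delayed-right x left.
    # Positive mean output signals left-to-right motion, negative the reverse.
    return lowpass(left, alpha) * right - lowpass(right, alpha) * left

# Two photoreceptors see the same moving sine grating with a phase lag:
t = np.arange(1000) * 0.01
left = np.sin(2 * np.pi * 1.0 * t)
right = np.sin(2 * np.pi * 1.0 * (t - 0.1))   # stimulus reaches 'right' later
print(reichardt(left, right).mean())          # > 0: motion from left to right
```

The multiplication is the essential nonlinearity; replacing it with squared sums of filtered inputs gives the formally equivalent "energy" version mentioned above.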

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
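The two schemes contrasted here differ only in the nonlinearity. A minimal side-by-side sketch of that logic (my illustration: a divisive, shunting "AND-NOT" veto stands in for the synaptic mechanism the paper proposes; delays and gains are made up):

```python
import numpy as np

def delay(x, steps=5):
    # Pure delay line: the asymmetric delay present in both schemes.
    return np.concatenate([np.zeros(steps), x[:-steps]])

def correlation_unit(a, b):
    # Hassenstein-Reichardt: multiplicative conjunction of delayed a with b.
    return delay(a) * b

def veto_unit(a, b, k=10.0):
    # Barlow-Levick-style veto: b excites, delayed a divisively inhibits
    # (shunting inhibition), suppressing the response in the null direction.
    return b / (1.0 + k * delay(a))

# A bright bar passing first over receptor a, then over receptor b:
a = np.zeros(100); a[20:30] = 1.0
b = np.zeros(100); b[25:35] = 1.0
# Conjunction detects the preferred direction (large for a->b wiring, ~0 mirrored):
print(correlation_unit(a, b).sum(), correlation_unit(b, a).sum())
# Veto suppresses its null direction (small when delayed a vetoes b, large mirrored):
print(veto_unit(a, b).sum(), veto_unit(b, a).sum())
```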


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½D sketch and to Barrow and Tenenbaum's intrinsic images (ref. 5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman (ref. 6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the locations of sharp changes in image intensity.
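The paper's central recipe is to turn an ill-posed problem into a well-posed minimization of the form ||Az - y||² + λ||Pz||², where A relates the unknown z to the data y and P is a stabilizer enforcing smoothness. A minimal sketch for 1-D surface reconstruction from sparse, noisy depth samples (my toy example; a second-difference stabilizer is assumed):

```python
import numpy as np

def regularized_reconstruction(sample_idx, samples, n, lam=10.0):
    # Minimize ||A z - y||^2 + lam * ||P z||^2, where A samples the surface z
    # at known points and P is the second-difference (curvature) operator.
    A = np.zeros((len(sample_idx), n))
    A[np.arange(len(sample_idx)), sample_idx] = 1.0
    P = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second differences
    z = np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ samples)
    return z

# Reconstruct a smooth 1-D "surface" from 8 noisy depth samples out of 50 points.
rng = np.random.default_rng(0)
idx = np.sort(rng.choice(50, 8, replace=False))
true = np.sin(np.linspace(0, np.pi, 50))
z = regularized_reconstruction(idx, true[idx] + 0.05 * rng.standard_normal(8), 50)
print(np.round(z[::10], 2))   # smooth interpolation through the sparse samples
```

The normal equations here are a linear, local computation, which is what suggests the parallel analog circuits, and possible neural equivalents, discussed in the paper.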



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 15: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Recent Success Stories in AI are based on RL and DL

DL and RL come from neuroscience

Minsky's SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering

Vision for the BMM Summer School

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator

26

Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972

27

Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

most behavioral fly research was done with the Götz torque meter

in 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375

open question: how well does this theory describe fly behavior in natural flight?

in 1980 Wehrhahn started high-speed film recording of flies chasing each other

single-frame analysis, 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982

30

Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list)

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
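The correlation scheme is compact enough to state in code. Below is a minimal sketch of an opponent Reichardt detector in Python, assuming a first-order low-pass filter as the delay stage and a drifting sinusoid as stimulus; the function names, time constants and stimulus parameters are illustrative choices of mine, not values from the original papers.

    import numpy as np

    def lowpass(s, dt=1e-3, tau=50e-3):
        # First-order low-pass filter: the 'delayed' branch of the detector.
        out = np.zeros_like(s)
        a = dt / (tau + dt)
        for t in range(1, len(s)):
            out[t] = out[t - 1] + a * (s[t] - out[t - 1])
        return out

    def reichardt(s1, s2):
        # Opponent correlator: each receptor's delayed signal multiplies the
        # neighbour's current signal; subtraction makes the output signed.
        return lowpass(s1) * s2 - lowpass(s2) * s1

    # Two photoreceptors sampling a drifting grating a fixed phase apart.
    t = np.arange(0.0, 2.0, 1e-3)
    phase = 2 * np.pi * 1.0 * t                      # 1 Hz temporal frequency
    for direction in (+1, -1):
        s1 = 1 + np.cos(phase)
        s2 = 1 + np.cos(phase - direction * 0.5)     # neighbour lags or leads
        print(direction, reichardt(s1, s2).mean())   # mean output follows direction

The mean output is positive for one direction of motion and negative for the other, which is the behavioral signature Hassenstein and Reichardt inferred from the beetle's optomotor response.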

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983

36

Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
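To make the 'veto' idea concrete, here is a toy sketch (my own illustration, not the circuit of the paper) of how delayed shunting inhibition in the Barlow & Levick arrangement yields direction selectivity; the one-compartment steady-state membrane equation is standard, and every parameter value is made up for illustration.

    import numpy as np

    def membrane(g_e, g_i, E_e=60.0, g_leak=1.0):
        # Steady-state potential of a one-compartment membrane with E_i = 0:
        # inhibition divides ('vetoes') excitation rather than subtracting from it.
        return g_e * E_e / (g_leak + g_e + g_i)

    def delayed(x, k):
        y = np.zeros_like(x)
        y[k:] = x[:len(x) - k]
        return y

    def response(t_a, t_b, n=200, delay=10, w_i=10.0):
        # Unit excited by receptor B and inhibited by a delayed copy of
        # receptor A: the asymmetry that creates a preferred direction.
        a = np.zeros(n); a[t_a:t_a + 10] = 1.0
        b = np.zeros(n); b[t_b:t_b + 10] = 1.0
        return membrane(g_e=b, g_i=w_i * delayed(a, delay)).sum()

    print("preferred (B then A):", response(t_a=60, t_b=50))  # large response
    print("null (A then B):     ", response(t_a=50, t_b=60))  # excitation vetoed

In the null direction the delayed inhibition from A arrives exactly when B is excited and shunts the response; in the preferred direction it arrives too late to veto anything, which is the nonlinear interaction the behavioural data require.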


© Nature Publishing Group 1985

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
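The regularization approach described here can be stated compactly. The following is a sketch in standard Tikhonov notation, consistent with the text above but with symbols ($A$, $P$, $\lambda$) chosen by me rather than copied from a formula in this excerpt: the ill-posed problem of recovering $z$ from data $y$ with $Az = y$ is replaced by a well-posed variational one,

    $$\min_{z}\; \|Az - y\|^2 + \lambda \|Pz\|^2$$

where the stabilizer $P$ embodies a natural constraint (for instance smoothness). For the velocity field $V(s)$ along a contour, with $v_n$ the measured normal component of velocity and $n$ the unit normal, this reads

    $$\min_{V}\; \int \big(V \cdot n - v_n\big)^2\, ds \;+\; \lambda \int \left\| \frac{\partial V}{\partial s} \right\|^2 ds$$

so that, among all velocity fields consistent with the local normal measurements, the smoothest one is selected.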

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…

• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in H} \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \|f\|_K^2$$

Predictive regularization algorithms
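For the square loss, the minimizer of the regularization functional above has a closed form by the representer theorem: $f(x) = \sum_i c_i K(x, x_i)$ with $(K + \mu n I)c = y$. A minimal sketch in Python, assuming a Gaussian kernel and synthetic data (all names and values are mine, for illustration only):

    import numpy as np

    def gaussian_kernel(A, B, sigma=0.5):
        # K[i, j] = exp(-(a_i - b_j)^2 / (2 sigma^2)) for 1-D inputs
        return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * sigma ** 2))

    def fit(x, y, mu=1e-3):
        # Regularized least squares: minimize (1/n) sum (y_i - f(x_i))^2
        # + mu ||f||_K^2; by the representer theorem, solve (K + mu n I) c = y.
        K = gaussian_kernel(x, x)
        return np.linalg.solve(K + mu * len(x) * np.eye(len(x)), y)

    def predict(c, x_train, x_new):
        return gaussian_kernel(x_new, x_train) @ c

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 30)
    y = np.sin(np.pi * x) + 0.1 * rng.normal(size=30)
    c = fit(x, y)
    print(predict(c, x, np.array([0.0, 0.5])))  # close to sin(0)=0, sin(pi/2)=1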

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory

Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi

Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA; Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA; Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA; Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\big(|X_n - X| > \varepsilon\big) = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \big| I[f_S] - I_S[f_S] \big| = 0 \quad \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} P\Big( I[f_S] > \inf_{f \in H} I[f] + \varepsilon \Big) = 0$$
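In the notation of Box 1, here is a quick numerical illustration (a toy example of mine, not from the paper) of why ERM alone does not guarantee generalization: over a hypothesis space that is too rich, the empirical error $I_S[f]$ can be near zero while the expected error $I[f]$, here estimated on fresh samples, stays large.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample(n):
        # n i.i.d. pairs z_i = (x_i, y_i) from a distribution mu we happen to know
        x = rng.uniform(-1, 1, n)
        return x, np.sin(np.pi * x) + 0.1 * rng.normal(size=n)

    def erm(x, y, degree):
        # ERM with square loss over H = polynomials of the given degree
        return np.polyfit(x, y, degree)

    def err(f, x, y):
        # (1/n) sum_i V(f, z_i) with square loss
        return np.mean((np.polyval(f, x) - y) ** 2)

    x, y = sample(20)
    f = erm(x, y, degree=15)            # hypothesis space much richer than the data
    xt, yt = sample(100_000)            # fresh samples approximate I[f]
    print("empirical error I_S[f]:", err(f, x, y))   # close to zero
    print("expected  error I[f]  :", err(f, xt, yt)) # much larger: no generalization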

Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group
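The stability property invoked in the abstract can be written down, in the Box 1 notation, roughly as leave-one-out cross-validation stability; this is a hedged paraphrase rather than a verbatim reproduction of the paper's condition. Writing $S^i$ for the training set $S$ with the example $z_i$ deleted:

    $$\lim_{n \to \infty} \big| V(f_{S^i}, z_i) - V(f_S, z_i) \big| = 0 \quad \text{in probability, for each } i = 1, \ldots, n$$

that is, deleting one training point barely changes the loss the learned function incurs on that point.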

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)

– ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

74

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu

Current Biology 1995, Vol. 5, No. 5, 552

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?

[Protocol: image, 20 ms → image-mask interval (ISI), 30 ms → mask (1/f noise), 80 ms]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
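The "matrix-like read-out" amounts to training a linear classifier on a population vector of IT responses. A toy sketch on synthetic data (Hung et al. used linear SVMs on real multi-unit recordings; the Gaussian population model and the ridge-regression classifier below are stand-ins of my own):

    import numpy as np

    rng = np.random.default_rng(0)
    n_neurons = 128
    axis = rng.normal(size=n_neurons)     # population axis carrying category signal

    def population_responses(n_trials):
        # Synthetic 'IT' responses: noise plus a category-dependent shift.
        y = 2 * rng.integers(0, 2, n_trials) - 1        # category labels: +1 / -1
        X = rng.normal(size=(n_trials, n_neurons)) + 0.4 * np.outer(y, axis)
        return X, y

    Xtr, ytr = population_responses(400)
    Xte, yte = population_responses(400)

    # Linear read-out via regularized least squares (any linear classifier
    # makes the same point: category is linearly decodable from the population).
    w = np.linalg.solve(Xtr.T @ Xtr + 10.0 * np.eye(n_neurons), Xtr.T @ ytr)
    print("read-out accuracy:", np.mean(np.sign(Xte @ w) == yte))  # well above 0.5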

… in 2013 …

Page 16: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

DL and RL come from neuroscience

Minskyrsquos SNARC

RL

DL

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because as in the recent past it is likely that several of the next breakthroughs in ML and AI are likely to come from neuroscienceANDengineering

Vision for the BMM SummerSchool

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardtrsquos PhD

Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

most behavioral fly research was done with the Goumltz torque meter

in 1976 based on this recording technology Reichardt amp Poggio developed their theory for Visual control of orientation behaviour in the fly Part I +II Quart Rev Biophysics 9(3) 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip

Wehrhahn C T Poggio and H Buumllthoff Biological Cybernetics 45 123-130 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons …

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
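The two schemes contrasted above are easy to render in code. A minimal sketch (my illustration; signals and parameters are invented): the Barlow & Levick veto, implemented as shunting (divisive) inhibition, behaves like the multiplicative AND-NOT interaction the paper argues for.

```python
import numpy as np

def veto_output(excit, inhib, delay=4, g=20.0):
    """Direction-selective unit: a delayed inhibitory input divisively
    'vetoes' the excitatory input (shunting inhibition ~ AND-NOT)."""
    return excit / (1.0 + g * np.roll(inhib, delay))

# A stimulus sweeping in the null direction excites the inhibitory channel
# just before the excitatory one, so the delayed veto arrives on time;
# in the preferred direction the veto arrives too late to suppress.
t = np.arange(200)
pulse = lambda t0: np.exp(-0.5 * ((t - t0) / 3.0) ** 2)
preferred = veto_output(pulse(100), pulse(104))   # inhibition lags: weak veto
null      = veto_output(pulse(100), pulse(96))    # inhibition leads: strong veto
print(preferred.max(), null.max())                # preferred >> null
```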



Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
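As a concrete, deliberately tiny illustration of the regularization idea (my sketch, not from the paper): recovering a smooth 1-D "surface" from a few noisy samples is ill-posed, but adding a quadratic smoothness stabilizer makes it well-posed and solvable by linear algebra.

```python
import numpy as np

# Ill-posed: recover z (length n) from a few noisy samples d = A z + noise.
# Regularized solution minimizes ||A z - d||^2 + lam * ||D z||^2,
# where D is a discrete second-derivative (smoothness) stabilizer.
n = 100
idx = np.array([5, 20, 40, 70, 90])                 # sparse sample locations
d = np.sin(idx / 15.0) + 0.05 * np.random.randn(idx.size)
A = np.zeros((idx.size, n)); A[np.arange(idx.size), idx] = 1.0
D = np.diff(np.eye(n), n=2, axis=0)                 # (n-2) x n second differences
lam = 0.5
z = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)
print(z[idx].round(2), d.round(2))                  # reconstruction passes near the data
```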

Early vision processes

Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
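A two-line worked example of this decomposition (illustrative numbers of my choosing): given the true image velocity and the local contour normal, only the normal component is locally measurable.

```python
import numpy as np

v_true = np.array([1.0, 0.5])          # full image velocity (unknown to a local detector)
theta = 0.3
n_hat = np.array([np.cos(theta), np.sin(theta)])  # unit normal to the contour here
v_normal = (v_true @ n_hat) * n_hat    # recoverable from local measurements
v_tangent = v_true - v_normal          # invisible locally: the aperture problem
print(v_normal, v_tangent)
```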

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen



Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of …

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
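A toy rendering of such a cooperative update (my sketch, with simplified 1-D geometry and made-up parameters, not the paper's exact network): cells vote for (position, disparity) matches; neighbors at the same disparity excite (continuity), competing disparities at the same position inhibit (uniqueness), and a threshold nonlinearity iterates the state toward a globally consistent solution.

```python
import numpy as np

def cooperative_step(C, C0, excite=2.0, inhibit=1.0, theta=2.5):
    """One update of a 1-D cooperative stereo network.
    C, C0: (positions x disparities) binary match arrays."""
    pad = np.pad(C, ((1, 1), (0, 0)))
    support = pad[:-2] + pad[2:]                  # continuity: same disparity, x +/- 1
    rivals = C.sum(axis=1, keepdims=True) - C     # uniqueness: other disparities at same x
    drive = excite * support - inhibit * rivals + C0
    return (drive > theta).astype(float)

rng = np.random.default_rng(0)
true_d = 3                                        # a surface at constant disparity 3
C0 = (rng.random((50, 7)) < 0.2).astype(float)    # spurious candidate matches
C0[:, true_d] = 1.0                               # plus the correct ones
C = C0.copy()
for _ in range(8):
    C = cooperative_step(C, C0)
print(C.sum(axis=0))                              # activity concentrates at disparity 3
```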


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman, Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing …

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels (computation, algorithms, biophysics and circuits)

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977 …

• … part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I) …

• … which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
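A compact sketch of this minimization for the square loss (a Gaussian kernel and toy data of my choosing; not CBCL's code): by the representer theorem, the minimizer is f(x) = Σᵢ cᵢ K(x, xᵢ) with coefficients solving (K + μnI)c = y.

```python
import numpy as np

def gram(X1, X2, sigma=0.2):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def regularized_fit(X, y, mu=1e-3):
    """Minimize (1/n) sum_i V(y_i, f(x_i)) + mu ||f||_K^2 with square loss."""
    n = len(X)
    c = np.linalg.solve(gram(X, X) + mu * n * np.eye(n), y)
    return lambda Xq: gram(Xq, X) @ c

X = np.linspace(0, 1, 40)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(40)
f = regularized_fit(X, y)
print(np.abs(f(X) - y).mean())   # small training error, smooth solution
```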

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49 (S 0273-0979(01)00923-5). Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In …


General conditions for predictivity in learning theory

Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi

Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA; Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA; Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA; Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
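The stability property is easy to probe numerically. A sketch (my construction, with a toy regularized least-squares learner; all names and data are illustrative): delete one example at a time, refit, and measure how much the learned function moves.

```python
import numpy as np

def loo_change(fit, S, probe):
    """Max sup-norm change of the learned function on `probe`
    when a single training example is deleted."""
    f_all = fit(S)
    return max(
        np.max(np.abs(f_all(probe) - fit(S[:i] + S[i + 1:])(probe)))
        for i in range(len(S))
    )

def rls_fit(S, lam=0.1):
    """Toy 1-D regularized least squares through the origin."""
    x = np.array([p[0] for p in S]); y = np.array([p[1] for p in S])
    w = (x @ y) / (x @ x + lam * len(S))
    return lambda t: w * np.asarray(t)

rng = np.random.default_rng(0)
S = [(float(x), 2.0 * x + 0.1 * rng.standard_normal()) for x in rng.uniform(0, 1, 30)]
print(loo_change(rls_fit, S, probe=np.linspace(0, 1, 100)))  # small => stable
```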

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate …

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (written $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$I[f] = \int_Z V(f, z) \, d\mu(z)$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0$ in probability.

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$\lim_{n\to\infty} \mathbb{P}\big\{ I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \big\} = 1$
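In code, the two error notions and the gap between them look like this (a synthetic distribution μ of my choosing, with the square loss):

```python
import numpy as np
rng = np.random.default_rng(1)

def sample_mu(m):                       # z = (x, y) ~ mu, with y = x + noise
    x = rng.uniform(0, 1, m)
    return x, x + 0.1 * rng.standard_normal(m)

f = lambda x: 0.9 * x                   # a fixed hypothesis f_S
V = lambda x, y: (f(x) - y) ** 2        # square loss V(f, z)

x_S, y_S = sample_mu(20)
I_S = V(x_S, y_S).mean()                # empirical error I_S[f]
x_mc, y_mc = sample_mu(500_000)
I = V(x_mc, y_mc).mean()                # Monte Carlo estimate of I[f]
print(I_S, I, abs(I - I_S))             # the generalization gap for this f
```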

Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

Training database:
• 1,000+ real and 3,000+ virtual face patterns
• 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio

Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …


9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test the feedforward model)

Animal present or not?

Stimulus sequence: image (20 ms), image-mask interval (30 ms ISI), mask (1/f noise, 80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% for the model vs 80% for humans)

Hierarchical feedforward models of the ventral stream
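The models cited here (the HMAX family) alternate two operations: "S" layers that match templates (tuning) and "C" layers that take a max over position (invariance). A toy sketch of one S-C stage follows, with tiny hand-made templates of my own; this is an illustration, not the published implementation.

```python
import numpy as np

def s_layer(image, templates):
    """S (simple) units: template match at every image position."""
    h, w = image.shape
    k = templates.shape[1]
    out = np.zeros((len(templates), h - k + 1, w - k + 1))
    for i, tpl in enumerate(templates):
        for r in range(h - k + 1):
            for c in range(w - k + 1):
                out[i, r, c] = (image[r:r + k, c:c + k] * tpl).sum()
    return out

def c_layer(s_maps, pool=4):
    """C (complex) units: MAX over local position -> translation tolerance."""
    n, h, w = s_maps.shape
    trimmed = s_maps[:, :h - h % pool, :w - w % pool]
    return trimmed.reshape(n, h // pool, pool, w // pool, pool).max(axis=(2, 4))

img = np.random.rand(11, 11)
templates = np.stack([np.eye(4), np.fliplr(np.eye(4))])  # two oriented bars
print(c_layer(s_layer(img, templates)).shape)            # (2, 2, 2)
```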

Decoding the neural code: Matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
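In spirit, the readout is a linear classifier trained on population responses and tested across transformations. A synthetic sketch (all data and numbers invented, only to mirror the train-at-one-position, test-at-another logic):

```python
import numpy as np
rng = np.random.default_rng(2)

n_units, n_trials = 128, 200
proto = rng.standard_normal((2, n_units))          # 2 object categories
def pop_responses(cat, pos):
    gain = 1.0 - 0.2 * pos                          # position scales responses
    return gain * proto[cat] + 0.5 * rng.standard_normal((n_trials, n_units))

X_train = np.vstack([pop_responses(0, 0), pop_responses(1, 0)])   # position 0
X_test  = np.vstack([pop_responses(0, 1), pop_responses(1, 1)])   # new position
labels  = np.repeat([-1.0, 1.0], n_trials)
w, *_ = np.linalg.lstsq(X_train, labels, rcond=None)              # linear readout
acc = (np.sign(X_test @ w) == labels).mean()
print(f"category readout at an untrained position: {acc:.2f}")
```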

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

helliphelliphellip in 2013helliphellip

Page 17: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because as in the recent past it is likely that several of the next breakthroughs in ML and AI are likely to come from neuroscienceANDengineering

Vision for the BMM SummerSchool

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardtrsquos PhD

Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

most behavioral fly research was done with the Goumltz torque meter

in 1976 based on this recording technology Reichardt amp Poggio developed their theory for Visual control of orientation behaviour in the fly Part I +II Quart Rev Biophysics 9(3) 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip

Wehrhahn C T Poggio and H Buumllthoff Biological Cybernetics 45 123-130 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges) They illustrate well the difficulty of the problems of early vision The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects Consider the problem of deter-mining the velocity vector V at each point along a smooth contour in the image Following Marr and Ullman6

one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the


Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
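Both schemes in figure 1 are easy to state computationally. A toy discrete-time sketch (our notation; a first-order low-pass stands in for the delay, and the parameters are illustrative) of the multiplicative Reichardt correlator and the Barlow-Levick veto:

```python
import numpy as np

def lowpass(x, tau):
    """First-order discrete low-pass filter; larger tau ~ longer delay."""
    y = np.zeros_like(x, dtype=float)
    for t in range(1, len(x)):
        y[t] = y[t-1] + (x[t-1] - y[t-1]) / tau
    return y

def reichardt(r1, r2, tau=8.0):
    """Multiplicative (correlation-type) detector with opponent subtraction:
    each receptor's delayed signal multiplies the neighbour's direct one."""
    return lowpass(r1, tau) * r2 - lowpass(r2, tau) * r1

def barlow_levick(r1, r2, tau=8.0):
    """Veto scheme: the delayed signal from the adjacent receptor
    inhibits ('vetoes') the direct response in the null direction."""
    return np.clip(r2 - lowpass(r1, tau), 0.0, None)

# an edge crossing receptor 1 at t=50, then receptor 2 at t=60
t = np.arange(200)
r1, r2 = (t > 50).astype(float), (t > 60).astype(float)
print(reichardt(r1, r2).sum() > reichardt(r2, r1).sum())  # True: signed output
```

Note the essential nonlinearity in both cases: the multiplication in the correlator, and the rectifying veto in the Barlow-Levick scheme.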

Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about …, whereas for the mammalian cortex it lies between … and ….

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
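The cooperative scheme iterates simple local excitatory/inhibitory updates until a globally consistent disparity assignment emerges. A minimal 1-D sketch (our simplification: the published algorithm works on 2-D image arrays and inhibits along both lines of sight; the parameter values here are illustrative):

```python
import numpy as np

def cooperative_stereo(C0, n_iter=12, eps=2.0, theta=3.0):
    """1-D toy of a cooperative disparity update.

    C[x, d] ~ evidence that image position x carries disparity d.
    Excitation from same-disparity neighbours (continuity constraint);
    inhibition from rival disparities at the same position (uniqueness).
    """
    C = C0.astype(float)
    for _ in range(n_iter):
        excite = np.zeros_like(C)
        excite[1:-1] = C[:-2] + C[2:]               # neighbours in x, same d
        inhibit = C.sum(axis=1, keepdims=True) - C  # other d, same x
        C = (excite - eps * inhibit + C0 > theta).astype(float)
    return C
```

On random-dot stereograms, iterating such a rule drives the initially ambiguous local matches toward a piecewise-constant disparity map; the threshold nonlinearity plays the role of the decision at each node.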

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by …; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
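This Tikhonov-style functional trades data fit against an RKHS-norm penalty. For the square loss, the representer theorem gives f(x) = Σ_i c_i K(x_i, x) with coefficients solving (K + μℓI)c = y. A minimal sketch (the Gaussian kernel and all names are our assumptions, not the slide's):

```python
import numpy as np

def rls_fit(X, y, mu=0.1, gamma=1.0):
    """Kernel regularized least squares for the Tikhonov functional
    min_f (1/l) sum_i V(y_i, f(x_i)) + mu * ||f||_K^2  with square loss.
    By the representer theorem f(x) = sum_i c_i K(x_i, x), where the
    coefficients c solve (K + mu*l*I) c = y (Gaussian kernel K here)."""
    l = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)
    c = np.linalg.solve(K + mu * l * np.eye(l), y)
    return K, c

X = np.random.randn(30, 2)
y = np.sin(X[:, 0])
K, c = rls_fit(X, y)
print(np.abs(K @ c - y).max())   # small training error, controlled by mu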

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000, and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
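The stability property at the heart of the abstract can be probed numerically: delete one training example and measure how much the learned hypothesis moves. A toy proxy (ours, not the paper's exact CVloo criterion), using closed-form ridge regression as the learning map:

```python
import numpy as np

def ridge(X, y, mu=0.1):
    """Closed-form ridge regression weights: a stable learning map."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + mu * len(X) * np.eye(d), X.T @ y)

def loo_change(X, y, x_test, mu=0.1):
    """Largest change in the prediction at x_test when a single training
    example is deleted -- the perturbation behind the stability notion."""
    f_all = x_test @ ridge(X, y, mu)
    return max(abs(f_all - x_test @ ridge(np.delete(X, i, 0),
                                          np.delete(y, i), mu))
               for i in range(len(X)))

rng = np.random.default_rng(0)
x_test = rng.normal(size=3)
for n in (20, 80, 320):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
    print(n, loo_change(X, y, x_test))   # typically shrinks as n grows
```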

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
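ERM is a one-liner when the hypothesis space is a finite set of candidate functions. A toy sketch (our construction, square loss):

```python
import numpy as np

def erm(S, hypotheses):
    """Empirical risk minimization: return the hypothesis in the given
    (finite) hypothesis space with the smallest empirical error on S."""
    def emp_err(f):
        return np.mean([(f(x) - y) ** 2 for x, y in S])
    return min(hypotheses, key=emp_err)

# toy hypothesis space: linear scalings f_a(x) = a*x for candidate a's
rng = np.random.default_rng(0)
S = [(x, 2.0 * x + 0.1 * rng.normal()) for x in np.linspace(-1, 1, 20)]
H = [lambda x, a=a: a * x for a in np.linspace(-3, 3, 61)]
f_hat = erm(S, H)
print(f_hat(1.0))    # close to 2.0, the slope that generated the data
```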

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\big(|X_n - X| > \varepsilon\big) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{\, z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \,\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \big| I[f_S] - I_S[f_S] \big| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \varepsilon \Big) = 0.$$
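For a fixed function f these definitions are easy to simulate: the empirical error is an average that converges to the expected error as n grows (for a data-dependent f_S, generalization is exactly the requirement that this still holds). A sketch with synthetic data (the distribution and the hypothesis are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                       # one fixed hypothesis
    return 0.8 * x

def I_S(xs, ys):                # empirical error, square loss
    return np.mean((f(xs) - ys) ** 2)

# mu: x ~ N(0,1), y = x + small noise. I[f] cannot be computed exactly,
# so we estimate it once from a very large independent sample.
Xb = rng.normal(size=1_000_000)
I_f = I_S(Xb, Xb + 0.1 * rng.normal(size=1_000_000))

for n in (10, 100, 1000, 10000):
    xs = rng.normal(size=n)
    ys = xs + 0.1 * rng.normal(size=n)
    print(n, abs(I_f - I_S(xs, ys)))    # the gap |I[f] - I_S[f]| shrinks
```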

Letters to Nature

NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1000+ real, 3000+ virtual; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
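A common reading of this prediction is RBF-like view tuning: a unit stores one 2-D view as a template and responds with a bell-shaped falloff as the input view rotates away; recognition is interpolation among a few such units. A toy sketch (the features and parameters are stand-ins, not the published model):

```python
import numpy as np

def view_tuned_response(views, template, sigma=0.5):
    """Gaussian (RBF-like) tuning around a stored view: maximal response
    at the preferred view, graceful falloff as the object rotates away."""
    d2 = np.sum((views - template) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# stand-in 'view features': unit vectors for rotations 0..180 degrees
angles = np.radians(np.linspace(0, 180, 13))
views = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
template = views[4]                       # the learned, preferred view
print(view_tuned_response(views, template).round(2))   # peaks at index 4
```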

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

Image (20 ms) → image-mask interval, ISI (30 ms) → mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
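The recurring motif in these models is an alternation of template matching ("S" layers) and local max pooling ("C" layers), trading selectivity against invariance. A minimal sketch of one such pair (ours, far smaller than HMAX; filters and sizes are illustrative):

```python
import numpy as np

def s_layer(image, templates):
    """Simple-cell-like stage: template match (plain correlation here)
    at every image position, one feature map per template."""
    h, w = image.shape
    k = templates.shape[-1]
    maps = np.zeros((len(templates), h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            maps[:, i, j] = (templates * image[i:i+k, j:j+k]).sum(axis=(1, 2))
    return maps

def c_layer(maps, pool=4):
    """Complex-cell-like stage: local max pooling builds tolerance to
    position (pooling across scales would add tolerance to size)."""
    n, h, w = maps.shape
    h, w = h - h % pool, w - w % pool
    return maps[:, :h, :w].reshape(n, h // pool, pool,
                                   w // pool, pool).max(axis=(2, 4))

img = np.random.rand(32, 32)
filters = np.random.randn(4, 5, 5)
print(c_layer(s_layer(img, filters)).shape)   # (4, 7, 7)
```

Stacking several such S/C pairs yields units with larger receptive fields, more complex preferred stimuli and more invariance, mirroring the V1 → V2 → V4 → IT progression described above.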

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
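The "matrix-like read-out" amounts to training a linear classifier on population response vectors. A toy sketch with synthetic stand-ins for IT responses (the study used real multi-unit recordings, training at some positions/scales and testing at others):

```python
import numpy as np

def train_readout(R, labels, mu=1e-3):
    """One-vs-all least-squares readout weights from a population
    response matrix R (trials x recording sites)."""
    Y = np.eye(labels.max() + 1)[labels]          # one-hot targets
    return np.linalg.solve(R.T @ R + mu * np.eye(R.shape[1]), R.T @ Y)

def decode(R, W):
    return (R @ W).argmax(axis=1)

rng = np.random.default_rng(0)
means = rng.normal(size=(8, 128))                 # 8 objects, 128 sites
labels = np.repeat(np.arange(8), 20)
train = means[labels] + rng.normal(scale=0.5, size=(160, 128))
test = means[labels] + rng.normal(scale=0.5, size=(160, 128))
W = train_readout(train, labels)
print((decode(test, W) == labels).mean())         # decoding accuracy
```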

… in 2013 …

Page 18: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes

1972-2013

Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
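The regularization recipe in the abstract can be made concrete on the surface-reconstruction example: sparse data alone underdetermine the surface, and a smoothness stabilizer picks out a unique solution. A 1-D toy sketch (grid size, sampling and the weight lam are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, m = 100, 1.0, 12
idx = np.sort(rng.choice(n, m, replace=False))          # sparse samples
y = np.sin(np.linspace(0, 3, n))[idx] + 0.05 * rng.normal(size=m)

A = np.zeros((m, n)); A[np.arange(m), idx] = 1.0        # sampling operator
D2 = np.diff(np.eye(n), 2, axis=0)                      # second differences
# minimize ||A f - y||^2 + lam * ||D2 f||^2  via the normal equations:
f = np.linalg.solve(A.T @ A + lam * D2.T @ D2, A.T @ y)
```

Without the D2 term the normal equations are singular (the problem is ill-posed); the stabilizer restores existence, uniqueness and stability of the solution, which is the sense in which regularization "solves" early vision problems.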


Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

The nature of a computation that is carried out by a machine or a nervous system depends only on the problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
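The update rule of such a cooperative network can be sketched in a few lines. The following is a simplified 1-D version in the spirit of the algorithm described above: match units excite neighbours at the same disparity (the continuity constraint) and inhibit other disparities at the same position (a reduced form of the uniqueness constraint, which in the original also acts along the second eye's line of sight). All parameter values and names are illustrative assumptions, not the paper's.

```python
import numpy as np

def cooperative_stereo(left, right, max_disp, n_iter=10,
                       excit_radius=2, inhibition=2.0, threshold=3.0):
    # State c[x, d] = 1 if "position x is matched at disparity d" is believed.
    n = len(left)
    c0 = np.zeros((n, max_disp + 1))
    for d in range(max_disp + 1):
        # Initial candidates: left and right dots agree at offset d.
        c0[:n - d, d] = (left[:n - d] == right[d:]).astype(float)
    c = c0.copy()
    for _ in range(n_iter):
        # Continuity: excitation from nearby positions at the same disparity
        # (np.roll wraps at the borders; acceptable for a sketch).
        excit = sum(np.roll(c, dx, axis=0)
                    for dx in range(-excit_radius, excit_radius + 1))
        # Uniqueness (simplified): competition among disparities at the
        # same left-image position.
        inhib = c.sum(axis=1, keepdims=True) - c
        c = ((excit - inhibition * inhib + c0) >= threshold).astype(float)
    return c.argmax(axis=1)   # disparity with most support at each position

# Random-dot pair: the central patch of the right row is shifted by 2 pixels.
rng = np.random.default_rng(0)
left = rng.integers(0, 2, 64)
right = left.copy()
right[22:40] = left[20:38]
# Expected: disparity near 2 inside the shifted patch, 0 elsewhere.
print(cooperative_stereo(left, right, max_disp=4))
```

The iterations let locally consistent matches reinforce each other while spurious matches, which lack neighbourhood support, are suppressed.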

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of perception.

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$ f^\ast \;=\; \arg\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) \;+\; \mu \,\lVert f \rVert_K^2 \right] $$
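A minimal sketch of the regularized least squares instance of this functional, assuming the square loss and a Gaussian kernel (the kernel choice, `mu`, `gamma` and all names are illustrative): by the representer theorem the minimizer has the form f(x) = Σᵢ cᵢ K(x, xᵢ), and the coefficients come from a linear system.

```python
import numpy as np

def rls_fit(X, y, mu=0.1, gamma=1.0):
    # Regularized least squares in an RKHS with a Gaussian kernel:
    # minimizes (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2.
    # Solution: f(x) = sum_i c_i K(x, x_i), with c = (K + mu*n*I)^(-1) y.
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                       # kernel matrix
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    def f(x):
        k = np.exp(-gamma * ((X - x) ** 2).sum(-1))
        return k @ c
    return f

# Example: fit a noisy sine curve.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
f = rls_fit(X, y)
print(f(np.array([0.0])), np.sin(0.0))   # prediction near the true value
```

Varying `mu` trades fidelity to the data against the smoothness enforced by the RKHS norm.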

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.


General conditions for predictivity in learning theory

Tomaso Poggio (1), Ryan Rifkin (1,4), Sayan Mukherjee (1,3) & Partha Niyogi (2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained instead of programmed to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| \geq \varepsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$: $S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$.

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss, $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$ I[f] = \int_Z V(f, z) \, d\mu(z), $$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$ I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y). $$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$ I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i). $$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$ \lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability.} $$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$ \lim_{n\to\infty} P\!\left( I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right) = 1. $$
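The stability property at the heart of the paper can be probed numerically. The sketch below uses plain ridge regression as the learning map and measures how much the loss at a deleted training point changes on retraining; it is an illustrative harness, not the paper's formal leave-one-out bound, and all names and constants are assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    # Plain ridge regression as the learning map L: S -> f_S.
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return lambda x: x @ w

def loo_change(X, y, i, lam=0.1):
    # How much does the loss at z_i change when z_i is deleted from S?
    f = ridge_fit(X, y, lam)
    mask = np.arange(len(X)) != i
    fi = ridge_fit(X[mask], y[mask], lam)       # f_{S^i}: one example deleted
    return abs((f(X[i]) - y[i]) ** 2 - (fi(X[i]) - y[i]) ** 2)

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
# A small worst-case change across deletions suggests a stable,
# and therefore predictive, learning map.
print(max(loo_change(X, y, i) for i in range(50)))
```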

Letters to Nature. NATURE, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work

• Training Database
• 1000+ real, 3000+ virtual faces
• 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
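A minimal sketch of the view-based account behind these predictions, in the spirit of the Poggio-Edelman RBF proposal (the feature vectors, sigma and weights below are illustrative assumptions): a view-tuned unit is a Gaussian centred on a stored 2-D view, so its response falls off as the object rotates away from that view, and a weighted sum of several such units gives a more view-invariant object unit.

```python
import numpy as np

def view_tuned_unit(view, stored_view, sigma=0.5):
    # Gaussian radial basis function centred on one stored view.
    return np.exp(-np.sum((view - stored_view) ** 2) / (2 * sigma ** 2))

def object_unit(view, stored_views, weights, sigma=0.5):
    # View-invariant object unit: weighted sum over view-tuned units.
    return sum(w * view_tuned_unit(view, s, sigma)
               for w, s in zip(weights, stored_views))

# Toy "views": feature vectors of one object sampled at a few rotations.
rng = np.random.default_rng(5)
stored = [rng.standard_normal(10) for _ in range(4)]
probe = stored[1] + 0.1 * rng.standard_normal(10)   # near a learned view
print(object_unit(probe, stored, weights=[1, 1, 1, 1]))
```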

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

Image (20 ms) → image-mask interval (ISI 30 ms) → mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
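A minimal sketch of the two operations such hierarchical feedforward models alternate (patch sizes, template counts and pooling ranges are illustrative assumptions): a tuning stage that matches stored templates, as in simple cells, and a max-pooling stage that buys invariance, as in complex cells.

```python
import numpy as np

def s_layer(x, templates, sigma=1.0):
    # Simple-cell stage: Gaussian tuning of each template to every patch.
    # x: (n_patches, patch_dim); templates: (n_templates, patch_dim)
    d2 = ((x[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))        # (n_patches, n_templates)

def c_layer(s, pool=4):
    # Complex-cell stage: max pooling over position, trading
    # selectivity for invariance.
    n = (len(s) // pool) * pool
    return s[:n].reshape(-1, pool, s.shape[1]).max(axis=1)

# Toy pipeline: random "image patches" through one S-C pair.
rng = np.random.default_rng(3)
patches = rng.standard_normal((32, 9))       # e.g. 3x3 patches, flattened
templates = rng.standard_normal((8, 9))      # learned/selected prototypes
print(c_layer(s_layer(patches, templates)).shape)   # (8, 8)
```

Stacking such S-C pairs yields units with progressively larger receptive fields, more complex preferred stimuli, and more invariance, mirroring the ventral stream properties listed above.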

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT read-out data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …


Tuebingen MPI fuer BK (1972-1981)

Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI fuer Biologische Kybernetik


The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator

Fixation and tracking behavior

Poggio, T. and W. Reichardt, A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica, Kybernetik 12, 185-203, 1972

Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly…

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch below).

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
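A minimal sketch of the correlation scheme just described (the pure delay stands in for the original's low-pass filter, and all constants are illustrative assumptions): each subunit multiplies one receptor's signal with a delayed copy of its neighbour's, and subtracting the mirror-image subunit yields a direction-selective output.

```python
import numpy as np

def reichardt(x1, x2, delay=3):
    # Each half-detector multiplies one receptor's signal with a delayed
    # copy of its neighbour's; np.roll implements the delay.
    d1, d2 = np.roll(x1, delay), np.roll(x2, delay)
    # Opponent subtraction: positive for motion from receptor 2 towards 1.
    return x1 * d2 - x2 * d1

t = np.arange(100)
x2 = np.exp(-((t - 40) ** 2) / 8.0)   # receptor 2 sees the object first...
x1 = np.exp(-((t - 43) ** 2) / 8.0)   # ...receptor 1 sees it 3 steps later
print(reichardt(x1, x2).sum())        # > 0: preferred direction
print(reichardt(x2, x1).sum())        # < 0: null direction
```

The multiplication is the nonlinear interaction that Hassenstein & Reichardt showed is necessary; the opponent pair makes the mean output signed by direction.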

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
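A schematic illustration of the veto idea that the proposed synaptic interaction is meant to implement (the divisive form, delay and gain below are illustrative stand-ins, not the paper's biophysical model): a delayed signal from the neighbouring receptor shunts the excitatory input, so a response survives only in the direction where excitation escapes the veto.

```python
import numpy as np

def veto_unit(excite, inhibit, delay=3, gain=10.0):
    # Delayed "inhibitory conductance" from the neighbouring receptor;
    # the division is a schematic stand-in for shunting inhibition.
    gi = np.roll(inhibit, delay)
    return excite / (1.0 + gain * gi)

t = np.arange(100)
a = np.exp(-((t - 40) ** 2) / 8.0)    # receptor A fires first
b = np.exp(-((t - 43) ** 2) / 8.0)    # receptor B fires 3 steps later
# Null direction: the delayed veto from A arrives just as B fires.
print(veto_unit(b, a).sum())          # strongly suppressed
# Preferred direction: the veto from B arrives too late to cancel A.
print(veto_unit(a, b).sum())          # largely unsuppressed
```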


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tennenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6],

one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
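A minimal sketch of the regularization recipe on a toy ill-posed problem (the 1-D grid, lambda and the quadratic smoothness stabilizer are illustrative assumptions, not the paper's specific functionals): reconstruct a depth profile from sparse noisy samples by trading data fidelity against smoothness, the same structure the article applies to optical flow and surface reconstruction.

```python
import numpy as np

def reconstruct_surface(n, idx, depths, lam=10.0):
    # Minimize sum_i (f[idx_i] - depths_i)^2 + lam * ||D2 f||^2,
    # where D2 is a discrete second derivative (smoothness stabilizer).
    S = np.zeros((len(idx), n))
    S[np.arange(len(idx)), idx] = 1.0            # sparse sampling operator
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    A = S.T @ S + lam * D2.T @ D2
    return np.linalg.solve(A, S.T @ depths)

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 100)
idx = rng.choice(100, 12, replace=False)         # sparse measurements
depths = np.sin(2 * np.pi * x[idx]) + 0.05 * rng.standard_normal(12)
f = reconstruct_surface(100, idx, depths)
print(float(np.abs(f - np.sin(2 * np.pi * x)).mean()))  # typically small
```

Without the stabilizer the problem is underdetermined (88 of the 100 unknowns are unconstrained by data); the smoothness term is exactly the kind of natural constraint that makes the inverse problem well-posed.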

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the sharp changes in image intensity…



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z

Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2

=sup2

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
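A minimal sketch of ERM over a toy hypothesis space (illustrative, not the paper's formal setting): the hypothesis space is a family of threshold classifiers on the line, and the algorithm returns the threshold with the smallest training error.

```python
# Minimal ERM sketch: search a small hypothesis space of threshold classifiers
# and return the one with lowest empirical error on S = {(x_i, y_i)}.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
y = (x > 0.6).astype(int)          # true rule, unknown to the learner
y ^= rng.random(200) < 0.05        # 5% label noise (made-up)

thresholds = np.linspace(0, 1, 101)            # hypothesis space H

def empirical_error(t):                        # I_S[f_t] with 0-1 loss
    return np.mean((x > t).astype(int) != y)

t_erm = min(thresholds, key=empirical_error)
print(f"ERM threshold: {t_erm:.2f}, training error: {empirical_error(t_erm):.3f}")
```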

Box 1 | Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,

$\lim_{n\to\infty} \mathbb{P}\left(|X_n - X| \geq \varepsilon\right) = 0.$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \geq 1} Z^n \rightarrow \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$I[f] = \int_Z V(f, z)\, d\mu(z),$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and for every $\varepsilon > 0$,

$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right) = 0.$

Letters to Nature. NATURE, Vol. 428, 25 March 2004, p. 419. www.nature.com/nature. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1000+ real, 3000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

74

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer...

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

[Trial sequence: test image, 20 ms; image-mask interval (ISI), 30 ms; mask (1/f noise), 80 ms. Task: animal present or not?]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data. Reading out category and identity, invariant to position and scale.

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
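In the spirit of such a matrix-like read-out (a sketch on synthetic data, not the actual recordings of Hung et al. 2005): trials form a matrix of population responses, and a simple least-squares linear classifier reads out object category. All sizes and signal strengths below are made up.

```python
# Sketch of a linear readout from a synthetic "population response" matrix:
# rows are trials, columns are neurons; a least-squares classifier decodes category.
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_neurons = 400, 128
labels = rng.integers(0, 2, size=n_trials)           # two categories
tuning = rng.normal(size=n_neurons)                   # per-neuron category signal
R = rng.normal(size=(n_trials, n_neurons)) + np.outer(2 * labels - 1, 0.3 * tuning)

train = np.arange(n_trials) < 300                     # first 300 trials for training
w = np.linalg.pinv(R[train]) @ (2 * labels[train] - 1)  # least-squares readout
pred = (R[~train] @ w > 0).astype(int)
print(f"readout accuracy on held-out trials: {np.mean(pred == labels[~train]):.2f}")
```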

... in 2013 ...


Werner Reichardt's PhD

Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)

The four directors of the MPI für Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator

26

Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972

27

Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly ...

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982

30

Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly... similar to Bayesian approach to cognition in humans... no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a minimal sketch follows after this list).

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ('energy') model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
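As referenced in the list above, here is a minimal sketch of the delay-and-correlate scheme (illustrative parameters, not a quantitative fly model): each subunit multiplies one receptor signal with a low-pass-filtered (delayed) copy of its neighbour, and the two mirror-symmetric subunits are subtracted, so the sign of the output follows the direction of motion.

```python
# Minimal Hassenstein-Reichardt correlator sketch (made-up parameters).
import numpy as np

def lowpass(s, alpha=0.1):
    """First-order low-pass filter acting as the 'delay' arm."""
    out = np.zeros_like(s)
    for t in range(1, len(s)):
        out[t] = out[t - 1] + alpha * (s[t] - out[t - 1])
    return out

def reichardt(a, b):
    """Opponent correlator output for photoreceptor signals a(t), b(t)."""
    return np.mean(lowpass(a) * b - a * lowpass(b))

t = np.arange(1000) * 0.01
for direction in (+1, -1):                       # grating moving in two directions
    a = np.sin(2 * np.pi * t)                    # receptor 1
    b = np.sin(2 * np.pi * (t - direction * 0.05))  # receptor 2, phase-shifted
    print(f"direction {direction:+d}: detector output {reichardt(a, b):+.4f}")
```

The output changes sign with the motion direction, which is the essential signature of the detector.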

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983

36

Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons...

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
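A minimal sketch of the veto idea (an illustration of shunting, divisive inhibition approximating an AND-NOT gate, with made-up parameters, not the paper's biophysical model):

```python
# Illustrative veto scheme: excitation from receptor 1 passes unless a delayed
# inhibitory signal from receptor 2 is active at the same time. Shunting
# inhibition acts divisively, approximating a logical AND-NOT.
import numpy as np

def delayed(s, shift=5):
    """Crude delay line standing in for the slow inhibitory channel."""
    out = np.zeros_like(s)
    out[shift:] = s[:-shift]
    return out

def veto_response(exc, inh, g=10.0):
    """Shunting interaction: exc / (1 + g*inh) ~ exc AND NOT inh."""
    return np.mean(exc / (1.0 + g * delayed(inh)))

# A bright bar sweeping across two receptors, in each direction:
bar = (np.arange(100) % 50 < 10).astype(float)
r1, r2 = bar, np.roll(bar, 5)        # receptor 2 sees the bar 5 steps later
print(f"preferred direction response: {veto_response(r1, r2):.3f}")
print(f"null      direction response: {veto_response(r2, r1):.3f}")
```

In the preferred direction the excitation and the delayed inhibition never coincide, so the response is large; in the null direction they overlap and the inhibition vetoes the output.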


© 1985 Nature Publishing Group


Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch, and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation), and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
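The normal component that local measurements do provide follows directly from the brightness-constancy constraint I_x u + I_y v + I_t = 0. A toy computation with illustrative numbers (not the paper's algorithm) makes the aperture problem explicit:

```python
# Illustrative: local derivatives determine only the flow component along the
# intensity gradient, v_n = -I_t * grad(I) / |grad(I)|^2; the tangential
# component is left unconstrained (the aperture problem).
import numpy as np

Ix, Iy, It = 0.8, 0.6, -0.5          # made-up spatial and temporal derivatives
grad = np.array([Ix, Iy])
v_normal = -It * grad / np.dot(grad, grad)
print(f"normal flow component: {v_normal}")   # tangential part stays unknown
```

Any vector of the form v_normal + c * (−Iy, Ix) satisfies the same local constraint, which is why extra assumptions (regularization) are needed to recover the full flow.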

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the...



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing...

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...

• ...part of which comes from Reichardt and Poggio 1976 (Quart. Rev. Biophysics, Part I)...

• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in H}\;\left[\frac{1}{\ell}\sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2\right]$$
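For the square loss this functional is regularized least squares; by the representer theorem the minimizer has the form f(x) = Σ_i c_i K(x_i, x), with coefficients solving (K + μℓI)c = y. A sketch with a Gaussian kernel (illustrative, with made-up data and parameters, not code from the slides):

```python
# Regularized least squares sketch: solve (K + mu*n*I) c = y, then predict
# with f(x) = sum_i c_i K(x_i, x). All data and parameters are illustrative.
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=40)

mu, n = 1e-3, len(X)
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * n * np.eye(n), y)   # representer coefficients

X_test = np.linspace(-1, 1, 5)[:, None]
f_test = gaussian_kernel(X_test, X) @ c          # predictions at test points
print(np.round(f_test, 3))
```

The regularization parameter μ trades the fit to the data against the smoothness of f, which is also what makes the algorithm stable in the leave-one-out sense discussed above.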

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In...

Received by the editors April 2000, and in revised form June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory. Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi. Nature, Vol. 428, 25 March 2004.

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 21: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

The four directors of the MPI fuer Biologische Kybernetik

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982

30

Cognition in flies

Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly…similar to Bayesian approach to cognition in humans…no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a toy sketch follows this list)

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
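The Reichardt detector correlates one receptor's delayed signal with its neighbor's current signal; subtracting the mirror-image arm makes the time-averaged output signed by direction. A toy version, with the low-pass filter idealized as a pure delay and all parameters illustrative:

import numpy as np

def reichardt(r1, r2, delay):
    """Opponent correlation detector on two receptor signals."""
    d1, d2 = np.roll(r1, delay), np.roll(r2, delay)  # delayed copies
    return d1 * r2 - d2 * r1                         # arm minus mirror arm

t = np.linspace(0, 1, 1000)
phase = 2 * np.pi * 5 * t                            # 5 Hz drifting stimulus
for direction in (+1, -1):
    r1 = np.sin(phase)
    r2 = np.sin(phase - direction * np.pi / 4)       # neighbor sees it later/earlier
    print(direction, np.mean(reichardt(r1, r2, delay=10)))
# mean output is positive for one direction and negative for the other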

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983

36

Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
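The two schemes contrasted in this excerpt differ only in the nonlinearity that combines the direct channel with the delayed neighboring channel: multiplication of excitation (Hassenstein & Reichardt) versus delayed inhibition that vetoes the direct response (Barlow & Levick). A toy comparison with binary signals standing in for the two channels (illustrative only):

import numpy as np

x_now     = np.array([0, 1, 1, 0, 1], dtype=float)  # direct (excitatory) channel
x_delayed = np.array([0, 0, 1, 1, 1], dtype=float)  # delayed neighbor channel

multiplicative = x_now * x_delayed        # fires on the conjunction of excitation
veto           = x_now * (1 - x_delayed)  # delayed inhibition 'vetoes' the response
print(multiplicative)   # [0. 0. 1. 0. 1.]
print(veto)             # [0. 1. 0. 0. 0.]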

© 1985 Nature Publishing Group

Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Universita di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
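In this framework, an ill-posed inverse problem $Az = y$ (with $A$ the imaging operator and $y$ the data) is replaced by a well-posed variational one. In standard Tikhonov form, one seeks the $z$ minimizing

$$\|Az - y\|^2 + \lambda\,\|Pz\|^2,$$

where $P$ is a stabilizing operator (typically a derivative, enforcing smoothness) and the regularization parameter $\lambda$ controls the compromise between fidelity to the data and the physical constraint.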

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6],

one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
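To make the aperture problem concrete: writing $V = v^{\perp}N + v^{T}T$, with $N$ and $T$ the unit normal and tangent to the contour, local measurements deliver only $v^{\perp}$. One standard regularized solution (Hildreth's smoothness constraint, given here as a sketch of the general recipe rather than the unique choice) recovers the full field along the contour's arclength $s$ by minimizing

$$\int \left( V \cdot N - v^{\perp} \right)^2 ds \;+\; \lambda \int \left\| \frac{\partial V}{\partial s} \right\|^2 ds.$$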

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the

Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local, interactive constraints.

The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
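A toy relaxation network in this cooperative spirit, for 1-D random-dot stereograms: nodes C[d, x] vote for disparity d at position x, with excitation from like-disparity neighbors and inhibition from rival disparities along the same line of sight. This is a sketch of the idea, not the paper's exact update rule, and all constants are illustrative.

import numpy as np

rng = np.random.default_rng(0)
left = rng.integers(0, 2, 64).astype(float)   # 1-D random-dot pattern
true_shift = 3
right = np.roll(left, true_shift)             # right image = shifted left image

D = 8                                                        # candidate disparities 0..7
C = np.array([left * np.roll(right, -d) for d in range(D)])  # initial matches C[d, x]
C0 = C.copy()

for _ in range(10):
    excite  = sum(np.roll(C, s, axis=1) for s in (-2, -1, 1, 2))  # same-disparity support
    inhibit = C.sum(axis=0, keepdims=True) - C                    # rival disparities at x
    C = ((excite - 0.5 * inhibit + C0) > 1.0).astype(float)       # threshold update

print(np.argmax(C.sum(axis=1)))   # -> 3 for this seed: the true disparity wins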

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman, Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…

• …part of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
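For the square loss, the representer theorem reduces this minimization over the reproducing kernel Hilbert space $\mathcal{H}_K$ to a finite linear system: $f(x) = \sum_i c_i K(x, x_i)$ with $(K + n\mu I)c = y$. A minimal numerical sketch; the Gaussian kernel, data and constants are illustrative.

import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (30, 1))                      # training inputs
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(30)

n, mu = len(x), 1e-2
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + n * mu * np.eye(n), y)       # regularized least squares

x_test = np.array([[0.3]])
print(gaussian_kernel(x_test, x) @ c)                # prediction, roughly sin(0.9)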

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory
Tomaso Poggio^1, Ryan Rifkin^{1,4}, Sayan Mukherjee^{1,3} & Partha Niyogi^2

^1 Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ^2 Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ^3 Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. ^4 Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = (x_i, y_i)_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
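In symbols, anticipating the notation of Box 1, ERM selects

$$f_S = \arg\min_{f \in \mathcal{H}} I_S[f].$$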

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left(|X_n - X| > \epsilon\right) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \left(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\right).$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] - \inf_{f \in \mathcal{H}} I[f] > \epsilon \right) = 0.$$
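The stability property at the heart of the letter can be probed directly: delete one training example, retrain, and compare the loss at the deleted point. A sketch with ridge regression standing in for a generic symmetric learning map; the data and constants are illustrative, and this is not the paper's code.

import numpy as np

def train(X, y, lam=0.1):
    """Ridge regression: a simple, stable learning map."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)

w_full = train(X, y)
gaps = []
for i in range(len(X)):                       # leave-one-out perturbations
    mask = np.arange(len(X)) != i
    w_i = train(X[mask], y[mask])
    gaps.append(abs((X[i] @ w_full - y[i]) ** 2 - (X[i] @ w_i - y[i]) ** 2))
print(max(gaps))   # small worst-case gap: the learned hypothesis barely changes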

Letters to Nature

Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

74

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 22: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

23

The beautiful eyes of flies

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Fixation and tracking behavior Reichardtrsquos closed loop flight simulator

26

Fixation and tracking behavior

Poggio T and W Reichardt A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica Kybernetik 12 185-203 1972

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

most behavioral fly research was done with the Goumltz torque meter

in 1976 based on this recording technology Reichardt amp Poggio developed their theory for Visual control of orientation behaviour in the fly Part I +II Quart Rev Biophysics 9(3) 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip

Wehrhahn C T Poggio and H Buumllthoff Biological Cybernetics 45 123-130 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges) They illustrate well the difficulty of the problems of early vision The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects Consider the problem of deter-mining the velocity vector V at each point along a smooth contour in the image Following Marr and Ullman6

one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by …; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Quart. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\bigl(y_i, f(x_i)\bigr) + \mu \, \|f\|_K^2 \right]$$
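This is the Tikhonov regularization functional of statistical learning theory: empirical error plus a penalty on the norm of f in the reproducing kernel Hilbert space H_K. For the square loss, the representer theorem gives the minimizer in closed form as a kernel expansion f(x) = Σᵢ cᵢ K(x, xᵢ) with coefficients solving (K + μℓI)c = y. A minimal numerical sketch (my illustration with a Gaussian kernel, not code from the slides):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))                      # training inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)

mu = 1e-2                                            # regularization parameter
n = len(X)
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * n * np.eye(n), y)       # (K + mu*n*I) c = y

f = lambda X_new: gaussian_kernel(X_new, X) @ c      # f(x) = sum_i c_i K(x, x_i)
print("training RMSE:", np.sqrt(np.mean((f(X) - y) ** 2)))
```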

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory

Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²

¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
³Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
⁴Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} \mathbb{P}(|X_n - X| > \epsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \bigl(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\bigr)$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon \right) = 0$$
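A toy numerical illustration of the leave-one-out stability property discussed above (my sketch, not the paper's experiments): delete one training example, refit a regularized least-squares model, and check that the prediction at the deleted point changes little.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(-1, 1, n)
y = np.sin(2 * X) + 0.1 * rng.standard_normal(n)

def fit(X_tr, y_tr, mu=1e-3, degree=5):
    # regularized least squares on polynomial features: (A^T A + mu*I) w = A^T y
    A = np.vander(X_tr, degree + 1)
    return np.linalg.solve(A.T @ A + mu * np.eye(degree + 1), A.T @ y_tr)

w_full = fit(X, y)
changes = []
for i in range(n):                       # perturb S by deleting one example
    w_i = fit(np.delete(X, i), np.delete(y, i))
    changes.append(abs(np.polyval(w_full, X[i]) - np.polyval(w_i, X[i])))

# small change at the deleted point <=> stability, which the paper links to generalization
print("max change at a deleted point:", max(changes))
```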

letters to nature

Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database
• 1,000+ real, 3,000+ virtual
• 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10¹⁰-10¹¹ neurons (~1 million flies), 10¹⁴-10¹⁵ synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10⁹ neurons in the ventral stream (350×10⁶ in each hemisphere); ~15×10⁶ neurons in AIT (anterior inferotemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT. A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?

Image (20 ms) → interval image-mask (ISI, 30 ms) → mask (1/f noise, 80 ms)

Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
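The hierarchical models cited here (HMAX and its successors) alternate two operations: "simple" (S) layers that gain selectivity by template matching, and "complex" (C) layers that gain invariance by max pooling over position. A caricature of one S-C stage (my sketch with random templates, not the published model):

```python
import numpy as np

def s_layer(image, templates):
    """'Simple' units: normalized template matching at every image location."""
    n_t, k, _ = templates.shape
    h, w = image.shape
    out = np.zeros((n_t, h - k + 1, w - k + 1))
    for i, tpl in enumerate(templates):
        tn = np.linalg.norm(tpl)
        for r in range(h - k + 1):
            for c in range(w - k + 1):
                patch = image[r:r + k, c:c + k]
                out[i, r, c] = (patch * tpl).sum() / (np.linalg.norm(patch) * tn + 1e-9)
    return out

def c_layer(s_maps, pool=4):
    """'Complex' units: max pooling over position, giving local shift invariance."""
    n_t, h, w = s_maps.shape
    rows = range(0, h - pool + 1, pool)
    cols = range(0, w - pool + 1, pool)
    return np.array([[[s_maps[i, r:r + pool, c:c + pool].max() for c in cols]
                      for r in rows] for i in range(n_t)])

rng = np.random.default_rng(0)
image = rng.random((16, 16))
templates = rng.random((4, 3, 3))    # stand-ins for oriented/learned features
c1 = c_layer(s_layer(image, templates))
print(c1.shape)                      # (4, 3, 3): selectivity from S, invariance from C
```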

Decoding the neural code: matrix-like read-out from the brain

Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
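The "matrix-like read-out" amounts to training a linear classifier on a matrix of population responses (trials × neurons). A toy sketch with synthetic responses standing in for the IT recordings used in the actual study:

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_trials = 128, 400
labels = rng.integers(0, 2, n_trials)        # e.g. category A vs category B
pref = rng.standard_normal(n_neurons)        # each neuron's category preference
# response matrix (trials x neurons): noise plus a label-dependent signal
R = rng.standard_normal((n_trials, n_neurons)) + 0.5 * np.outer(2 * labels - 1, pref)

# regularized least-squares linear readout, closed form
w = np.linalg.solve(R.T @ R + 0.1 * np.eye(n_neurons), R.T @ (2 * labels - 1))
acc = np.mean((R @ w > 0) == labels)
print(f"training accuracy of the linear readout: {acc:.2f}")
```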

… in 2013 …


Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt, A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica, Kybernetik 12, 185-203, 1972


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly…

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm, the beetle: Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list).

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
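A minimal sketch of the Reichardt correlation detector described in this list (my illustration; a pure delay stands in for the slow low-pass channel of the original model):

```python
import numpy as np

def delay(x, k):
    """Pure delay by k samples (stand-in for the low-pass 'slow' channel)."""
    return np.concatenate([np.zeros(k), x[:-k]])

def reichardt(r1, r2, k=20):
    """Correlation-type (Hassenstein-Reichardt) motion detector.

    Each half-detector multiplies the delayed signal of one receptor with
    the undelayed signal of its neighbour; subtracting the two mirror-
    symmetric halves gives a signed, direction-selective output."""
    return delay(r1, k) * r2 - delay(r2, k) * r1

t = np.arange(400)
edge = lambda onset: (t > onset).astype(float)   # a passing luminance edge
r1, r2 = edge(100), edge(120)                    # stimulus moves receptor 1 -> 2

print(reichardt(r1, r2).mean())   # > 0: preferred direction
print(reichardt(r2, r1).mean())   # < 0: same stimulus, opposite direction
```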

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
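As a small numerical illustration of the kind of mechanism proposed here (my sketch, not the paper's analysis): a shunting inhibitory conductance g_i divides the effect of an excitatory conductance g_e, and for small conductances the membrane response contains a term proportional to −g_e·g_i, that is, a multiplicative "veto" of coincident signals.

```python
import numpy as np

g_leak = 1.0
g_e = np.linspace(0.0, 0.2, 6)[:, None]   # excitatory conductance (relative to leak)
g_i = np.linspace(0.0, 0.2, 6)[None, :]   # shunting inhibitory conductance

# steady-state depolarization with the inhibitory reversal potential at rest:
V = g_e / (g_leak + g_e + g_i)

# second-order expansion for small conductances:
#   V ~ g_e - g_e**2 - g_e*g_i
# the shunting synapse contributes the product term -g_e*g_i, an inhibitory
# "veto" proportional to the coincidence of the two inputs
approx = g_e - g_e ** 2 - g_e * g_i
print("max deviation from the expansion:", np.max(np.abs(V - approx)))
```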


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
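In standard Tikhonov form, the regularization approach referred to here turns an ill-posed direct problem Az = y into the well-posed variational problem

$$\min_z \; \|Az - y\|^2 + \lambda \|Pz\|^2,$$

where P is a stabilizing operator (for example, a derivative operator enforcing smoothness of the solution) and λ controls the trade-off between fidelity to the data and the a priori constraint.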




Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of …
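To make the flavor of such a cooperative algorithm concrete, here is a minimal sketch (not the paper's exact scheme): a binary network over positions × disparities, with excitatory support from nearby positions at the same disparity (the continuity constraint) and inhibition among disparities competing for the same point (the uniqueness constraint). The neighborhood size, inhibition strength and threshold are illustrative assumptions.

```python
import numpy as np

def cooperative_stereo(C0, n_iter=10, epsilon=2.0, theta=3.0):
    """One possible discrete cooperative disparity update.

    C0: binary array of shape (width, n_disparities); C0[x, d] = 1 where
    the left and right image rows agree when shifted by disparity d.
    """
    C = C0.astype(float)
    for _ in range(n_iter):
        # Excitation: nearby positions at the same disparity support each
        # other (continuity). np.roll wraps at the borders for brevity.
        excite = sum(np.roll(C, dx, axis=0) for dx in (-2, -1, 1, 2))
        # Inhibition: competing disparities along the same line of sight
        # suppress each other (uniqueness).
        inhibit = C.sum(axis=1, keepdims=True) - C
        # Threshold the locally combined evidence, keeping the input C0.
        C = ((excite - epsilon * inhibit + C0) >= theta).astype(float)
    return C  # 1 where a disparity match survives the cooperation

# Toy usage: width-40 row, 8 candidate disparities, random initial matches.
rng = np.random.default_rng(5)
C0 = (rng.random((40, 8)) < 0.3).astype(float)
print(cooperative_stereo(C0).sum(axis=1)[:10])  # surviving matches per x
```

The published algorithm also runs the inhibition along both eyes' lines of sight and iterates over full 2D random-dot stereograms; the sketch keeps a single image row for brevity.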

Cooperative Computation of Stereo Disparity

D. Marr and T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing efforts to understand perception at these three levels.

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
  - computation
  - algorithms
  - biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f\in\mathcal{H}_K}\left[\frac{1}{\ell}\sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2\right]$$
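For the square loss this minimization has a well-known closed form: by the representer theorem the minimizer is f(x) = Σ_j c_j K(x, x_j) with c = (K + μℓI)⁻¹y. A small self-contained sketch of that solution (the Gaussian kernel and all parameter values are illustrative choices, not the slide's):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, mu=0.1, sigma=1.0):
    # Square-loss minimizer of the regularized functional above:
    # f(x) = sum_j c_j K(x, x_j), with c = (K + mu * n * I)^(-1) y.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Toy usage: fit a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = rls_fit(X, y, mu=0.01, sigma=0.5)
print(f(np.array([[0.0], [1.5]])))  # predictions near sin(0), sin(1.5)
```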

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory

Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²

¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory¹⁻⁵ was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1 | Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty}|X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$,

$$\lim_{n\to\infty} P\{|X_n - X| \ge \epsilon\} = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n\ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X\times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n}\sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\left\{ I[f_S] \le \inf_{f\in\mathcal{H}} I[f] + \epsilon \right\} = 1$$
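A rough numerical illustration of the deletion-stability idea (a sketch, not the paper's precise stability machinery): retrain with one example removed and measure how much the prediction at the deleted point moves. Here `fit` is any trainer returning a prediction function; the ridge fit and data below are illustrative stand-ins.

```python
import numpy as np

def loo_stability(fit, X, y):
    """Mean leave-one-out change in prediction at the deleted point:
    a crude empirical proxy for the stability property described above."""
    f_full = fit(X, y)
    changes = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        f_loo = fit(X[keep], y[keep])   # hypothesis learned without z_i
        # how much the prediction at the deleted point moves
        changes.append(abs(f_full(X[i:i+1])[0] - f_loo(X[i:i+1])[0]))
    return float(np.mean(changes))

# Toy usage with a ridge-regularized linear fit (illustrative choice).
def ridge_fit(X, y, lam=1.0):
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return lambda Xnew: Xnew @ w

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)
print(loo_stability(ridge_fit, X, y))  # shrinks as the training set grows
```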

letters to nature

Nature, Vol. 428, 25 March 2004, p. 419. www.nature.com/nature. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real, 3,000+ virtual faces; 500,000+ non-face patterns

Sung amp Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not?

[Stimulus sequence: image, 20 ms; image-mask interval (ISI), 30 ms; 1/f-noise mask, 80 ms]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
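A toy sketch of the two operations such hierarchical feedforward models alternate (in the spirit of HMAX; the templates, normalization and pool sizes are placeholder choices, not the published parameters):

```python
import numpy as np

def s_layer(image, templates):
    # "Simple"-cell stage: tuning, here normalized correlation of each
    # template at every valid image position.
    H, W = image.shape
    maps = []
    for t in templates:
        h, w = t.shape
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i+h, j:j+w]
                out[i, j] = (patch * t).sum() / (np.linalg.norm(patch) + 1e-8)
        maps.append(out)
    return maps

def c_layer(maps, pool=4):
    # "Complex"-cell stage: local max pooling buys tolerance to position.
    pooled = []
    for m in maps:
        H, W = m.shape
        pooled.append(np.array([[m[i:i+pool, j:j+pool].max()
                                 for j in range(0, W - pool + 1, pool)]
                                for i in range(0, H - pool + 1, pool)]))
    return pooled

# Toy usage: one 16x16 "image", two 3x3 placeholder templates.
rng = np.random.default_rng(4)
img = rng.standard_normal((16, 16))
templates = [rng.standard_normal((3, 3)) for _ in range(2)]
print([m.shape for m in c_layer(s_layer(img, templates))])
```

Stacking several such tuning/pooling pairs yields responses that grow in preferred-stimulus complexity and in position and scale invariance, mirroring the ventral stream progression sketched above.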

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
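The read-out itself can be as simple as a regularized linear classifier applied to a matrix of population responses (trials × neurons). The data below are synthetic stand-ins, just to show the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_neurons = 200, 50
R = rng.standard_normal((n_trials, n_neurons))   # population responses
# Synthetic binary category labels carried by a few "informative" units.
labels = np.sign(R[:, :5].sum(1) + 0.3 * rng.standard_normal(n_trials))

# Regularized least-squares read-out: w = (R^T R + lam I)^(-1) R^T y.
lam = 1.0
w = np.linalg.solve(R.T @ R + lam * np.eye(n_neurons), R.T @ labels)
pred = np.sign(R @ w)
print("training accuracy:", (pred == labels).mean())
```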

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …


Fixation and tracking behavior: Reichardt's closed-loop flight simulator


Fixation and tracking behavior

Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.


Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.


Cognition in flies

Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list)

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
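A minimal sketch of the Hassenstein-Reichardt correlation scheme those bullets describe, assuming two neighboring photoreceptor signals sampled in discrete time, with a first-order low-pass filter standing in for the delay line:

```python
import numpy as np

def lowpass(x, tau=5.0):
    # First-order low-pass filter: a simple stand-in for the delay.
    y = np.zeros_like(x)
    a = 1.0 / tau
    for t in range(1, len(x)):
        y[t] = y[t-1] + a * (x[t-1] - y[t-1])
    return y

def reichardt(left, right, tau=5.0):
    # Each half-detector multiplies one receptor's delayed signal with the
    # neighbor's undelayed signal; the opponent subtraction of the two
    # mirror-symmetric halves makes the output direction-selective.
    return lowpass(left, tau) * right - lowpass(right, tau) * left

# A grating moving from left to right: the right receptor sees the left
# one's signal shifted later in time, so the mean output is positive.
t = np.arange(500)
left = np.sin(2 * np.pi * t / 50)
right = np.sin(2 * np.pi * (t - 5) / 50)
print(reichardt(left, right).mean())  # > 0 for this motion direction
```

Reversing the stimulus direction flips the sign of the mean output, which is the multiplicative (nonlinear) interaction the behavioural analyses demanded.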

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

†Università di Genova, Istituto di Fisica, Genoa, Italy. ‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images⁵. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman⁶, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
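The regularization recipe the article develops can be illustrated on a toy ill-posed problem (an illustrative sketch, not the paper's own code): reconstruct a 1D signal from a few noisy samples by adding a smoothness assumption, minimizing ||Af - y||² + λ||Df||², where A samples the unknown signal and D is a discrete second-difference stabilizer.

```python
import numpy as np

def regularized_reconstruction(n, sample_idx, y, lam=10.0):
    # A: samples the unknown signal f (n values) at the measured points.
    A = np.zeros((len(sample_idx), n))
    A[np.arange(len(sample_idx)), sample_idx] = 1.0
    # D: second-difference operator, the discrete smoothness stabilizer.
    D = np.diff(np.eye(n), n=2, axis=0)
    # Normal equations of  min ||A f - y||^2 + lam ||D f||^2.
    f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)
    return f

# Toy usage: recover a smooth curve of 100 points from 8 noisy samples.
rng = np.random.default_rng(2)
idx = np.sort(rng.choice(100, size=8, replace=False))
truth = np.sin(np.linspace(0, np.pi, 100))
f = regularized_reconstruction(100, idx, truth[idx] + 0.05 * rng.standard_normal(8))
print(float(np.abs(f - truth).max()))  # small if the smoothness prior fits
```

Without the λ||Df||² term the linear system is underdetermined (many signals fit the samples exactly); the stabilizer is precisely the "natural constraint" that makes the problem well-posed.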

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 25: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

26

Fixation and tracking behavior

Poggio, T. and W. Reichardt: A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972

Cognition in flies: probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980 Wehrhahn started high-speed film recording of flies chasing each other: single-frame analysis, 3D stereo reconstruction.

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
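The flavor of such a predictive theory is easy to convey in code. Below is a minimal sketch (Python; not the published model, and the gain, delay and speeds are illustrative values rather than fitted parameters) of a pursuit law in which the chasing fly's turning rate is proportional to the delayed bearing angle of the target:

```python
import numpy as np

# Minimal sketch (not the published model) of the pursuit law such chasing
# analyses suggest: the chasing fly's turning rate is proportional to the
# bearing angle of the target, sampled with a short sensorimotor delay.
# Gain, delay and speeds are illustrative values, not fitted parameters.
dt, delay, k, v = 0.002, 0.03, 20.0, 1.0   # step (s), delay (s), gain (1/s), speed (m/s)

steps = 2000
t = np.arange(steps) * dt
target = np.stack([0.8 * t, 0.3 * np.sin(2 * np.pi * t)], axis=1)  # leading fly
pos = np.zeros((steps, 2))                                         # chasing fly
heading = 0.0
bearing = np.zeros(steps)

for i in range(1, steps):
    dx, dy = target[i - 1] - pos[i - 1]
    err = np.arctan2(dy, dx) - heading
    bearing[i - 1] = (err + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi]
    j = max(0, i - 1 - int(delay / dt))                    # delayed bearing sample
    heading += k * bearing[j] * dt
    pos[i] = pos[i - 1] + v * dt * np.array([np.cos(heading), np.sin(heading)])
# pos now traces a smooth pursuit curve that can be compared frame by frame
# with filmed chasing trajectories.
```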


Cognition in flies

Geiger, G. and T. Poggio: The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed? (See the sketch after this list.)

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
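The correlation scheme itself fits in a few lines. Here is a minimal sketch (Python; the filter constants are illustrative, not physiological values) of a two-receptor Reichardt detector: each half multiplies one receptor's signal by a low-pass-filtered (delayed) copy of its neighbour's signal, and the opponent stage subtracts the mirror-symmetric half, so the sign of the mean output reports the direction of motion.

```python
import numpy as np

# Minimal sketch of a Reichardt correlation detector (illustrative constants).
def lowpass(x, tau, dt):
    """First-order low-pass filter, standing in for the delay stage."""
    y = np.zeros_like(x)
    a = dt / (tau + dt)
    for i in range(1, len(x)):
        y[i] = y[i - 1] + a * (x[i] - y[i - 1])
    return y

def reichardt(r1, r2, tau=0.05, dt=0.001):
    """Opponent output; positive mean for motion from receptor 1 to 2."""
    return r2 * lowpass(r1, tau, dt) - r1 * lowpass(r2, tau, dt)

t = np.arange(0, 1, 0.001)
r1 = 1 + np.sin(2 * np.pi * 4 * t)               # receptor 1
r2 = 1 + np.sin(2 * np.pi * 4 * t - np.pi / 4)   # receptor 2 sees the grating later
print(reichardt(r1, r2).mean())                  # > 0: preferred direction
print(reichardt(r2, r1).mean())                  # < 0: null direction
```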

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).

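The veto interaction lends itself to a compact numerical illustration. Below is a minimal sketch (Python; illustrative gains and pulse widths, not the paper's conductance-based biophysical model) of direction selectivity through delayed shunting inhibition: excitation from one receptor is divided by a delayed inhibitory signal from its neighbour, so signals arriving in the null direction coincide with the inhibition and are vetoed.

```python
import numpy as np

# Minimal sketch of veto-style direction selectivity (illustrative, not the
# paper's conductance model): excitation from receptor A is shunted by a
# delayed inhibitory signal from neighbouring receptor B, so stimuli moving
# from B to A arrive together with the delayed inhibition and are vetoed.
def response(a, b, delay_steps, g=10.0):
    b_delayed = np.concatenate([np.zeros(delay_steps), b[:-delay_steps]])
    return a / (1.0 + g * b_delayed)    # divisive (shunting) inhibition

n, width, delay_steps = 200, 20, 10
pulse = np.zeros(n); pulse[50:50 + width] = 1.0

# Preferred direction: stimulus hits A first, then B -> inhibition arrives late.
pref = response(pulse, np.roll(pulse, +delay_steps), delay_steps)
# Null direction: stimulus hits B first, then A -> inhibition coincides with A.
null = response(pulse, np.roll(pulse, -delay_steps), delay_steps)
print(pref.sum(), null.sum())   # pref >> null
```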


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …
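The regularization recipe can be demonstrated on any ill-posed inverse problem. Here is a minimal sketch (Python; the blur operator, noise level and the penalty weight are illustrative choices, not an algorithm taken from the paper): naive inversion of blurred, noisy data amplifies noise, while a Tikhonov smoothness penalty stabilizes the solution.

```python
import numpy as np

# Minimal sketch of regularizing an ill-posed toy problem (illustrative):
# recover a signal f from blurred noisy data y = A f + noise by minimizing
# ||A f - y||^2 + lam * ||D f||^2, where D penalizes non-smooth solutions.
rng = np.random.default_rng(0)
n = 100
x = np.linspace(0, 1, n)
f_true = np.exp(-((x - 0.5) / 0.1) ** 2)

A = np.exp(-((x[:, None] - x[None, :]) / 0.05) ** 2)   # Gaussian blur operator
A /= A.sum(axis=1, keepdims=True)
y = A @ f_true + 0.01 * rng.standard_normal(n)

D = np.diff(np.eye(n), axis=0)        # first-difference smoothness penalty
lam = 1e-3
f_naive = np.linalg.solve(A.T @ A + 1e-12 * np.eye(n), A.T @ y)  # unstable
f_reg = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)        # stable

print(np.linalg.norm(f_naive - f_true), np.linalg.norm(f_reg - f_true))
```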



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen



Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of …

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing …

Vision: what is where

A complex system must be understood at several different levels.

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…

• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \mu \, \|f\|_K^2 \right]$$

Predictive regularization algorithms

Theorems on foundations of learning
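For the square loss this minimization has a well-known closed form: by the representer theorem the minimizer is f(x) = Σᵢ cᵢ K(x, xᵢ) with c = (K + nμI)⁻¹ y. A minimal sketch (Python; the Gaussian kernel width and μ are illustrative values):

```python
import numpy as np

# Minimal regularized least squares sketch (illustrative constants):
# the minimizer of the functional above for square loss is
# f(x) = sum_i c_i K(x, x_i) with c = (K + n*mu*I)^{-1} y.
def K(A, B, sigma=0.3):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n, mu = 50, 1e-3
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

c = np.linalg.solve(K(x, x) + n * mu * np.eye(n), y)   # training: one solve

x_new = np.linspace(-1, 1, 200)
f_new = K(x_new, x) @ c                                # predictions
print(float(np.mean((K(x, x) @ c - y) ** 2)))          # small empirical error
```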

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.


General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory[1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications[6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate …

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z) \, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss
$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right\} = 1$$
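The stability property in the abstract is easy to probe numerically. A minimal sketch (Python; kernel regularized least squares is used as the example algorithm, and all constants are illustrative): delete one training example, retrain, and measure how much the learned hypothesis changes as the training set grows.

```python
import numpy as np

# Minimal leave-one-out stability probe (illustrative): train regularized
# least squares on S and on S with one example deleted, then compare the
# two hypotheses on a grid. For a stable algorithm the change shrinks as n
# grows, which is the property the paper connects to generalization.
def rls_fit(X, y, mu, sigma=0.3):
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))
    return np.linalg.solve(K + len(X) * mu * np.eye(len(X)), y)

def rls_predict(Xtr, c, Xte, sigma=0.3):
    K = np.exp(-(Xte[:, None] - Xtr[None, :]) ** 2 / (2 * sigma ** 2))
    return K @ c

rng = np.random.default_rng(0)
grid = np.linspace(-1, 1, 100)
for n in (20, 80, 320):
    X = rng.uniform(-1, 1, n)
    y = np.sin(3 * X) + 0.1 * rng.standard_normal(n)
    f_full = rls_predict(X, rls_fit(X, y, mu=1e-2), grid)
    f_loo = rls_predict(X[1:], rls_fit(X[1:], y[1:], mu=1e-2), grid)
    print(n, np.max(np.abs(f_full - f_loo)))   # shrinks as n increases
```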


Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model). Animal present or not?
[Trial timeline: image 20 ms; image-mask interval (ISI) 30 ms; 1/f-noise mask 80 ms]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
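A minimal sketch of the alternation these models rely on follows (Python; much reduced from HMAX, with illustrative filters and sizes): simple "S" layers template-match the input with local filters, and complex "C" layers max-pool over position to build invariance, so stacking the pair grows receptive field size and tolerance, echoing the ventral stream description above.

```python
import numpy as np

# Minimal sketch of the S/C alternation in hierarchical feedforward models
# (much reduced from HMAX; filters and sizes are illustrative): S layers
# convolve with templates (selectivity), C layers take local maxima over
# position (invariance).
def s_layer(image, filters):
    h, w = filters.shape[1:]
    out = np.zeros((len(filters), image.shape[0] - h + 1, image.shape[1] - w + 1))
    for k, f in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = (image[i:i + h, j:j + w] * f).sum()
    return np.maximum(out, 0)

def c_layer(maps, pool=2):
    h, w = maps.shape[1] // pool, maps.shape[2] // pool
    return maps[:, :h * pool, :w * pool].reshape(len(maps), h, pool, w, pool).max((2, 4))

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
bar_filters = np.stack([np.eye(5), np.fliplr(np.eye(5))])   # two orientations
c1 = c_layer(s_layer(image, bar_filters))                   # S1 -> C1
# A slightly shifted image yields similar C1 responses: the pooled max is
# tolerant to position, the first step toward invariant recognition.
```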

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
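The read-out itself is conceptually simple. A minimal sketch (Python; synthetic data standing in for recorded IT populations, and the least-squares classifier is an illustrative choice): a linear classifier trained on population response vectors at some positions can report object category at a held-out position.

```python
import numpy as np

# Minimal sketch of population read-out (synthetic stand-in for IT data):
# each trial is a vector of firing rates; a linear classifier trained at
# two positions is tested at a third to probe position invariance.
rng = np.random.default_rng(0)
n_units, n_trials = 128, 200
proto = rng.standard_normal((2, n_units))          # two object categories

def population_response(cat, position):
    # Tuning drifts weakly with position: a partially position-tolerant code.
    gain = 1.0 - 0.1 * position
    return gain * proto[cat] + 0.8 * rng.standard_normal(n_units)

train = [(population_response(c, p), c) for c in (0, 1)
         for p in (0, 1) for _ in range(n_trials)]
test = [(population_response(c, 2), c) for c in (0, 1) for _ in range(n_trials)]

Xtr = np.array([x for x, _ in train]); ytr = np.array([y for _, y in train])
w = np.linalg.lstsq(Xtr, 2 * ytr - 1, rcond=None)[0]   # least-squares readout
Xte = np.array([x for x, _ in test]); yte = np.array([y for _, y in test])
print((((Xte @ w) > 0).astype(int) == yte).mean())     # well above chance
```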

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …

Page 26: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

27

Cognition in flies probabilistic theories then (coming only now to humans)

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

Most behavioral fly research was done with the Götz torque meter.

In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.

Open question: how well does this theory describe fly behavior in natural flight?

In 1980, Wehrhahn started high-speed film recording of flies chasing each other.

Single-frame analysis, 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly …

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector (a minimal sketch follows this list).

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).


Computational vision and regularization theory. Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images (ref. 5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation), and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman (ref. 6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the

[The remainder of the article text is garbled in this extraction.]


Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

— computation — algorithms — biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…

• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

min_{f∈H} [ (1/ℓ) Σ_{i=1}^{ℓ} V(y_i, f(x_i)) + μ ‖f‖²_K ]

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In


General conditions for predictivity in learning theory. Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1. Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n - X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n - X| ≥ ε) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)².

Expected error. The expected error of a function f is defined as

I[f] = ∫_Z V(f, z) dμ(z)

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = ∫_{X×Y} (f(x) - y)² dμ(x, y)

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

lim_{n→∞} |I[f_S] - I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,

lim_{n→∞} P( I[f_S] ≥ inf_{f∈H} I[f] + ε ) = 0

letters to nature. NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real, 3,000+ virtual face patterns • 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT. A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 27: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

The beginning of untethered flight analysis Buumllthoff Poggio amp Wehrhahn Z Naturforsch 35c 811-815 (1980)

most behavioral fly research was done with the Goumltz torque meter

in 1976 based on this recording technology Reichardt amp Poggio developed their theory for Visual control of orientation behaviour in the fly Part I +II Quart Rev Biophysics 9(3) 311-375

open question how well does this theory describe fly behavior of natural flight

in 1980 Wehrhan started high-speed film recording of flies chasing each other

single frame analysis 3D stereo reconstruction

Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip

Wehrhahn C T Poggio and H Buumllthoff Biological Cybernetics 45 123-130 1982

30

Cognition in flies

Geiger G and T Poggio The Muller-Lyer Figure and the Fly Science 190 479-480 1975

Work at 3 levels

bull Fixation and tracking behavior of the fly (cognition in the flyhellipsimilar to Bayesian approach to cognition in humanshellipno neurons)

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits

bull Biophysics of computation

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges) They illustrate well the difficulty of the problems of early vision The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects Consider the problem of deter-mining the velocity vector V at each point along a smooth contour in the image Following Marr and Ullman6

one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the


Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
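A toy simulation of the proposed veto interaction, with shunting (divisive) inhibition from the adjacent, delayed channel; the constants and the discrete-time form are illustrative assumptions, not the paper's biophysical model:

```python
import numpy as np

def veto_response(excitation, inhibition, delay=3, k=10.0):
    """Toy Barlow-Levick / Torre-Poggio 'veto' interaction.

    excitation, inhibition: float arrays from two adjacent receptor
    regions. The inhibitory channel acts divisively (shunting
    inhibition): when its delayed signal coincides with excitation,
    the response is vetoed; otherwise excitation passes unchanged.
    """
    i_delayed = np.concatenate([np.zeros(delay), inhibition[:-delay]])
    return excitation / (1.0 + k * i_delayed)

# A bar moving in the 'null' direction activates the inhibitory region
# first, so its delayed signal coincides with the excitation and the
# response is suppressed; in the preferred direction it does not.
```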

Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen

Perhaps one of the most striking differences between a brain and today's computers is the amount of "wiring." In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local, interactive constraints. The term "cooperative" refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…

Cooperative Computation of Stereo Disparity

D. Marr and T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
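The cooperative algorithm sketched in the excerpt can be written compactly. Below is a simplified 1-D paraphrase of the update rule: excitatory support from neighbours at the same disparity (continuity), inhibition among competing disparities at the same position (uniqueness), plus the initial matches, followed by a threshold. Parameters are illustrative, not the paper's:

```python
import numpy as np

def cooperative_stereo_1d(left, right, d_max, n_iter=10, theta=3.0, eps=2.0):
    """Simplified 1-D cooperative stereo matching.

    left, right: 1-D integer arrays (e.g., random-dot lines).
    State C[x, d] = 1 means 'disparity d present at position x'.
    """
    n = len(left)
    C0 = np.zeros((n, d_max + 1))
    for d in range(d_max + 1):
        C0[:n - d, d] = (left[:n - d] == right[d:]).astype(float)
    C = C0.copy()
    for _ in range(n_iter):
        exc = np.zeros_like(C)
        exc[1:, :] += C[:-1, :]                   # left neighbour, same d
        exc[:-1, :] += C[1:, :]                   # right neighbour, same d
        inh = C.sum(axis=1, keepdims=True) - C    # other disparities at x
        C = (exc - eps * inh + C0 >= theta).astype(float)
    return C
```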


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \; \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$

Predictive regularization algorithms
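For the square loss, the minimizer of this Tikhonov functional has a well-known closed form: by the representer theorem, f(x) = sum_i c_i K(x, x_i) with c = (K + mu*n*I)^(-1) y. A minimal NumPy sketch of this regularized least-squares scheme follows (the Gaussian kernel and constants are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma**2))

def fit_regularized_least_squares(X, y, mu=0.1, sigma=1.0):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2.

    The coefficients solve the linear system (K + mu * n * I) c = y.
    """
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + mu * n * np.eye(n), y)

def predict(X_train, c, X_new, sigma=1.0):
    # f(x) = sum_i c_i K(x, x_i)
    return gaussian_kernel(X_new, X_train, sigma) @ c
```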

Theorems on foundations of learning

MIT (1981-)

Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…

Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.


General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}(|X_n - X| > \varepsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n).$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \left| I[f_S] - I_S[f_S] \right| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \varepsilon \Big) = 1.$$
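The stability property described in the abstract, that deleting one training example should not change the learned hypothesis much, can be probed numerically. A hedged sketch follows; `train` and `V` are assumed interfaces (a learning map returning a callable hypothesis, and a loss function), not notation from the paper:

```python
import numpy as np

def loo_stability(train, X, y, V):
    """Average change in loss at z_i when z_i is deleted from S.

    train(X, y) -> callable hypothesis f; V(prediction, truth) -> loss.
    Small values over random draws of S are the kind of stability
    that the paper connects to generalization.
    """
    f_S = train(X, y)
    deltas = []
    for i in range(len(X)):
        X_i = np.delete(X, i, axis=0)       # S with example i removed
        y_i = np.delete(y, i, axis=0)
        f_Si = train(X_i, y_i)              # hypothesis without z_i
        deltas.append(abs(V(f_S(X[i]), y[i]) - V(f_Si(X[i]), y[i])))
    return float(np.mean(deltas))
```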


Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face examples; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
- ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?

Image: 20 ms; image-mask interval (ISI): 30 ms; mask (1/f noise): 80 ms

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
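As a concrete illustration of what such hierarchies compute, here is a minimal sketch of the two alternating operations, in the spirit of Riesenhuber & Poggio 1999: a 'simple'-cell stage (tuning, via template matching) and a 'complex'-cell stage (invariance, via local max pooling). This is illustrative code, not the published model implementation:

```python
import numpy as np

def simple_stage(image, templates):
    """'S' layer: tuning, via correlation with a bank of templates."""
    h, w = templates.shape[1:]
    H, W = image.shape[0] - h + 1, image.shape[1] - w + 1
    out = np.empty((len(templates), H, W))
    for k, t in enumerate(templates):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = float((image[i:i+h, j:j+w] * t).sum())
    return out

def complex_stage(maps, pool=2):
    """'C' layer: invariance, via max pooling over local neighbourhoods."""
    k, H, W = maps.shape
    Hp, Wp = H // pool, W // pool
    trimmed = maps[:, :Hp * pool, :Wp * pool]
    return trimmed.reshape(k, Hp, pool, Wp, pool).max(axis=(2, 4))
```

Stacking such pairs yields features that are increasingly selective yet increasingly tolerant to position and scale, mirroring the V1-V2-V4-IT progression described above.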

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
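The matrix-like read-out itself can be illustrated with a regularized linear classifier applied to population response vectors; the data shapes and the least-squares choice below are assumptions for the sketch, not the analysis code of Hung et al.:

```python
import numpy as np

def train_linear_readout(R, labels, lam=1e-3):
    """R: (n_trials, n_neurons) responses; labels: (n_trials,) classes.

    Fit a one-vs-all linear readout by regularized least squares.
    """
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    Rb = np.hstack([R, np.ones((len(R), 1))])          # append bias
    W = np.linalg.solve(Rb.T @ Rb + lam * np.eye(Rb.shape[1]), Rb.T @ Y)
    return W, classes

def decode(R, W, classes):
    Rb = np.hstack([R, np.ones((len(R), 1))])
    return classes[np.argmax(Rb @ W, axis=1)]
```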

… in 2013 …


Cognitive theory of basic fly instincts predicts trajectory of chasing fly…

Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics, 45, 123-130, 1982


Cognition in flies

Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science, 190, 479-480, 1975

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by the neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list)

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature, 1989)
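A minimal simulation of the Reichardt detector mentioned above; the constants, and the first-order low-pass filter standing in for the delay, are illustrative choices:

```python
import numpy as np

def reichardt_detector(s1, s2, dt=1.0, tau=5.0):
    """Correlation-type (Hassenstein-Reichardt) motion detector.

    s1, s2: float arrays of luminance at two neighbouring photoreceptors.
    Each arm multiplies one input with a low-pass-filtered (delayed)
    copy of the other; the opponent stage subtracts the two arms, so
    the sign of the output gives the direction of motion.
    """
    a = np.exp(-dt / tau)                 # one-pole low-pass coefficient
    lp1, lp2 = np.zeros_like(s1), np.zeros_like(s2)
    for t in range(1, len(s1)):
        lp1[t] = a * lp1[t-1] + (1 - a) * s1[t]
        lp2[t] = a * lp2[t-1] + (1 - a) * s2[t]
    return lp1 * s2 - lp2 * s1            # opponent (subtraction) output
```

Feeding it a pattern that reaches s1 before s2 gives a positive mean output; reversing the order of stimulation flips the sign.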

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
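In the regularization approach, an ill-posed inverse problem is replaced by the minimization of a functional ||Af - d||^2 + λ||Pf||^2, where P is a stabilizer expressing a natural constraint such as smoothness. A minimal 1-D sketch of this idea applied to surface reconstruction from sparse, noisy depth samples follows; the operators and constants are illustrative, not the paper's specific algorithm:

```python
import numpy as np

def reconstruct_surface_1d(idx, depths, n, lam=1.0):
    """Recover f on a grid of n points from samples f[idx] ~ depths.

    Minimizes ||S f - d||^2 + lam * ||D2 f||^2, where S samples the
    grid and D2 is the second-difference operator (a smoothness prior
    that makes the ill-posed interpolation problem well-posed).
    """
    S = np.zeros((len(idx), n))
    S[np.arange(len(idx)), idx] = 1.0            # sampling matrix
    D2 = np.diff(np.eye(n), n=2, axis=0)         # (n-2, n) second differences
    A = S.T @ S + lam * D2.T @ D2
    return np.linalg.solve(A, S.T @ np.asarray(depths, float))
```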

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges) They illustrate well the difficulty of the problems of early vision The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects Consider the problem of deter-mining the velocity vector V at each point along a smooth contour in the image Following Marr and Ullman6

one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the

Aring(aringdegouml()igraveaeligԛ Egraveccedil)sup3divide+ԛETH ԛiacutecurrenԙplusmnEacuteNtildeOgraveshyyacuteԛszligicirccopyacutemicroAgraveԛ EcircAacuteAumlregEumlyenԛ Oacuteagrave+Ocircaacutethornԛ

ƏԛbrvbarIgraveiumlsectԛ ordfampacircԛ paraampAcircOtildemacrmiddot$ԛ Iacute OumlatildeAEligethɳtimesIcircԛ UcircOslashegraveeacutecedilIumlԛ

oslashsup2ntildeԛiquestumlaumlԛUgrave$ԛ

cԛ ˻ԛԛxyUumlzԛ YacuteyumlQ0ϐԛ

$3)135 51052+5 4amp-5 5 (5

13

UacuteѱKŏUdԛ ԛ

ecircԄĀSЙƫКeԛ˼˽ŐϑāƬЛԛĂƐԛϒѲϓԚМ˾ltѳѴƭϔԛR˔ăԅԛĄԛ ˿ΰ13Жąѵԛϕ˕Ʈԛ =ԛ αϖşƯНОɴHȸԛ ɵȲϗĆӊˈԛɶԛ Ѷɇưԛ ƱϘӧӐПԛ РԆСѷƲEԛ ugraveƳԛ ƑLԛ Ѹԛ ˑͰӲfԛ ɈͱӳƴӨƵgԛ Ӵɉćѹԛ őɷͲβɊԇТɸŠĈ˖ԛ F1šɋĉɹУФԛ Ċϙ2ԛŢϚɺѺţċ˗ԛ Ȩͳϛԛ ƶϜȢԛ ŤLγѻčѼɻʹХhԛ ӵɌĎѽԛ Ʒ˙ƸƹѾďϝԈԛ ɼIȳΫϞĐѿɽ͵ԛ δВάťƺЦЧɾ7ԛ13εƻГōҀɿͶШԛ ҁɍ2ԉԛ=ζ˚Ƽiԛ đԛ ӶɎʀŦɏԛ ЩƽЪͷϟԊԛ Ϡԛ ͺԛ ŧͻηӑĒgtMЫԛ ɐƾԋԛ Ӓƒƿϡ˛ʁǀ|ԛ sup1ԛ ɑʂЬԛ θēιǁϢjԛ ͼӓӋ˭ˉǂԛ Ĕԛ ĕκλͽĖŨɒԛ Xԛ ҈ɓǃЭDŽԛ μϣŒDDžЮԛ uacutedžԛ ϤLJөʃLjӷԛ ėԛ Ӕœljϥԛ Ϳȩԛ ƓʄȶȷNJϦNj҉ԛ ŔʅνɔԌЯʆũCԛnjŪɕĘʇабkԛ вӕūɖԛ ęгԛ дԍĚξҊʈŬԛ ʉҋǍϧěŭҌʊеԛ ŕǎҍӸǏǐԛ ǑԂŮʋҎĜҏʌԛ ĝƔԛ ʍɗʎŖʏYʐlԛ ƕǒIƖϨʑҐʒůԛжοʓǓзmԛ ΄vAπӖ^иǔԛ ȹǕǖϩĞґʔȺԛ GǗŗϪğ3ԛ΅JǘĠϫҒʕǙйԛ ġƗԛ ғϬĢкʖҔҕǚwϭ18˜ģҖǛƘԛ ӪΆBҗĤȻ4ԛŰɘĥǜ˝лԛ laquoUԛ ǝĦűəԛ Ǟnԛ ӹǟԛ ƙʗмŲӗноԛ ҘɚǠԛ ʘȪΈϮħҙʙΉԛ ρέųǡпрʚȼԛ Ίς3ДŎҚʛсԛ YɛĨқԛ FĩԎԛ ŘǢԛʜσ˞ǣǤҜǥƚ~ԛ ˮԛ Όȫԛ ҝɜǦтǧԛ ǨŴɝĪʝуфԛ īŵҞԛ 5ʞҟɞǩϯԛ ӺʟҠɟʠԛ Ĭԛ ȬǪӻԛ Gʡ˯˰ˊхǫŶƛVoԛ цӘɠԛ ĭчԛ ҡ9ǬԛĮŷҢʢΎԛ QΏңǭҤʣį˟ԛ Oϰԛ шԏHİτZʤŸԛ ҥϱıщEʥъыʦΐpԛ Αԛ 13ӫǮϲԛ ьӬǯϳIJˠԛ ɡәƜϴǰƝԛ ʧ˱˲эDZŹΒƞюԛ Γϵԛ DzӭdzԛWǴźΔƟяqԛ ΕƠӚˡijҦʨȽԛ ѐΖǵԛ υ϶ΗSǶϷZԐԛ Θȭԛ ҧɢǷԛ Żʩϸżӛ]ԛ ordmJԛ ёǸԛ ŽĴђǹѓԛ ӼǺԛ ӽʪ˳˴ԛ єӜȾȿǻѕҨԛ іφǼˋȴˌԛǽԃĵTˢǾїԛӾɣϹǿԛԛřʫΙχɤԑјʬžĶˣԛſķʭљԛơϺ˵ˍȀVԛ-ԛɀӮȁԛƀΚψӝҩĸҪgt13ԛ raquoԛRĹЗʮƁ_ĺϻrԛӿȂԛԀˎ˶CԛƢʯњƂӞћќԛ ҫɥȃԛ ӟϼΛĻˤԛ T5ϽļӌˏPѝsԛ ԛ ҬɦȄϾԛ ʰω˥[ĽӍAKtԛ ӠȅϿ˦ԒʱɁԛ ƣʲЀȆƃҭʳΜԛ ўȇ˧ȈƄҮʴӯʵXԘԛʶԛ үɧȉԛ ӰҰȊŚЁȋԛ Ђ6ұʷľԛ

CcedilĿџѠŀƅ9ӡѡȌӎӏѢԛ frac14W[ҲӢҳԛNȮԛ ograve6ƆɨB7ԓuԛ ԛ

oacuteʸѣԛ ЃȍϊΝ]ԛ ƤȎѤƇЄʹśȏѥԛ ЅȐѦȑІƈɩԛ ƥΞȒԛ ԁʺҴɪʻԛ ҵɫȓԛ ҶʼȯʽƉ˨ԛ ҷȔ˩˷ːɂƊ0ԛ AtildeŜΟЇŁҸΠЈԔԛ Ʀԛ ҹɬ4ԛ ȕҺȖЉԛ ȵήЊԛ ʾΡ˪ɃʿƋł˫ԛ frac12ȰΣЋŃһˀΤԛ THORNЕίƌȗѧѨˁɄԛ aucircɭ˂Ҽń˒ȘЌԛ iexclΥD˸ș8Țbԛ Ņҽԛ Ҿțԛ ņѩѪ-ƍɮӣѫȜҿѬԛfrac34ѭӀ˃ӁӤӂȝԛ Φȱԛ ocircȞƎɯΧˬΨɅԕԛ otildeȟԛ centȠӃȡЍ`Ѯԛ ѯӥϋόΩЎӄԛ ˄Ѱԛ ύЏNӱltƧȢƨԛ ˅ԛ ώŇИԛ ŝԖԛ Ӆɰȣԛ eumlOňԛ notΪӦƩʼnӆPԛŊƪԛ ˆԛ ϏŋАӇԛ Şԗԛ ӈɱȤԛuumlɲˇӉŌ˓ȥБԛpoundM˹˺ȦɆȧԛ

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1


Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing …

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
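For the square loss $V(y, f(x)) = (y - f(x))^2$, the minimizer has the closed form given by the representer theorem: $f(x) = \sum_i c_i K(x, x_i)$ with $(K + \mu n I)c = y$. A minimal sketch in NumPy (the Gaussian kernel and all names are my illustrative choices, not from the slides):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise squared distances between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_rls(X, y, mu=0.1, sigma=1.0):
    # Representer theorem: f(x) = sum_i c_i K(x, x_i),
    # with coefficients solving (K + mu * n * I) c = y
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + mu * n * np.eye(n), y)

def predict_rls(c, X_train, X_new, sigma=1.0):
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Tiny usage example on synthetic 1-D data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=40)
c = fit_rls(X, y, mu=0.01, sigma=0.3)
print(predict_rls(c, X, np.array([[0.5]]), sigma=0.3))
```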

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
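As a toy numerical illustration of ERM (my sketch, not from the paper): with a hypothesis space of threshold classifiers, ERM simply scans for the threshold with the lowest empirical error.

```python
import numpy as np

# Toy sketch of ERM: the hypothesis space H is the set of threshold
# classifiers {x > t}; ERM selects the t with the lowest empirical
# (training) error I_S[f].

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = (x > 0.6).astype(int)                 # true rule, unknown to the learner

thresholds = np.linspace(0, 1, 101)       # a finite hypothesis space
emp_err = [np.mean((x > t).astype(int) != y) for t in thresholds]
t_hat = thresholds[int(np.argmin(emp_err))]
print(f"ERM selects the threshold t = {t_hat:.2f}")
```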

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}(|X_n - X| \geq \varepsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z) \, d\mu(z),$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y).$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability.}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] - \inf_{f \in \mathcal{H}} I[f] \leq \varepsilon \Big) = 1.$$
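A toy numerical check of the stability property defined above (my sketch; linear regularized least squares on synthetic data, all names illustrative): delete one example, retrain, and measure the change of the hypothesis at the deleted point.

```python
import numpy as np

# Toy check of deletion stability: remove one example, retrain, and measure
# the change of the learned hypothesis at the deleted point. Regularized
# least squares (linear kernel) is a stable algorithm in this sense.

rng = np.random.default_rng(0)
n, mu = 50, 0.1
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def rls(X, y, mu):
    """w = argmin (1/m) * ||X w - y||^2 + mu * ||w||^2"""
    m, d = X.shape
    return np.linalg.solve(X.T @ X / m + mu * np.eye(d), X.T @ y / m)

w_full = rls(X, y, mu)
changes = []
for i in range(n):
    keep = np.arange(n) != i
    w_i = rls(X[keep], y[keep], mu)
    changes.append(abs(X[i] @ w_full - X[i] @ w_i))

print(f"max change at deleted point: {max(changes):.4f}")
```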

Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group. www.nature.com/nature

Why do hierarchical architectures work?

• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
  – 10^10–10^11 neurons (~1 million flies)
  – 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
  – ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT. A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

Trial sequence: Image (20 ms) → Interval Image-Mask (ISI 30 ms) → Mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
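A minimal caricature of one stage of such models (in the spirit of the Riesenhuber & Poggio architecture cited above; all sizes and names are my illustrative choices): an S layer measures similarity to stored templates (tuning), and a C layer pools with a max over positions, trading selectivity for position invariance.

```python
import numpy as np

# One S/C stage of a hierarchical feedforward model, schematically:
# S layer = Gaussian tuning to templates, C layer = max pooling over positions.

def s_layer(image_patches, templates, sigma=0.5):
    """Gaussian tuning of each patch to each template."""
    d2 = ((image_patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))        # (n_patches, n_templates)

def c_layer(s_responses):
    """Max pooling over positions -> one invariant value per template."""
    return s_responses.max(axis=0)               # (n_templates,)

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 9))               # 16 positions, 3x3 patches
templates = rng.normal(size=(4, 9))              # 4 stored templates
features = c_layer(s_layer(patches, templates))  # position-invariant features
print(features.shape)                            # (4,)
```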

Decoding the neural code: Matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
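The read-out itself can be pictured as a regularized linear classifier applied to population response vectors, as in this sketch on synthetic data (illustrative only, not the authors' pipeline):

```python
import numpy as np

# Sketch of the matrix-like linear read-out idea on synthetic "population
# responses": train a regularized linear classifier to decode category
# from single-trial response vectors.

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 50
labels = rng.integers(0, 2, n_trials) * 2 - 1            # categories +/-1
axis = rng.normal(size=n_neurons)                        # category direction
R = rng.normal(size=(n_trials, n_neurons)) + np.outer(labels, axis)

# Regularized least squares classifier: w = (R^T R + lam I)^{-1} R^T labels
w = np.linalg.solve(R.T @ R + 1.0 * np.eye(n_neurons), R.T @ labels)
acc = np.mean(np.sign(R @ w) == labels)
print(f"training accuracy: {acc:.2f}")
```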

… in 2013 …


Cognition in flies

Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science, 190:479-480, 1975.

Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (see the sketch after this list)

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
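The correlation scheme is simple enough to state in a few lines. Below is a minimal sketch of a Hassenstein-Reichardt detector (my construction for illustration; the delay constant and stimulus are arbitrary):

```python
import numpy as np

# Minimal sketch of a Hassenstein-Reichardt correlation detector on signals
# from two adjacent photoreceptors. Each half-detector multiplies one
# receptor's delayed signal with its neighbour's direct signal; subtracting
# the mirror-symmetric half gives an opponent, direction-selective output.

def delay(x, d):
    """Shift a signal right by d time steps (the 'slow' channel)."""
    return np.concatenate([np.zeros(d), x[:-d]]) if d > 0 else x.copy()

def reichardt(r1, r2, d=3):
    """Opponent detector output; positive for motion from r1 towards r2."""
    return delay(r1, d) * r2 - r1 * delay(r2, d)

T, lag = 40, 3
r1, r2 = np.zeros(T), np.zeros(T)
r1[10], r2[10 + lag] = 1.0, 1.0      # a spot moving from receptor 1 to 2

print("1 -> 2 motion:", reichardt(r1, r2).sum())   # +1.0 (preferred)
print("2 -> 1 motion:", reichardt(r2, r1).sum())   # -1.0 (null)
```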

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
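To make the veto idea concrete, here is a toy numerical sketch (my simplification, not the paper's equations) of shunting inhibition: a delayed inhibitory conductance divides down the excitation, implementing an AND-NOT interaction so that motion in one direction is suppressed.

```python
import numpy as np

# Shunting ("silent") inhibition as a divisive veto: excitation g_e is
# divided down by a delayed inhibitory conductance g_i, so appropriately
# timed inhibition vetoes the response (AND-NOT interaction).

def delay(x, d):
    return np.concatenate([np.zeros(d), x[:-d]]) if d > 0 else x.copy()

def veto_unit(g_e, g_i, k=20.0):
    """Membrane response ~ g_e / (1 + k * g_i): strong g_i shunts g_e."""
    return g_e / (1.0 + k * g_i)

T, lag = 20, 3
r1, r2 = np.zeros(T), np.zeros(T)
r1[5], r2[5 + lag] = 1.0, 1.0     # spot moving from receptor 1 to receptor 2

# Null direction as wired here: delayed inhibition from receptor 1 coincides
# with excitation from receptor 2 and vetoes it.
null_resp = veto_unit(r2, delay(r1, lag)).sum()
# Preferred direction: reverse the stimulus order; inhibition arrives too late.
pref_resp = veto_unit(r1, delay(r2, lag)).sum()
print(f"null: {null_resp:.3f}  preferred: {pref_resp:.3f}")
```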


© 1985 Nature Publishing Group

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …
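The regularization recipe replaces such an ill-posed problem by the variational problem of minimizing ‖Az − y‖² + λ‖Pz‖², where A models the data relation and the stabilizer P embodies the natural constraints. A minimal numerical sketch (my own construction, with A = I for denoising and a second-difference smoothness stabilizer P):

```python
import numpy as np

# Standard (Tikhonov) regularization: find f minimizing
#     ||A f - d||^2 + lam * ||P f||^2
# Here A = I (noisy direct measurements) and P = second-difference operator,
# so the stabilizer ||P f||^2 penalizes non-smooth solutions.

n = 100
x = np.linspace(0.0, 1.0, n)
truth = np.sin(2 * np.pi * x)
d = truth + 0.1 * np.random.default_rng(1).normal(size=n)  # noisy data

P = np.diff(np.eye(n), n=2, axis=0)        # (n-2) x n second differences
lam = 5.0

# Normal equations: (A^T A + lam * P^T P) f = A^T d, with A = I
f = np.linalg.solve(np.eye(n) + lam * P.T @ P, d)
print("rms error, raw data   :", np.sqrt(np.mean((d - truth) ** 2)))
print("rms error, regularized:", np.sqrt(np.mean((f - truth) ** 2)))
```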



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of "wiring". In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. …


Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,

$$\lim_{n\to\infty} P\left( |X_n - X| > \epsilon \right) = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote by $V(f, z)$ the price we pay when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z) \, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} \left| I[f_S] - I_S[f_S] \right| = 0 \text{ in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\left( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon \right) = 0$$
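The Box 1 quantities are easy to simulate. A short sketch (mine, with a synthetic distribution mu standing in for the unknown one) that computes the empirical error I_S[f] and a Monte Carlo estimate of the expected error I[f]:

import numpy as np

rng = np.random.default_rng(1)

def sample_z(n):
    """Draw n i.i.d. samples z = (x, y) from a synthetic distribution mu."""
    x = rng.uniform(0, 1, n)
    y = 2.0 * x + rng.normal(0, 0.1, n)
    return x, y

f = lambda x: 2.0 * x                  # a fixed hypothesis f
V = lambda x, y: (f(x) - y) ** 2       # square loss

x_S, y_S = sample_z(20)                # training set S, |S| = n = 20
I_S = np.mean(V(x_S, y_S))             # empirical error I_S[f]

x_new, y_new = sample_z(200_000)       # fresh samples approximate the integral
I = np.mean(V(x_new, y_new))           # Monte Carlo estimate of expected error I[f]

print(I_S, I, abs(I - I_S))            # generalization: |I[f] - I_S[f]| is small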


Why do hierarchical architectures work?

• Training Database
• 1,000+ real, 3,000+ virtual
• 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350x10^6 in each hemisphere); ~15x10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
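The architectural motif shared by these models is an alternation of tuning ('S', template matching) and invariance ('C', max pooling) stages. A toy sketch of that motif; the 2x2 "templates" and the pooling size are hypothetical stand-ins for Gabor-like S1 filters and C1 pooling ranges:

import numpy as np
from scipy.signal import convolve2d

def s_layer(image, templates):
    """'S' (simple) layer: tuning via template matching (convolution)."""
    return [convolve2d(image, t, mode="valid") for t in templates]

def c_layer(maps, pool=4):
    """'C' (complex) layer: local max pooling builds invariance to position
    (and, pooled over scales, to size)."""
    pooled = []
    for m in maps:
        h, w = (m.shape[0] // pool) * pool, (m.shape[1] // pool) * pool
        blocks = m[:h, :w].reshape(h // pool, pool, w // pool, pool)
        pooled.append(blocks.max(axis=(1, 3)))
    return pooled

# hypothetical oriented 'edge' templates
templates = [np.array([[1, -1], [1, -1]], float),
             np.array([[1, 1], [-1, -1]], float)]
image = np.random.default_rng(2).normal(size=(32, 32))
c1 = c_layer(s_layer(image, templates))
print([m.shape for m in c1])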

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

[Trial timeline: image 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005
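"Read-out" here means training a linear classifier on recorded population responses and testing it on held-out trials. A self-contained sketch with synthetic responses (my own illustration, not the published data or analysis code):

import numpy as np

rng = np.random.default_rng(3)
d = 256                                    # neurons in the recorded population
w_true = rng.normal(size=d)                # category-dependent response direction

def population_responses(n):
    """Synthetic trials: category label +/-1 plus noisy population response."""
    y = rng.choice([-1.0, 1.0], n)
    X = np.outer(y, w_true) + 4.0 * rng.normal(size=(n, d))
    return X, y

X, y = population_responses(200)           # 'training' trials
lam = 10.0                                 # ridge term keeps the readout stable
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X_new, y_new = population_responses(100)   # held-out trials
accuracy = np.mean(np.sign(X_new @ w) == y_new)
print(accuracy)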

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

...... in 2013 ......


Work at 3 levels

• Fixation and tracking behavior of the fly (cognition in the fly... similar to the Bayesian approach to cognition in humans... no neurons)

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits

• Biophysics of computation

Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector

Motion algorithm: the beetle and the fly

• The beetle follows the motion.

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a minimal sketch follows below).

• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
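A minimal sketch of the correlation-type (Hassenstein-Reichardt) detector referenced above: each subunit multiplies one receptor's signal by a low-pass-filtered (delayed) copy of its neighbour's, and the two mirror-symmetric subunits are subtracted. The time constants and the stimulus are illustrative choices of mine:

import numpy as np

def lowpass(signal, tau, dt=1.0):
    """First-order low-pass filter: a simple stand-in for the 'delay' channel."""
    out = np.zeros_like(signal)
    a = dt / (tau + dt)
    for t in range(1, len(signal)):
        out[t] = out[t - 1] + a * (signal[t] - out[t - 1])
    return out

def reichardt(r1, r2, tau=8.0):
    """Opponent correlation detector: positive mean output for motion from
    receptor 1 toward receptor 2, negative for the reverse."""
    return lowpass(r1, tau) * r2 - lowpass(r2, tau) * r1

# a moving sinusoidal grating sampled by two neighbouring photoreceptors
t = np.arange(500)
r1 = np.sin(0.1 * t)
r2 = np.sin(0.1 * t - 0.3)        # receptor 2 sees the pattern slightly later
print(reichardt(r1, r2).mean())   # > 0: preferred direction
print(reichardt(r2, r1).mean())   # < 0: same stimulus, detector mirrored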

Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons...

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
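A toy rendering (my own construction, not from the paper) of the 'veto' idea: excitation is divided away (shunted) whenever the appropriately delayed inhibitory signal from the neighbouring receptor coincides with it, so only one direction of motion survives:

import numpy as np

t = np.arange(200.0)
pulse = lambda t0: np.exp(-0.5 * ((t - t0) / 3.0) ** 2)   # receptor activation

def veto_unit(excite, inhibit, delay=5, g=20.0):
    """Shunting ('veto') interaction: the delayed inhibitory channel divides
    away the excitation when the two coincide (Barlow & Levick-style)."""
    inhibit_delayed = np.roll(inhibit, delay)
    return excite / (1.0 + g * inhibit_delayed)

# preferred direction: the exciting receptor is reached first, so the delayed
# inhibition arrives too late to veto the response
pref = veto_unit(excite=pulse(50), inhibit=pulse(55)).sum()

# null direction: the inhibiting receptor is reached first, and its delayed
# signal coincides with the excitation, suppressing the response
null = veto_unit(excite=pulse(55), inhibit=pulse(50)).sum()

print(round(pref, 1), round(null, 1))   # pref >> null: directional selectivity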



Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
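The regularization recipe this framework builds on can be stated compactly: for an ill-posed problem Az = y, accept the z minimizing ||Az - y||^2 + lambda ||Pz||^2, where the stabilizer P encodes a priori smoothness and lambda trades data fit against the prior. A small numerical sketch with a hypothetical 1-D surface-reconstruction problem standing in for an early-vision module (the operators and lambda are my arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
n = 100

# ill-posed toy problem: recover a smooth 'surface' z from noisy samples y
A = np.eye(n)                                  # identity 'imaging' operator
z_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = A @ z_true + 0.3 * rng.normal(size=n)

# stabilizer P: second-difference operator, penalizing curvature
P = np.diff(np.eye(n), n=2, axis=0)

def regularize(A, y, P, lam):
    """Tikhonov regularization: minimize ||A z - y||^2 + lam ||P z||^2,
    solved in closed form via the normal equations."""
    return np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ y)

z_hat = regularize(A, y, P, lam=10.0)
print(np.abs(z_hat - z_true).mean(), np.abs(y - z_true).mean())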

Early vision processes

Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
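In symbols (a standard formulation consistent with this paragraph, not quoted from the paper): writing the image brightness as $I(x, y, t)$ and assuming brightness is conserved along the motion,

$$\frac{dI}{dt} = \nabla I \cdot \mathbf{v} + I_t = 0 \quad\Rightarrow\quad v_\perp = \mathbf{v} \cdot \frac{\nabla I}{\|\nabla I\|} = -\frac{I_t}{\|\nabla I\|}$$

so purely local measurements pin down only the component $v_\perp$ along the brightness gradient (normal to the contour); the tangential component drops out, which is exactly the ambiguity described above.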

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen



Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about ..., whereas for the mammalian cortex it lies between ... and ....

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
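The cooperative algorithm the article goes on to describe iterates local excitation among matches at the same disparity and inhibition among rival matches along the same line of sight. A compressed 1-D sketch of that scheme; the neighbourhood size, weights and threshold are my arbitrary choices, not the paper's parameters:

import numpy as np

rng = np.random.default_rng(5)

# 1-D random-dot 'stereogram': the right image is the left shifted by 2 pixels
left = rng.integers(0, 2, 64)
right = np.roll(left, 2)

disparities = np.arange(-4, 5)
# initial match network C[d, x] = 1 where left and right pixels agree at shift d
C = np.array([(left == np.roll(right, -d)).astype(float) for d in disparities])

for _ in range(10):
    # excitation: support from neighbouring positions at the same disparity
    excite = np.stack([np.convolve(row, np.ones(5), mode="same") for row in C])
    # inhibition: competing matches at other disparities for the same position
    inhibit = C.sum(axis=0, keepdims=True) - C
    C = ((excite - 0.5 * inhibit) > 2.5).astype(float)   # threshold update

print(disparities[C.sum(axis=1).argmax()])   # recovered disparity (should be 2)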


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui...

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...

• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...

• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$

Predictive regularization algorithms
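A minimal sketch of the regularization functional above in action, with the square loss and a Gaussian kernel (kernel ridge regression); the closed-form coefficients c = (K + mu n I)^{-1} y are the standard consequence of the representer theorem, while the kernel width and mu here are arbitrary choices of mine:

import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-d2.sum(-1) / (2 * sigma ** 2))

rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, (40, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=40)

# minimize (1/n) sum V(y_i, f(x_i)) + mu ||f||_K^2 with the square loss:
# the minimizer is f(x) = sum_i c_i k(x, x_i), with c = (K + mu n I)^{-1} y
n, mu = len(x), 1e-3
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + mu * n * np.eye(n), y)

x_test = np.linspace(-1, 1, 5)[:, None]
f_test = gaussian_kernel(x_test, x) @ c        # predictions at new points
print(np.round(f_test, 2))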

Theorems on foundations of learning

MIT (1981-)

Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
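The stability property claimed in the abstract can be probed numerically: delete one training example, relearn, and measure how much the loss at the deleted point changes. A toy sketch (mine, with arbitrary synthetic data) using a ridge-regularized learner, which the theory predicts to be stable:

import numpy as np

rng = np.random.default_rng(7)

def learn(X, y, lam=1.0):
    """A regularized least-squares learner: stable in the leave-one-out sense."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = learn(X, y)
scores = []
for i in range(n):
    # delete example i, relearn, and compare the loss on the deleted point
    mask = np.arange(n) != i
    w_i = learn(X[mask], y[mask])
    scores.append(abs((X[i] @ w - y[i]) ** 2 - (X[i] @ w_i - y[i]) ** 2))

print(max(scores))   # small: the hypothesis barely changes under deletion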

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 31: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Motion algorithm the beetle Clorophanus and Reichardtrsquos motion detector

Motion algorithm the beetle and the fly

bull The beetle follows the motion

bull Each photoreceptor sees only an alternation of dark and light how is motion computed

bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector

bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz

bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex

bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
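The proposed 'veto' has a simple biophysical reading: shunting (silent) inhibition. As a minimal sketch (the symbols are assumptions of this illustration, not the paper's notation: $g_e, g_i, g_r$ are excitatory, inhibitory and resting conductances, $E_e, E_i, E_r$ the corresponding reversal potentials), the steady-state membrane potential of a patch of membrane is

$$ V = \frac{g_e E_e + g_i E_i + g_r E_r}{g_e + g_i + g_r}, $$

and if the inhibitory reversal potential sits at the resting potential ($E_i \approx E_r = 0$),

$$ V = \frac{g_e E_e}{g_e + g_i + g_r}. $$

A large $g_i$ then divides away the excitatory signal without hyperpolarizing the cell, implementing an AND-NOT gate; for small signals the expansion of $V$ contains a term proportional to $g_e \, g_i$, an effectively multiplicative nonlinear interaction.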

© Nature Publishing Group 1985

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and that are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes

Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images⁵. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman⁶, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
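The regularization recipe makes this concrete. As a sketch, in the generic Tikhonov form (notation assumed here for illustration: $A$ the linear operator relating the unknown $z$ to the data $y$, $P$ a stabilizing operator, $\lambda$ a regularization parameter), the ill-posed problem $Az = y$ is replaced by

$$ \min_z \; \|Az - y\|^2 + \lambda \|Pz\|^2 . $$

For the velocity field along a contour, with $N(s)$ the unit normal and $V^N(s)$ the locally measured normal velocity, one such functional selects the smoothest field consistent with the data:

$$ \min_V \int \left( V \cdot N - V^N \right)^2 ds \;+\; \lambda \int \left| \frac{\partial V}{\partial s} \right|^2 ds . $$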

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


Perhaps one of the most striking differences between a brain and today's computers is the amount of 'wiring'. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints.


The term cooperative refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1
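The cooperative algorithm itself is compact to state: a binary state over position and disparity, excitatory support from nearby positions at the same disparity, inhibition among matches sharing a line of sight, and a threshold. A minimal NumPy sketch under that reading of the paper; the neighborhood radius and the constants eps and theta are illustrative assumptions, not the published parameters:

```python
import numpy as np

def cooperative_stereo(init, n_iter=10, radius=2, eps=2.0, theta=3.0):
    """Minimal sketch of a Marr-Poggio-style cooperative update.

    init[x, d] = 1 where a left-image feature at x matches a
    right-image feature at x - d (one scanline, D candidate disparities).
    """
    C = init.astype(float).copy()
    W, D = C.shape
    xs = np.arange(W)[:, None]
    ds = np.arange(D)[None, :]
    right = xs - ds + D  # right-eye line-of-sight index (shifted to be >= 1)
    for _ in range(n_iter):
        # excitation: neighboring positions at the same disparity
        # (np.roll wraps at the borders; acceptable for a sketch)
        excit = sum(np.roll(C, dx, axis=0) for dx in range(-radius, radius + 1))
        # inhibition: competing matches along either line of sight
        left_los = C.sum(axis=1, keepdims=True) - C      # same left position x
        totals = np.zeros(W + 2 * D)
        np.add.at(totals, right, C)
        right_los = totals[right] - C                    # same right position x - d
        S = excit - eps * (left_los + right_los) + init
        C = (S >= theta).astype(float)                   # threshold nonlinearity
    return C
```

On a random-dot stereogram the iteration suppresses false matches and fills in the correct disparity plane, which is the behavior the paper demonstrates.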


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing efforts to integrate knowledge from cognition and computation to understand vision and the brain.

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation – algorithms – biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…

• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$ \min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) \;+\; \mu \,\|f\|_K^2 \right] $$
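For the square loss this regularization scheme has a closed-form solution: the representer theorem gives f(x) = Σᵢ cᵢ K(x, xᵢ) with (K + nμI)c = y. A minimal NumPy sketch; the Gaussian kernel width and μ are arbitrary illustrative choices:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # pairwise Gaussian (RBF) kernel matrix between rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, mu=1e-2, sigma=1.0):
    # minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2
    # representer theorem: f = sum_i c_i K(., x_i), with (K + n*mu*I) c = y
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + n * mu * np.eye(n), y)
    return lambda Xt: gaussian_kernel(Xt, X, sigma) @ c

# toy usage: regress a noisy sine
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = krr_fit(X, y)
print(f(np.array([[0.0], [1.5]])))  # predictions near sin(0) and sin(1.5)
```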

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49, S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…

Received by the editors April 2000, and in revised form June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society


General conditions for predictivity in learning theory

Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²

¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
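Stability in this sense is directly measurable for any learner: delete one training point, retrain, and see how much the hypothesis moves. A toy sketch, reusing the krr_fit learner from the sketch above; the probe-set size and the absolute-difference score are illustrative simplifications, not the paper's formal CV_loo stability definition:

```python
import numpy as np

def loo_stability(fit, X, y, n_probe=10):
    # mean change of the prediction at x_i when (x_i, y_i) is deleted from S
    f_full = fit(X, y)
    deltas = []
    for i in range(min(n_probe, len(X))):
        keep = np.arange(len(X)) != i
        f_loo = fit(X[keep], y[keep])
        deltas.append(abs(f_full(X[i:i+1])[0] - f_loo(X[i:i+1])[0]))
    return float(np.mean(deltas))

# e.g. loo_stability(lambda X, y: krr_fit(X, y), X, y);
# for a stable (hence predictive) algorithm this shrinks as n grows
```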

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,

$$ \lim_{n\to\infty} \mathbb{P}\big( |X_n - X| > \varepsilon \big) = 0 . $$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$ S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \} . $$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$ I[f] = \int_Z V(f, z) \, d\mu(z) , $$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$ I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y) . $$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$ I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i) . $$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$ \lim_{n\to\infty} \big| I[f_S] - I_S[f_S] \big| = 0 \quad \text{in probability} . $$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$$ \lim_{n\to\infty} \mathbb{P}\Big( I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \varepsilon \Big) = 1 . $$
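These quantities are easy to simulate: the empirical error is computed on S, and the expected error can be approximated by Monte Carlo on a large fresh sample from the same distribution. A toy sketch, again reusing krr_fit from the sketch above; the distribution and constants are illustrative:

```python
import numpy as np
rng = np.random.default_rng(1)

def sample(n):
    # an arbitrary mu(x, y): noisy sine on [-3, 3]
    X = rng.uniform(-3, 3, (n, 1))
    return X, np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

for n in (10, 100, 1000):
    X, y = sample(n)
    f = krr_fit(X, y)
    emp = np.mean((f(X) - y) ** 2)        # I_S[f_S], empirical error
    Xf, yf = sample(20000)                # fresh sample ~ mu
    exp = np.mean((f(Xf) - yf) ** 2)      # Monte Carlo estimate of I[f_S]
    print(n, round(abs(exp - emp), 4))    # the generalization gap shrinks
```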

Letters to Nature. NATURE, VOL 428, 25 MARCH 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung amp Poggio 1995

~15-year-old CBCL computer vision research: face detection

on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10¹⁰–10¹¹ neurons (~1 million flies); 10¹⁴–10¹⁵ synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10⁹ neurons in the ventral stream (350×10⁶ in each hemisphere); ~15×10⁶ neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
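These models alternate two operations: 'S' layers that match templates (tuning, giving increasing selectivity) and 'C' layers that pool over position and scale with a max (giving increasing invariance). A toy NumPy sketch of one S-C stage in this spirit; the Gaussian tuning width, random templates and pooling grid are illustrative choices, not the published parameters:

```python
import numpy as np

def s_layer(image, templates, sigma=0.5):
    # tuning: Gaussian radial-basis match of each template at each location
    h, w = image.shape
    k = templates.shape[1]                  # square template side
    out = np.zeros((len(templates), h - k + 1, w - k + 1))
    flat = templates.reshape(len(templates), -1)
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            patch = image[y:y + k, x:x + k].ravel()
            d2 = ((flat - patch) ** 2).sum(1)
            out[:, y, x] = np.exp(-d2 / (2 * sigma ** 2))
    return out

def c_layer(s_maps, pool=4):
    # invariance: max pooling over local spatial neighborhoods
    n, h, w = s_maps.shape
    H, W = h // pool, w // pool
    out = s_maps[:, :H * pool, :W * pool].reshape(n, H, pool, W, pool)
    return out.max(axis=(2, 4))

# toy usage: random image, random 4x4 templates
rng = np.random.default_rng(0)
img = rng.random((32, 32))
templates = rng.random((8, 4, 4))
c1 = c_layer(s_layer(img, templates))
print(c1.shape)   # (8, 7, 7): 8 features, each tolerant to small shifts
```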

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

[Trial sequence: image (20 ms) → ISI (30 ms blank interval between image and mask) → 1/f-noise mask (80 ms). Task: animal present or not?]

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
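A 1/f-noise mask of the kind used in these experiments can be synthesized by shaping the amplitude spectrum of white noise; a minimal sketch (the image size and the exact 1/f exponent are illustrative):

```python
import numpy as np

def one_over_f_noise(size=256, exponent=1.0, seed=0):
    # random phases with an amplitude spectrum proportional to 1/f^exponent
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    f[0, 0] = 1.0                          # avoid division by zero at DC
    amplitude = 1.0 / f ** exponent
    phase = np.exp(2j * np.pi * rng.random((size, size)))
    img = np.fft.ifft2(amplitude * phase).real
    img -= img.min()
    return img / img.max()                 # normalize to [0, 1]

mask = one_over_f_noise()
print(mask.shape, round(mask.mean(), 3))
```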

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
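The "read-out" is essentially a linear classifier applied to a population vector of firing rates. A synthetic sketch of the idea: the data below are simulated tuning responses, not the recorded IT data, and a regularized least-squares classifier stands in for the classifiers used in such studies; training at one position/scale condition and testing at another illustrates invariant decoding.

```python
import numpy as np
rng = np.random.default_rng(0)

n_neurons, n_classes, n_trials = 200, 8, 50
tuning = rng.random((n_classes, n_neurons))   # class-specific mean rates

def population_responses(condition_gain):
    # simulated IT-like responses; position/scale modeled as a gain change
    X, y = [], []
    for c in range(n_classes):
        r = condition_gain * tuning[c] + 0.3 * rng.standard_normal((n_trials, n_neurons))
        X.append(r); y.append(np.full(n_trials, c))
    return np.vstack(X), np.concatenate(y)

Xtr, ytr = population_responses(1.0)          # train at one condition
Xte, yte = population_responses(0.7)          # test at another condition
Y = np.eye(n_classes)[ytr]                    # one-hot targets
W = np.linalg.solve(Xtr.T @ Xtr + 1e-1 * np.eye(n_neurons), Xtr.T @ Y)
acc = np.mean((Xte @ W).argmax(1) == yte)
print(f"decoding accuracy across conditions: {acc:.2f}")
```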

……… in 2013 ………


Motion algorithm: the beetle and the fly

• The beetle follows the motion

• Each photoreceptor sees only an alternation of dark and light: how is motion computed?

• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector (a code sketch follows this list)

• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz

• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex

• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
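A correlation-type (Reichardt) detector can be written in a few lines: each subunit multiplies the signal from one photoreceptor by a low-pass-filtered (delayed) copy of its neighbor's signal, and the two mirror-symmetric subunits are subtracted. A minimal sketch; the time constant and stimulus are illustrative:

```python
import numpy as np

def reichardt(a, b, dt=1.0, tau=5.0):
    # a, b: time series from two neighboring photoreceptors
    alpha = dt / (tau + dt)
    a_lp = np.zeros_like(a); b_lp = np.zeros_like(b)
    for t in range(1, len(a)):              # first-order low-pass acts as delay
        a_lp[t] = a_lp[t-1] + alpha * (a[t] - a_lp[t-1])
        b_lp[t] = b_lp[t-1] + alpha * (b[t] - b_lp[t-1])
    # opponent stage: preferred-direction subunit minus null-direction subunit
    return a_lp * b - b_lp * a

# moving sinusoidal grating sampled at two points: the sign of the mean
# output indicates the direction of motion
t = np.arange(500)
left  = np.sin(2 * np.pi * 0.02 * t)
right = np.sin(2 * np.pi * 0.02 * t - 0.8)   # phase lag = rightward motion
print(np.mean(reichardt(left, right)))        # > 0 for this direction
```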

Relative motion and figure-ground discrimination the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry Reichardt Poggio Hausen 1983

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges) They illustrate well the difficulty of the problems of early vision The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects Consider the problem of deter-mining the velocity vector V at each point along a smooth contour in the image Following Marr and Ullman6

one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the

Aring(aringdegouml()igraveaeligԛ Egraveccedil)sup3divide+ԛETH ԛiacutecurrenԙplusmnEacuteNtildeOgraveshyyacuteԛszligicirccopyacutemicroAgraveԛ EcircAacuteAumlregEumlyenԛ Oacuteagrave+Ocircaacutethornԛ

ƏԛbrvbarIgraveiumlsectԛ ordfampacircԛ paraampAcircOtildemacrmiddot$ԛ Iacute OumlatildeAEligethɳtimesIcircԛ UcircOslashegraveeacutecedilIumlԛ

oslashsup2ntildeԛiquestumlaumlԛUgrave$ԛ

cԛ ˻ԛԛxyUumlzԛ YacuteyumlQ0ϐԛ

$3)135 51052+5 4amp-5 5 (5

13

UacuteѱKŏUdԛ ԛ

ecircԄĀSЙƫКeԛ˼˽ŐϑāƬЛԛĂƐԛϒѲϓԚМ˾ltѳѴƭϔԛR˔ăԅԛĄԛ ˿ΰ13Жąѵԛϕ˕Ʈԛ =ԛ αϖşƯНОɴHȸԛ ɵȲϗĆӊˈԛɶԛ Ѷɇưԛ ƱϘӧӐПԛ РԆСѷƲEԛ ugraveƳԛ ƑLԛ Ѹԛ ˑͰӲfԛ ɈͱӳƴӨƵgԛ Ӵɉćѹԛ őɷͲβɊԇТɸŠĈ˖ԛ F1šɋĉɹУФԛ Ċϙ2ԛŢϚɺѺţċ˗ԛ Ȩͳϛԛ ƶϜȢԛ ŤLγѻčѼɻʹХhԛ ӵɌĎѽԛ Ʒ˙ƸƹѾďϝԈԛ ɼIȳΫϞĐѿɽ͵ԛ δВάťƺЦЧɾ7ԛ13εƻГōҀɿͶШԛ ҁɍ2ԉԛ=ζ˚Ƽiԛ đԛ ӶɎʀŦɏԛ ЩƽЪͷϟԊԛ Ϡԛ ͺԛ ŧͻηӑĒgtMЫԛ ɐƾԋԛ Ӓƒƿϡ˛ʁǀ|ԛ sup1ԛ ɑʂЬԛ θēιǁϢjԛ ͼӓӋ˭ˉǂԛ Ĕԛ ĕκλͽĖŨɒԛ Xԛ ҈ɓǃЭDŽԛ μϣŒDDžЮԛ uacutedžԛ ϤLJөʃLjӷԛ ėԛ Ӕœljϥԛ Ϳȩԛ ƓʄȶȷNJϦNj҉ԛ ŔʅνɔԌЯʆũCԛnjŪɕĘʇабkԛ вӕūɖԛ ęгԛ дԍĚξҊʈŬԛ ʉҋǍϧěŭҌʊеԛ ŕǎҍӸǏǐԛ ǑԂŮʋҎĜҏʌԛ ĝƔԛ ʍɗʎŖʏYʐlԛ ƕǒIƖϨʑҐʒůԛжοʓǓзmԛ ΄vAπӖ^иǔԛ ȹǕǖϩĞґʔȺԛ GǗŗϪğ3ԛ΅JǘĠϫҒʕǙйԛ ġƗԛ ғϬĢкʖҔҕǚwϭ18˜ģҖǛƘԛ ӪΆBҗĤȻ4ԛŰɘĥǜ˝лԛ laquoUԛ ǝĦűəԛ Ǟnԛ ӹǟԛ ƙʗмŲӗноԛ ҘɚǠԛ ʘȪΈϮħҙʙΉԛ ρέųǡпрʚȼԛ Ίς3ДŎҚʛсԛ YɛĨқԛ FĩԎԛ ŘǢԛʜσ˞ǣǤҜǥƚ~ԛ ˮԛ Όȫԛ ҝɜǦтǧԛ ǨŴɝĪʝуфԛ īŵҞԛ 5ʞҟɞǩϯԛ ӺʟҠɟʠԛ Ĭԛ ȬǪӻԛ Gʡ˯˰ˊхǫŶƛVoԛ цӘɠԛ ĭчԛ ҡ9ǬԛĮŷҢʢΎԛ QΏңǭҤʣį˟ԛ Oϰԛ шԏHİτZʤŸԛ ҥϱıщEʥъыʦΐpԛ Αԛ 13ӫǮϲԛ ьӬǯϳIJˠԛ ɡәƜϴǰƝԛ ʧ˱˲эDZŹΒƞюԛ Γϵԛ DzӭdzԛWǴźΔƟяqԛ ΕƠӚˡijҦʨȽԛ ѐΖǵԛ υ϶ΗSǶϷZԐԛ Θȭԛ ҧɢǷԛ Żʩϸżӛ]ԛ ordmJԛ ёǸԛ ŽĴђǹѓԛ ӼǺԛ ӽʪ˳˴ԛ єӜȾȿǻѕҨԛ іφǼˋȴˌԛǽԃĵTˢǾїԛӾɣϹǿԛԛřʫΙχɤԑјʬžĶˣԛſķʭљԛơϺ˵ˍȀVԛ-ԛɀӮȁԛƀΚψӝҩĸҪgt13ԛ raquoԛRĹЗʮƁ_ĺϻrԛӿȂԛԀˎ˶CԛƢʯњƂӞћќԛ ҫɥȃԛ ӟϼΛĻˤԛ T5ϽļӌˏPѝsԛ ԛ ҬɦȄϾԛ ʰω˥[ĽӍAKtԛ ӠȅϿ˦ԒʱɁԛ ƣʲЀȆƃҭʳΜԛ ўȇ˧ȈƄҮʴӯʵXԘԛʶԛ үɧȉԛ ӰҰȊŚЁȋԛ Ђ6ұʷľԛ

CcedilĿџѠŀƅ9ӡѡȌӎӏѢԛ frac14W[ҲӢҳԛNȮԛ ograve6ƆɨB7ԓuԛ ԛ

oacuteʸѣԛ ЃȍϊΝ]ԛ ƤȎѤƇЄʹśȏѥԛ ЅȐѦȑІƈɩԛ ƥΞȒԛ ԁʺҴɪʻԛ ҵɫȓԛ ҶʼȯʽƉ˨ԛ ҷȔ˩˷ːɂƊ0ԛ AtildeŜΟЇŁҸΠЈԔԛ Ʀԛ ҹɬ4ԛ ȕҺȖЉԛ ȵήЊԛ ʾΡ˪ɃʿƋł˫ԛ frac12ȰΣЋŃһˀΤԛ THORNЕίƌȗѧѨˁɄԛ aucircɭ˂Ҽń˒ȘЌԛ iexclΥD˸ș8Țbԛ Ņҽԛ Ҿțԛ ņѩѪ-ƍɮӣѫȜҿѬԛfrac34ѭӀ˃ӁӤӂȝԛ Φȱԛ ocircȞƎɯΧˬΨɅԕԛ otildeȟԛ centȠӃȡЍ`Ѯԛ ѯӥϋόΩЎӄԛ ˄Ѱԛ ύЏNӱltƧȢƨԛ ˅ԛ ώŇИԛ ŝԖԛ Ӆɰȣԛ eumlOňԛ notΪӦƩʼnӆPԛŊƪԛ ˆԛ ϏŋАӇԛ Şԗԛ ӈɱȤԛuumlɲˇӉŌ˓ȥБԛpoundM˹˺ȦɆȧԛ

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

Perhaps one of the most striking differences between a brain and today's computers is the amount of 'wiring'. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
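The iteration sketched below is a toy reconstruction of a cooperative stereo computation in this spirit, written for one epipolar line; the 1-D simplification, the parameter values, and the use of a single line of sight for the inhibition are all assumptions for illustration, not the published algorithm verbatim. C[x, d] = 1 encodes the current belief in a match at position x with disparity d; excitation implements the continuity constraint and inhibition the uniqueness constraint.

```python
# Toy cooperative stereo iteration on one epipolar line (illustrative).
import numpy as np

def cooperative_stereo(left, right, max_disp=4, iters=10,
                       epsilon=1.0, theta=3.5):
    n = len(left)
    # Initial state: 1 wherever a left and a right feature actually match.
    C0 = np.zeros((n, max_disp + 1))
    for d in range(max_disp + 1):
        C0[:n - d, d] = (left[:n - d] == right[d:]).astype(float)
    C = C0.copy()
    for _ in range(iters):
        support = np.zeros_like(C)
        for d in range(max_disp + 1):
            # Excitation from neighbouring positions at the SAME disparity
            # (continuity: surfaces are mostly smooth).
            support[:, d] = np.convolve(C[:, d], np.ones(5), mode="same")
        # Inhibition from competing matches along the same line of sight
        # (uniqueness: each point has at most one disparity).
        inhibition = C.sum(axis=1, keepdims=True) - C
        # Threshold the summed input, retaining the initial evidence C0.
        C = ((support - epsilon * inhibition + C0) >= theta).astype(float)
    return C.argmax(axis=1)   # disparity estimate at each position

rng = np.random.default_rng(7)
left = rng.integers(0, 2, 60)
right = np.roll(left, 2)      # the whole random pattern shifted: disparity 2
print(cooperative_stereo(left, right))   # mostly 2s away from the borders
```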

Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1


Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

– computation
– algorithms
– biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
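For the square loss this functional has a well-known closed-form minimizer, f(x) = Σᵢ cᵢ K(x, xᵢ) with (K + μnI)c = y. Below is a minimal regularized least-squares sketch; the Gaussian kernel and every parameter value are assumptions for illustration, not CBMM code.

```python
# Regularized least squares in a kernel space (illustrative sketch).
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rls_fit(X, y, mu=1e-2, sigma=1.0):
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)   # (K + mu*n*I) c = y
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = rls_fit(X, y)
print(f(np.array([[0.0], [1.5]])))   # predictions near sin(0) and sin(1.5)
```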

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49, S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory

Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²

¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1–5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
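The stability property can be probed empirically. The sketch below is a rough illustration (not the paper's formal CVloo definition), using plain ridge regression as the learning map: delete one training example, retrain, and measure how much the prediction at the deleted point changes.

```python
# Empirical probe of leave-one-out stability (illustrative sketch).
import numpy as np

def ridge_fit(X, y, mu=0.1):
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + mu * n * np.eye(d), X.T @ y)
    return lambda Xnew: Xnew @ w

def loo_stability(X, y, fit=ridge_fit):
    f_S = fit(X, y)                          # hypothesis from the full set S
    deltas = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        f_Si = fit(X[keep], y[keep])         # hypothesis with example z_i deleted
        deltas.append(abs(f_S(X[i:i + 1])[0] - f_Si(X[i:i + 1])[0]))
    return max(deltas)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
print(loo_stability(X, y))   # for a stable map this is small and shrinks with n
```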

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
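In its most literal form ERM is just a search over the hypothesis space for the smallest training error; the toy sketch below (data, loss and hypothesis space are all illustrative) makes that explicit with a finite family of threshold classifiers.

```python
# ERM over a small finite hypothesis space (toy sketch).
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 100)
y = np.sign(x - 0.3)                        # the true rule: a threshold at 0.3

H = [lambda x, t=t: np.sign(x - t) for t in np.linspace(-1, 1, 41)]
emp_err = [np.mean(h(x) != y) for h in H]   # empirical error I_S[f] of each f
f_S = H[int(np.argmin(emp_err))]            # ERM: minimize over H
print(min(emp_err))                         # 0.0 here, since the true rule is in H
```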

Box 1 | Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,

$$\lim_{n\to\infty} \mathbb{P}\{|X_n - X| \geq \epsilon\} = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z) \, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \epsilon \right\} = 1$$
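These definitions can be illustrated numerically. In the toy sketch below the distribution μ is known (which it never is in practice), so the expected error I[f] can be approximated with a large fresh sample and compared against the empirical error I_S[f] on a small training set; the hypothesis and all constants are arbitrary.

```python
# Empirical vs expected error for a fixed hypothesis f (toy sketch).
import numpy as np

rng = np.random.default_rng(2)
def sample(n):                        # z = (x, y) with y = x^2 + noise
    x = rng.uniform(-1, 1, n)
    return x, x ** 2 + 0.1 * rng.standard_normal(n)

f = lambda x: 0.8 * x ** 2            # some fixed hypothesis
V = lambda x, y: (f(x) - y) ** 2      # square loss V(f, z)

xs, ys = sample(50)                   # training set S with n = 50
I_S = V(xs, ys).mean()                # empirical error I_S[f]
xt, yt = sample(1_000_000)            # large fresh sample approximates I[f]
print(I_S, V(xt, yt).mean())          # the two quantities should be close
```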

letters to nature. Nature, Vol. 428, p. 419, 25 March 2004. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training Database
• 1000+ Real, 3000+ VIRTUAL
• 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
– 10^10–10^11 neurons (~1 million flies)
– 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol. 5, No. 5

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization: a rapid categorization task, with a mask to test the feedforward model (animal present or not?). Image shown for 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms.

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
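Schematically, models of this family alternate a tuning stage (template matching, which builds selectivity) with a MAX-pooling stage (which builds invariance to position and scale), repeated up the hierarchy. The caricature below shows a single such pair; the templates, sizes and rectification are illustrative assumptions, not the published model parameters.

```python
# One tuning/pooling pair of a hierarchical feedforward model (caricature).
import numpy as np

def correlate(img, t):
    """Valid cross-correlation of an image with one template."""
    H, W = img.shape
    th, tw = t.shape
    out = np.zeros((H - th + 1, W - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + th, j:j + tw] * t).sum()
    return out

def s_layer(image, templates):
    """'Simple'-cell stage: tuning to templates (selectivity increases)."""
    return np.stack([np.maximum(0.0, correlate(image, t)) for t in templates])

def c_layer(maps, pool=2):
    """'Complex'-cell stage: local MAX over position (invariance increases)."""
    n, H, W = maps.shape
    H2, W2 = H // pool, W // pool
    m = maps[:, :H2 * pool, :W2 * pool].reshape(n, H2, pool, W2, pool)
    return m.max(axis=(2, 4))

bar_v = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float)  # oriented bars
image = np.random.default_rng(3).random((16, 16))
c1 = c_layer(s_layer(image, [bar_v, bar_v.T]))
print(c1.shape)   # (2, 7, 7): fewer positions, larger effective receptive fields
```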

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio & DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
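The read-out idea is simple to state: arrange population responses as a trials-by-neurons matrix, train a linear classifier on it, and test it on stimuli presented at new positions and scales. The sketch below uses synthetic responses as stand-ins for recorded IT data, and a plain least-squares readout; every number in it is an assumption.

```python
# Linear readout from a (synthetic) neural population matrix (schematic).
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_neurons = 300, 128
axis = rng.standard_normal(n_neurons)          # category-selective direction
labels = rng.integers(0, 2, n_trials)          # e.g. face vs non-face trials
R = rng.standard_normal((n_trials, n_neurons)) + np.outer(2 * labels - 1, axis)

# Least-squares linear readout (a stand-in for the classifiers actually used).
w, *_ = np.linalg.lstsq(R, 2.0 * labels - 1.0, rcond=None)

# Crude stand-in for a position/scale change: a perturbed response matrix.
R_new = R + 0.5 * rng.standard_normal(R.shape)
acc = ((R_new @ w > 0) == labels.astype(bool)).mean()
print(f"readout accuracy on perturbed responses: {acc:.2f}")
```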

... in 2013 ...


Relative motion and figure-ground discrimination: the fly

Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)

Motion discontinuities and figure-ground discrimination: neural circuitry

Towards the neural circuitry: Reichardt, Poggio, Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

†Universita di Genova, Istituto di Fisica, Genoa, Italy
‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.


© 1985 Nature Publishing Group


Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Universita di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
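Regularization addresses exactly this kind of underdetermination: an ill-posed problem Az = y is replaced by the well-posed minimization of ||Az - y||^2 + λ||Dz||^2, where D encodes a natural constraint such as smoothness. A generic sketch of this recipe (illustrative, not one of the paper's specific algorithms):

```python
# Tikhonov regularization of an underdetermined linear problem (sketch).
import numpy as np

def tikhonov(A, y, lam=1e-2):
    n = A.shape[1]
    D = np.eye(n) - np.eye(n, k=1)    # first-difference (smoothness) operator
    return np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)

rng = np.random.default_rng(6)
z_true = np.sin(np.linspace(0, np.pi, 50))        # a smooth unknown
A = rng.standard_normal((10, 50))                 # 10 measurements, 50 unknowns
z_hat = tikhonov(A, A @ z_true)
print(np.abs(z_hat - z_true).mean())  # the smoothness prior selects a solution
```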

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the


Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z

Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2

=sup2

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
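Since the passage defines ERM operationally, a toy version may help. The sketch below (mine, not the paper's) takes a finite hypothesis space of threshold classifiers and returns the one minimizing training error; the data-generating rule and noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=100)
y = (x > 0.6).astype(int)              # true rule: threshold at 0.6
y ^= (rng.random(100) < 0.1)           # flip 10% of labels as noise

thresholds = np.linspace(0, 1, 101)    # finite hypothesis space H

def empirical_error(t):
    # I_S[f_t] for the threshold classifier f_t(x) = 1[x > t]
    return np.mean((x > t).astype(int) != y)

f_S = min(thresholds, key=empirical_error)   # ERM: minimize training error over H
print("selected threshold:", round(float(f_S), 2),
      "training error:", empirical_error(f_S))
```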

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0, lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}.

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.

Expected error. The expected error of a function f is defined as

I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,

lim_{n→∞} P( I[f_S] > inf_{f∈H} I[f] + ε ) = 0.
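As a sanity check on the definitions in Box 1, one can estimate the gap between empirical and expected error for a fixed hypothesis by Monte Carlo. The sketch below is illustrative only: the data distribution, hypothesis and sample sizes are my assumptions, with the square loss.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):                               # a fixed hypothesis f
    return 2.0 * x

def sample(n):                          # z = (x, y) with y = 2x + noise
    x = rng.uniform(-1, 1, n)
    return x, 2.0 * x + 0.5 * rng.normal(size=n)

x_tr, y_tr = sample(20)                 # a small training set S
I_S = np.mean((f(x_tr) - y_tr) ** 2)    # empirical error I_S[f], square loss

x_new, y_new = sample(200_000)          # large fresh sample standing in for mu
I = np.mean((f(x_new) - y_new) ** 2)    # Monte Carlo estimate of I[f]

print(f"I_S[f] = {I_S:.3f}   I[f] ~ {I:.3f}   gap = {abs(I - I_S):.3f}")
```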

letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
  – 10^10–10^11 neurons (~1 million flies)
  – 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
  – ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
  – ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes.

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552–563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not?
Image (20 ms) → image–mask interval (30 ms ISI) → mask, 1/f noise (80 ms)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
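To make "hierarchical feedforward" concrete, here is a bare-bones sketch in the spirit of HMAX-style models: alternating tuned filtering (S units) and local max pooling (C units). The filter shapes, layer sizes and random templates are placeholders, not the published architecture.

```python
import numpy as np

def simple_layer(image, filters):
    # S units: template matching, one map per filter (valid convolution region)
    k = filters.shape[-1]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((len(filters), h, w))
    for f in range(len(filters)):
        for i in range(h):
            for j in range(w):
                out[f, i, j] = np.sum(image[i:i+k, j:j+k] * filters[f])
    return out

def complex_layer(maps, pool=2):
    # C units: max over a local neighborhood, buying tolerance to position
    f, h, w = maps.shape
    h2, w2 = h // pool * pool, w // pool * pool
    return maps[:, :h2, :w2].reshape(f, h2 // pool, pool,
                                     w2 // pool, pool).max(axis=(2, 4))

rng = np.random.default_rng(3)
image = rng.normal(size=(32, 32))            # placeholder input
filters = rng.normal(size=(4, 5, 5))         # placeholder "oriented" templates
c1 = complex_layer(simple_layer(image, filters))
c2 = complex_layer(simple_layer(c1[0], filters))   # stack the same motif again
print("C1 shape:", c1.shape, "C2 shape:", c2.shape)
```

Each stage trades spatial resolution for invariance, which is the "gradual increase in receptive field size and invariance" described above for the ventral stream.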

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
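The read-out idea can be illustrated with synthetic population responses: a linear classifier trained on part of the trials decodes category on held-out trials despite a nuisance position signal. Everything below (unit counts, noise levels, the position term) is invented for illustration; it is not the recorded IT data.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, n_trials = 100, 400
category = rng.integers(0, 2, n_trials)          # two object categories
position = rng.integers(0, 4, n_trials)          # nuisance: stimulus position
patterns = rng.normal(size=(2, n_units))         # category-selective patterns
resp = (patterns[category]                       # population response matrix
        + 0.3 * rng.normal(size=(n_trials, n_units))
        + 0.05 * position[:, None])              # weak position modulation

# "Matrix-like" read-out: a linear classifier fit on half the trials
w, *_ = np.linalg.lstsq(resp[:200], 2.0 * category[:200] - 1.0, rcond=None)
pred = (resp[200:] @ w > 0).astype(int)
print("held-out decoding accuracy:", np.mean(pred == category[200:]))
```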

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

……… in 2013 ……


Motion discontinuities and figure-ground discrimination neural circuitry

Towards the neural circuitry: Reichardt, Poggio & Hausen 1983


Relative motion

Hermann Cuntz, Jürgen Haag and Alexander Borst 2003

Two of the neurons…

Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly), relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

†Università di Genova, Istituto di Fisica, Genoa, Italy. ‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
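The two schemes contrasted in this passage can be simulated in a few lines. The toy model below (my own wiring, not the paper's figure 1) drives two adjacent receptors with a moving bar and compares a multiplicative Reichardt-style correlator with a Barlow & Levick-style delayed inhibitory veto; the timing values are arbitrary.

```python
import numpy as np

def receptor_signals(direction, T=100, delay=5):
    # A bright bar passes receptor A at t=40 and receptor B delay steps later
    a, b = np.zeros(T), np.zeros(T)
    a[40] = 1.0
    b[40 + direction * delay] = 1.0     # +1: A->B motion, -1: B->A motion
    return a, b

def delayed(x, d=5):
    return np.concatenate([np.zeros(d), x[:-d]])

for direction in (+1, -1):
    a, b = receptor_signals(direction)
    # Reichardt-style unit: multiplicative correlation, fires for A->B motion
    reichardt = np.sum(delayed(a) * b)
    # Barlow & Levick-style unit: B's excitation is cancelled by delayed
    # inhibition from A, so it is silenced for A->B and responds for B->A
    veto = np.sum(np.clip(b - delayed(a), 0.0, None))
    print(f"motion {'A->B' if direction > 0 else 'B->A'}: "
          f"Reichardt {reichardt:.1f}  veto {veto:.1f}")
```

As wired here the two units simply prefer opposite directions; the point is the mechanism (multiplication versus veto), matching the distinction the text draws.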


© 1985 Nature Publishing Group


Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes

Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images(5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
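The regularization recipe the article builds toward can be shown on a one-dimensional toy inverse problem: recovering a smooth depth profile from sparse noisy samples by minimizing ||Az − y||² + λ||Pz||², with P a second-difference smoothness stabilizer. All numbers below are my assumptions for illustration, not the paper's examples.

```python
import numpy as np

n = 100
idx = np.arange(0, n, 10)                    # only every 10th depth value observed
A = np.zeros((len(idx), n))
A[np.arange(len(idx)), idx] = 1.0            # sampling operator

rng = np.random.default_rng(5)
z_true = np.sin(np.linspace(0, 3 * np.pi, n))
y = A @ z_true + 0.05 * rng.normal(size=len(idx))

P = np.diff(np.eye(n), n=2, axis=0)          # second-difference stabilizer
lam = 1.0
# Normal equations of ||A z - y||^2 + lam * ||P z||^2
z = np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ y)
print("relative reconstruction error:",
      np.linalg.norm(z - z_true) / np.linalg.norm(z_true))
```

Without the λ||Pz||² term the problem is underdetermined (90 of the 100 unknowns are unobserved); the smoothness constraint is what makes the solution unique, which is the sense in which regularization cures ill-posedness.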

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman(6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
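A tiny numerical illustration of this aperture ambiguity, using the brightness-constancy constraint Ix·u + Iy·v + It = 0 with made-up derivative values: the normal component of (u, v) is fixed by the data, while any amount of tangential component leaves the constraint satisfied.

```python
import numpy as np

Ix, Iy, It = 1.0, 2.0, -3.0                  # assumed local derivatives at a point
g = np.array([Ix, Iy])                       # spatial intensity gradient

v_normal = -It * g / (g @ g)                 # the measurable normal component
print("normal flow:", v_normal)

t = np.array([-Iy, Ix]) / np.linalg.norm(g)  # unit vector along the contour
for alpha in (0.0, 1.0, -2.5):               # any tangential amount fits the data
    v = v_normal + alpha * t
    print(f"alpha={alpha:+.1f}  constraint residual:"
          f" {Ix * v[0] + Iy * v[1] + It:.2e}")
```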

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the


Cooperative neural network for stereo

~ 1979 T Poggio and D Marr MPI Tuebingen
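A compact one-dimensional rendering of the cooperative stereo idea may be useful here. The network below is my sketch, not the 1976 formulation: the parameters and the single line-of-sight simplification are mine. It iterates local excitation within a disparity layer (continuity), inhibition among rival matches at the same position (uniqueness), a bias toward the initial matches, and a threshold.

```python
import numpy as np

rng = np.random.default_rng(6)
n, dmax, true_d = 80, 4, 2
left = (rng.random(n) < 0.5).astype(float)   # random-dot scanline
right = np.roll(left, true_d)                # right eye: shifted copy

# Node C[x, d] = 1 if the two eyes' dots agree under candidate disparity d
C = np.array([[float(left[x] == right[(x + d) % n]) for d in range(dmax + 1)]
              for x in range(n)])
C0 = C.copy()                                # initial matches, kept as a bias

for _ in range(10):
    # Excitation: nearby nodes in the same disparity layer (continuity)
    excit = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))
    # Inhibition: rival disparities sharing the same left-eye position
    inhib = C.sum(axis=1, keepdims=True) - C
    C = ((excit - 0.5 * inhib + C0) > 2.5).astype(float)   # threshold update

# The true disparity layer keeps mutual support; spurious matches thin out
print("active matches per disparity layer:", C.sum(axis=0))
```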


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
  – computation
  – algorithms
  – biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

\[
\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \mu \, \lVert f \rVert_K^2 \right]
\]
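For the square loss, the representer theorem reduces this minimization to a linear system: f(x) = Σ_i c_i K(x, x_i) with (K + μnI)c = y. A minimal sketch of this standard solution, with an assumed Gaussian kernel and made-up data (the kernel width and μ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-3, 3, size=40)
y = np.sin(x) + 0.1 * rng.normal(size=40)    # made-up regression data

def K(a, b, sigma=1.0):
    # Gaussian kernel matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 sigma^2))
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

mu, n = 0.01, len(x)
c = np.linalg.solve(K(x, x) + mu * n * np.eye(n), y)   # (K + mu*n*I) c = y

x_test = np.linspace(-3, 3, 7)
f_test = K(x_test, x) @ c                    # f(x) = sum_i c_i K(x, x_i)
print(np.round(f_test - np.sin(x_test), 2))  # residuals should be small
```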

Predictive regularization algorithms

Theorems on foundations of learning


General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 35: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

36

Relative motion

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
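Both schemes are simple enough to state in a few lines. Below is a toy sketch, under idealized assumptions (half-wave-rectified sinusoidal receptor signals, an arbitrary delay), of a Reichardt-style multiplicative correlator and a Barlow & Levick-style delayed-veto unit; both become direction-selective, one through excitation, the other through inhibition:

```python
# Toy contrast of the two classic motion-detector schemes discussed above:
# multiplicative correlation (Hassenstein & Reichardt) vs delayed inhibitory
# veto (Barlow & Levick). All constants are illustrative.
import numpy as np

def receptor_signals(direction, T=400, delay=5):
    """Two adjacent receptors; receptor 2 sees the stimulus later (+1) or earlier (-1)."""
    t = np.arange(T)
    r1 = np.clip(np.sin(0.2 * t), 0, None)
    r2 = np.clip(np.sin(0.2 * (t - direction * delay)), 0, None)
    return r1, r2

def reichardt(r1, r2, delay=5):
    d1, d2 = np.roll(r1, delay), np.roll(r2, delay)
    return np.mean(d1 * r2 - d2 * r1)              # opponent multiplication: sign = direction

def barlow_levick(r1, r2, delay=5, k=2.0):
    d1 = np.roll(r1, delay)                        # delayed channel from receptor 1
    return np.mean(np.clip(r2 - k * d1, 0, None))  # delayed signal vetoes motion from 1 towards 2

for direction in (+1, -1):
    r1, r2 = receptor_signals(direction)
    print(direction, round(reichardt(r1, r2), 3), round(barlow_levick(r1, r2), 3))
```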


Computational vision and regularization theory. Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
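Stated compactly, the standard (Tikhonov) regularization scheme this approach builds on replaces an ill-posed inverse problem, recovering $z$ from data $y = Az$, with the well-posed variational problem of finding the $z$ that minimizes

$$\|Az - y\|^2 + \lambda \|Pz\|^2,$$

where $P$ is a stabilizing operator (typically a differential operator enforcing smoothness) and the regularization parameter $\lambda$ controls the trade-off between fidelity to the data and physical plausibility of the solution.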

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
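In equations, under the usual brightness-constancy assumption (which the excerpt does not spell out): a local measurement of the image intensity $I(x, y, t)$ constrains the velocity $\mathbf{v}$ only through

$$\nabla I \cdot \mathbf{v} + I_t = 0,$$

a single linear equation in two unknowns. It determines only the component of $\mathbf{v}$ along the intensity gradient, $v_\perp = -I_t / \|\nabla I\|$, and leaves the tangential component free; this is the aperture problem just described.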

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen



Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in


which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
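As a rough illustration of what a cooperative algorithm of this kind looks like, here is a sketch of the iteration on a 1-D random-dot stereogram. The neighbourhood sizes, the inhibition constant and the threshold are illustrative choices, and the inhibition is simplified to competition among disparities at the same position rather than along the full lines of sight:

```python
# Sketch of a cooperative stereo-disparity iteration on a 1-D random-dot
# stereogram: excitation implements continuity, inhibition implements
# uniqueness (simplified). Parameters are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(1)
N, D, true_d = 64, 5, 2
left = rng.integers(0, 2, N)
right = np.roll(left, true_d)                    # right image = shifted left image

# Initial state C0[x, d] = 1 wherever the two images match at disparity d.
C0 = np.array([[float(left[x] == right[(x + d) % N]) for d in range(D)]
               for x in range(N)])
C = C0.copy()

epsilon, theta = 2.0, 3.0
for _ in range(10):
    # Excitation: neighbouring positions at the same disparity (continuity).
    excite = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))
    # Inhibition: competing disparities at the same position (uniqueness).
    inhibit = C.sum(axis=1, keepdims=True) - C
    C = ((excite - epsilon * inhibit + C0) > theta).astype(float)

print("positions assigned to each disparity:",
      np.bincount(C.argmax(axis=1), minlength=D))  # mass should pile up at true_d
```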


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2 \right]$$

Predictive regularization algorithms

Theorems on foundations of learning
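With the square loss $V(y, f(x)) = (y - f(x))^2$ and a reproducing-kernel hypothesis space $H$, the minimizer of the functional above has the closed form $f(x) = \sum_i c_i K(x, x_i)$ with $(K + \mu\ell I)\,c = y$. A minimal sketch of this regularized least-squares algorithm (the Gaussian kernel and all parameter values are illustrative assumptions):

```python
# Regularized least squares (Tikhonov) with a Gaussian kernel: the canonical
# instance of min_f (1/l) sum_i V(y_i, f(x_i)) + mu ||f||_K^2 for square loss.
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))                     # training inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=50)

mu, ell = 1e-3, len(X)
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * ell * np.eye(ell), y)  # representer theorem coefficients

X_new = np.linspace(-1, 1, 5)[:, None]
print(gaussian_kernel(X_new, X) @ c)                # predictions f(x) at new points
```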

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…


General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
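The stability property described in this abstract is easy to probe numerically: retrain with one example deleted and compare the loss at that example. A toy sketch, using an ordinary least-squares polynomial fit as a stand-in for the learning map (all details are illustrative):

```python
# Leave-one-out stability probe: delete example i, retrain, and measure how
# much the loss at z_i changes. Small changes indicate a stable learning map.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + 0.1 * rng.normal(size=30)

def learn(x, y, deg=3):                 # any symmetric learning map will do;
    return np.polyfit(x, y, deg)        # here: least-squares polynomial fit

f_S = learn(x, y)
changes = []
for i in range(len(x)):
    f_Si = learn(np.delete(x, i), np.delete(y, i))   # perturbed training set S^i
    loss_S = (np.polyval(f_S, x[i]) - y[i]) ** 2
    loss_Si = (np.polyval(f_Si, x[i]) - y[i]) ** 2
    changes.append(abs(loss_S - loss_Si))
print("max leave-one-out loss change:", round(max(changes), 4))
```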

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = (x_i, y_i)_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}(|X_n - X| > \varepsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = (z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n))$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] \leq \inf_{f \in H} I[f] + \varepsilon \right) = 1$$
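A small simulation makes these definitions tangible: fix a toy distribution, run ERM over a tiny hypothesis space of threshold classifiers, and watch the gap between the empirical error $I_S[f_S]$ and a Monte Carlo estimate of the expected error $I[f_S]$ shrink as $n$ grows. The distribution, hypothesis space and noise level below are illustrative assumptions:

```python
# Empirical vs expected error for ERM over threshold classifiers h(x) = [x > t].
import numpy as np

rng = np.random.default_rng(0)
H = np.linspace(0, 1, 21)                  # hypothesis space: candidate thresholds

def sample(n):                             # the "unknown" distribution mu(x, y)
    x = rng.uniform(0, 1, n)
    y = (x > 0.3).astype(float)
    flip = rng.random(n) < 0.1             # 10% label noise
    return x, np.where(flip, 1 - y, y)

def err(t, x, y):                          # average 0-1 loss: I_S[f] on (x, y)
    return np.mean((x > t).astype(float) != y)

for n in (10, 100, 1000, 10000):
    x, y = sample(n)
    f_S = H[np.argmin([err(t, x, y) for t in H])]    # ERM selection
    x_test, y_test = sample(100_000)                 # fresh samples estimate I[f_S]
    print(n, round(err(f_S, x, y), 3), round(err(f_S, x_test, y_test), 3))
```

As $n$ grows, the two printed errors converge: the generalization gap vanishes.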


Why do hierarchical architectures work?

• Training database
• 1,000+ real, 3,000+ virtual
• 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.

Kobatake amp Tanaka 1994
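A toy sketch of the computational motif usually invoked for this hierarchy (as in the HMAX-style models cited earlier): alternate template matching, which builds selectivity, with local max pooling, which grows receptive fields and invariance. Template shapes and pooling sizes are illustrative:

```python
# One template-matching (S) stage followed by one max-pooling (C) stage,
# the alternating motif of HMAX-style models of the ventral stream.
import numpy as np

def s_layer(image, templates):
    """Selectivity: correlate each template with every image patch."""
    h, w = templates.shape[1:]
    H, W = image.shape[0] - h + 1, image.shape[1] - w + 1
    out = np.empty((len(templates), H, W))
    for k, t in enumerate(templates):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = (image[i:i+h, j:j+w] * t).sum()
    return out

def c_layer(maps, pool=2):
    """Invariance: max over local spatial neighbourhoods (scale pooling omitted)."""
    K, H, W = maps.shape
    return maps.reshape(K, H // pool, pool, W // pool, pool).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.normal(size=(17, 17))
templates = np.stack([np.ones((2, 2)), np.eye(2), -np.eye(2)])  # crude 'V1' filters
c1 = c_layer(s_layer(image, templates))   # larger receptive fields, some invariance
print(c1.shape)                           # (3, 8, 8)
```

Stacking such S/C pairs reproduces, qualitatively, the gradual increase in receptive field size and invariance noted above.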

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.


Page 36: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Hermann Cuntz Juumlrgen Haag and Alexander Borst 2003

Two of the neuronshellip

Work at 3 levels

bull Fixation and tracking behavior of the fly

bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

bull Biophysics of computation

39

Biophysics of computation (motion detection)

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

copy Nature Publishing Group1985

_____________________________________ ____________

Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing Massachusetts Institute of Technology 545 Technology Square Cambridge Massachusetts 02193 USA

Istituto di Fisica Universita di Genova Genova Italy

Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain

COMPUTATIONAL vision denotes a new field in artificial intel-ligence centred on theoretical studies of visual information processing Its two main goals are to develop image understand-ing systems which automatically construct scene descriptions from image input data and to understand human vision

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer that is distance surface orientation and material properties (reflect-ance colour texture) Much current research has analysed pro-cesses in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews) Several problems have been solved and several specific algorithms have been successfully developed Examples are stereomatching the computation of the optical flow structure from motion shape from shading and surface reconstruction

A new theoretical development has now emerged that unifies much of these results within a single framework The approach has its roots in the recognition of a common structure of early vision problems Problems in early vision are ill-posed requir-ing specific algorithms and parallel hardware Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures includ-ing parallel hardware that could be used by biological visual systems

Early vision processes Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays Their combined output roughly corresponds to Marrs 2-12D sketch and to Barrow and Tennenbaums intrinsic images5bull Recently it has been cus-tomary to assume that these early vision processes are general and do not require domain-dependent knowledge but only

Examples of early vision processes

bull Edge detection bull Spatio-temporal interpolation and approximation bull Computation of optical flow bull Computation of lightness and albedo bull Shape from contours bull Shape from texture bull Shape from shading bull Binocular stereo matching bull Structure from motion bull Structure from stereo bull Surface reconstruction bull Computation of surface colour

generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing

Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process

A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges) They illustrate well the difficulty of the problems of early vision The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects Consider the problem of deter-mining the velocity vector V at each point along a smooth contour in the image Following Marr and Ullman6

one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions

The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the

Aring(aringdegouml()igraveaeligԛ Egraveccedil)sup3divide+ԛETH ԛiacutecurrenԙplusmnEacuteNtildeOgraveshyyacuteԛszligicirccopyacutemicroAgraveԛ EcircAacuteAumlregEumlyenԛ Oacuteagrave+Ocircaacutethornԛ

ƏԛbrvbarIgraveiumlsectԛ ordfampacircԛ paraampAcircOtildemacrmiddot$ԛ Iacute OumlatildeAEligethɳtimesIcircԛ UcircOslashegraveeacutecedilIumlԛ

oslashsup2ntildeԛiquestumlaumlԛUgrave$ԛ

cԛ ˻ԛԛxyUumlzԛ YacuteyumlQ0ϐԛ

$3)135 51052+5 4amp-5 5 (5

13

UacuteѱKŏUdԛ ԛ

ecircԄĀSЙƫКeԛ˼˽ŐϑāƬЛԛĂƐԛϒѲϓԚМ˾ltѳѴƭϔԛR˔ăԅԛĄԛ ˿ΰ13Жąѵԛϕ˕Ʈԛ =ԛ αϖşƯНОɴHȸԛ ɵȲϗĆӊˈԛɶԛ Ѷɇưԛ ƱϘӧӐПԛ РԆСѷƲEԛ ugraveƳԛ ƑLԛ Ѹԛ ˑͰӲfԛ ɈͱӳƴӨƵgԛ Ӵɉćѹԛ őɷͲβɊԇТɸŠĈ˖ԛ F1šɋĉɹУФԛ Ċϙ2ԛŢϚɺѺţċ˗ԛ Ȩͳϛԛ ƶϜȢԛ ŤLγѻčѼɻʹХhԛ ӵɌĎѽԛ Ʒ˙ƸƹѾďϝԈԛ ɼIȳΫϞĐѿɽ͵ԛ δВάťƺЦЧɾ7ԛ13εƻГōҀɿͶШԛ ҁɍ2ԉԛ=ζ˚Ƽiԛ đԛ ӶɎʀŦɏԛ ЩƽЪͷϟԊԛ Ϡԛ ͺԛ ŧͻηӑĒgtMЫԛ ɐƾԋԛ Ӓƒƿϡ˛ʁǀ|ԛ sup1ԛ ɑʂЬԛ θēιǁϢjԛ ͼӓӋ˭ˉǂԛ Ĕԛ ĕκλͽĖŨɒԛ Xԛ ҈ɓǃЭDŽԛ μϣŒDDžЮԛ uacutedžԛ ϤLJөʃLjӷԛ ėԛ Ӕœljϥԛ Ϳȩԛ ƓʄȶȷNJϦNj҉ԛ ŔʅνɔԌЯʆũCԛnjŪɕĘʇабkԛ вӕūɖԛ ęгԛ дԍĚξҊʈŬԛ ʉҋǍϧěŭҌʊеԛ ŕǎҍӸǏǐԛ ǑԂŮʋҎĜҏʌԛ ĝƔԛ ʍɗʎŖʏYʐlԛ ƕǒIƖϨʑҐʒůԛжοʓǓзmԛ ΄vAπӖ^иǔԛ ȹǕǖϩĞґʔȺԛ GǗŗϪğ3ԛ΅JǘĠϫҒʕǙйԛ ġƗԛ ғϬĢкʖҔҕǚwϭ18˜ģҖǛƘԛ ӪΆBҗĤȻ4ԛŰɘĥǜ˝лԛ laquoUԛ ǝĦűəԛ Ǟnԛ ӹǟԛ ƙʗмŲӗноԛ ҘɚǠԛ ʘȪΈϮħҙʙΉԛ ρέųǡпрʚȼԛ Ίς3ДŎҚʛсԛ YɛĨқԛ FĩԎԛ ŘǢԛʜσ˞ǣǤҜǥƚ~ԛ ˮԛ Όȫԛ ҝɜǦтǧԛ ǨŴɝĪʝуфԛ īŵҞԛ 5ʞҟɞǩϯԛ ӺʟҠɟʠԛ Ĭԛ ȬǪӻԛ Gʡ˯˰ˊхǫŶƛVoԛ цӘɠԛ ĭчԛ ҡ9ǬԛĮŷҢʢΎԛ QΏңǭҤʣį˟ԛ Oϰԛ шԏHİτZʤŸԛ ҥϱıщEʥъыʦΐpԛ Αԛ 13ӫǮϲԛ ьӬǯϳIJˠԛ ɡәƜϴǰƝԛ ʧ˱˲эDZŹΒƞюԛ Γϵԛ DzӭdzԛWǴźΔƟяqԛ ΕƠӚˡijҦʨȽԛ ѐΖǵԛ υ϶ΗSǶϷZԐԛ Θȭԛ ҧɢǷԛ Żʩϸżӛ]ԛ ordmJԛ ёǸԛ ŽĴђǹѓԛ ӼǺԛ ӽʪ˳˴ԛ єӜȾȿǻѕҨԛ іφǼˋȴˌԛǽԃĵTˢǾїԛӾɣϹǿԛԛřʫΙχɤԑјʬžĶˣԛſķʭљԛơϺ˵ˍȀVԛ-ԛɀӮȁԛƀΚψӝҩĸҪgt13ԛ raquoԛRĹЗʮƁ_ĺϻrԛӿȂԛԀˎ˶CԛƢʯњƂӞћќԛ ҫɥȃԛ ӟϼΛĻˤԛ T5ϽļӌˏPѝsԛ ԛ ҬɦȄϾԛ ʰω˥[ĽӍAKtԛ ӠȅϿ˦ԒʱɁԛ ƣʲЀȆƃҭʳΜԛ ўȇ˧ȈƄҮʴӯʵXԘԛʶԛ үɧȉԛ ӰҰȊŚЁȋԛ Ђ6ұʷľԛ

CcedilĿџѠŀƅ9ӡѡȌӎӏѢԛ frac14W[ҲӢҳԛNȮԛ ograve6ƆɨB7ԓuԛ ԛ

oacuteʸѣԛ ЃȍϊΝ]ԛ ƤȎѤƇЄʹśȏѥԛ ЅȐѦȑІƈɩԛ ƥΞȒԛ ԁʺҴɪʻԛ ҵɫȓԛ ҶʼȯʽƉ˨ԛ ҷȔ˩˷ːɂƊ0ԛ AtildeŜΟЇŁҸΠЈԔԛ Ʀԛ ҹɬ4ԛ ȕҺȖЉԛ ȵήЊԛ ʾΡ˪ɃʿƋł˫ԛ frac12ȰΣЋŃһˀΤԛ THORNЕίƌȗѧѨˁɄԛ aucircɭ˂Ҽń˒ȘЌԛ iexclΥD˸ș8Țbԛ Ņҽԛ Ҿțԛ ņѩѪ-ƍɮӣѫȜҿѬԛfrac34ѭӀ˃ӁӤӂȝԛ Φȱԛ ocircȞƎɯΧˬΨɅԕԛ otildeȟԛ centȠӃȡЍ`Ѯԛ ѯӥϋόΩЎӄԛ ˄Ѱԛ ύЏNӱltƧȢƨԛ ˅ԛ ώŇИԛ ŝԖԛ Ӆɰȣԛ eumlOňԛ notΪӦƩʼnӆPԛŊƪԛ ˆԛ ϏŋАӇԛ Şԗԛ ӈɱȤԛuumlɲˇӉŌ˓ȥБԛpoundM˹˺ȦɆȧԛ

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

Perhaps one of the most striking differences between a brain and today's computers is the amount of 'wiring'. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of ...

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287

Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1

Science is currently published by American Association for the Advancement of Science


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of perception.

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...

• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...

• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$f^* = \arg\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
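For the square loss V(y, f(x)) = (y - f(x))^2, the representer theorem reduces this minimization over the RKHS H_K to a linear system, (K + n·mu·I)c = y, with f(x) = sum_i c_i K(x_i, x). A minimal sketch (a generic regularized least squares example with an assumed Gaussian kernel, not CBCL code):

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit_regularized(X, y, mu=1e-2, sigma=1.0):
    """Square-loss Tikhonov minimizer: by the representer theorem,
    f(x) = sum_i c_i K(x_i, x) with (K + n*mu*I) c = y."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + n * mu * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = fit_regularized(X, y)
print(f(np.array([[0.5]])))  # prediction near sin(0.5)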

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.

© 2001 American Mathematical Society

General conditions for predictivity in learning theory
Tomaso Poggio (1), Ryan Rifkin (1,4), Sayan Mukherjee (1,3) & Partha Niyogi (2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
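The stability property described here can be probed numerically: delete one training example, retrain, and measure how much the learned hypothesis changes at the deleted point. A toy sketch with regularized least squares standing in for the learning map (an illustration of the idea, not the paper's formal CV-loo stability definition):

import numpy as np

def ridge_fit(X, y, mu=0.1):
    """Regularized least squares: w = argmin (1/n)||Xw - y||^2 + mu ||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + mu * np.eye(d), X.T @ y / n)

def loo_stability(X, y, mu=0.1):
    """Largest change in the prediction at the deleted point when one
    example is removed from the training set."""
    w_full = ridge_fit(X, y, mu)
    worst = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w_i = ridge_fit(X[mask], y[mask], mu)
        worst = max(worst, abs(X[i] @ w_full - X[i] @ w_i))
    return worst

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(100)
print(loo_stability(X, y))  # a small value indicates a stable learning map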

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability: A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$.

Training data: The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \big(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\big)$$

Learning algorithms: A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions: We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error: The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z) \, d\mu(z),$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y).$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error: The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency: An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability.}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\Big( I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \epsilon \Big) = 1.$$
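These definitions can be made concrete in a toy simulation: compute the empirical error I_S[f_S] on the training set and approximate the expected error I[f_S] by Monte Carlo on a large fresh sample from the same (here assumed) distribution mu. A sketch with ERM over a one-parameter hypothesis space of threshold functions:

import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    """Draw n i.i.d. samples z = (x, y) from an assumed distribution mu."""
    x = rng.uniform(-1, 1, n)
    y = np.sign(x + 0.1 * rng.standard_normal(n))
    return x, y

def square_loss(f, x, y):
    return (f(x) - y) ** 2

x_train, y_train = sample(50)

# ERM over a tiny hypothesis space: threshold classifiers f_t(x) = sign(x - t)
thresholds = np.linspace(-1, 1, 201)
emp = [square_loss(lambda x: np.sign(x - t), x_train, y_train).mean()
       for t in thresholds]
t_star = thresholds[int(np.argmin(emp))]
f_S = lambda x: np.sign(x - t_star)

I_S = square_loss(f_S, x_train, y_train).mean()       # empirical error
x_test, y_test = sample(100_000)
I = square_loss(f_S, x_test, y_test).mean()           # Monte Carlo estimate of I[f_S]
print(f"I_S[f_S]={I_S:.3f}  I[f_S]~{I:.3f}  gap={abs(I - I_S):.3f}")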

letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real faces, 3,000+ virtual faces, 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human brain:
  – 10^10–10^11 neurons (~1 million flies)
  – 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey:
  – ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
  – ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in the receptive field size, in the 'complexity' of the preferred stimulus, and in 'invariance' to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
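A common caricature of such view-tuned units (consistent with the model cited above, though all parameters here are invented for illustration) is Gaussian tuning over viewpoint, with a small population of units centred on different stored views; pooling over the population, e.g. by a max, yields a more view-invariant signal:

import numpy as np

def view_tuned_response(view_deg, preferred_deg, sigma_deg=30.0):
    """Gaussian tuning over viewpoint: maximal at the stored (preferred)
    view, declining gradually as the object rotates away from it."""
    d = (view_deg - preferred_deg + 180) % 360 - 180  # wrapped angle difference
    return np.exp(-d**2 / (2 * sigma_deg**2))

# A small population of units tuned to different stored views of one object
preferred_views = [0, 90, 180, 270]
for test_view in [0, 45, 120]:
    responses = [view_tuned_response(test_view, p) for p in preferred_views]
    # Pooling (max) over the view-tuned units gives view-invariant recognition
    print(test_view, [f"{r:.2f}" for r in responses],
          "pooled:", f"{max(responses):.2f}")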

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test the feedforward model): animal present or not?

- Image: 20 ms
- Interval image-mask (ISI): 30 ms
- Mask (1/f noise): 80 ms

(A timing sketch follows the references below.)

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
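For orientation, the trial timeline implied by these numbers puts the stimulus onset asynchrony (image duration plus ISI) at 20 + 30 = 50 ms, short enough that backward masking is assumed to limit processing to a single feedforward pass. A trivial sketch of the timeline, using only the durations on the slide:

# Backward-masking trial timeline (ms), durations taken from the slide above
image_ms, isi_ms, mask_ms = 20, 30, 80
soa_ms = image_ms + isi_ms  # stimulus onset asynchrony = 50 ms
events = [("image", 0, image_ms),
          ("blank ISI", image_ms, image_ms + isi_ms),
          ("1/f-noise mask", soa_ms, soa_ms + mask_ms)]
for name, t0, t1 in events:
    print(f"{name:>14}: {t0:3d}-{t1:3d} ms")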

Feedforward models 'predict' rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
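Schematically, such hierarchical feedforward models alternate template-matching ('S') layers, which build selectivity, with max-pooling ('C') layers, which build tolerance to position and scale. A minimal HMAX-flavoured sketch (toy edge templates and layer sizes of my choosing, not the Serre et al. parameters):

import numpy as np
from scipy.signal import convolve2d

def s_layer(image, filters):
    """Template matching: one feature map per filter (selectivity)."""
    return [convolve2d(image, f, mode="valid") for f in filters]

def c_layer(maps, pool=2):
    """Local max pooling over position (invariance to small shifts)."""
    out = []
    for m in maps:
        h, w = (m.shape[0] // pool) * pool, (m.shape[1] // pool) * pool
        m = m[:h, :w].reshape(h // pool, pool, w // pool, pool)
        out.append(m.max(axis=(1, 3)))
    return out

# Toy oriented filters standing in for Gabor-like S1 templates
filters = [np.array([[1, -1], [1, -1]], float),   # vertical edge
           np.array([[1, 1], [-1, -1]], float)]   # horizontal edge
image = np.random.default_rng(3).standard_normal((16, 16))
c1 = c_layer(s_layer(image, filters))
print([m.shape for m in c1])  # two pooled feature maps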

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT read-out data: reading out category and identity, invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005
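Operationally, a 'matrix-like read-out' means training a linear classifier on a matrix of population responses (trials × recording sites) and testing it on held-out trials. A schematic sketch on synthetic data (the actual result used IT recordings; everything below, including the response model, is simulated):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_sites, n_trials = 128, 400

# Synthetic "population responses": two object categories, with the
# category signal carried by a fixed pattern across sites plus noise
category = rng.integers(0, 2, n_trials)
signal = rng.standard_normal(n_sites)
X = rng.standard_normal((n_trials, n_sites)) + np.outer(category, signal)

# Linear read-out: fit on the first 300 trials, test on the rest
clf = LogisticRegression(max_iter=1000).fit(X[:300], category[:300])
print("read-out accuracy:", clf.score(X[300:], category[300:]))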

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

... in 2013 ...


Work at 3 levels

• Fixation and tracking behavior of the fly

• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)

• Biophysics of computation


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.


© Nature Publishing Group 1985

Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes

Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch, and to Barrow and Tenenbaum's intrinsic images(5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes

• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman(6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
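The regularization recipe the paper develops restores well-posedness by adding a smoothness term: rather than inverting the data constraints Az = b directly, one minimizes ||Az - b||^2 + lambda ||Pz||^2, where P is a stabilizing operator that penalizes non-smooth solutions. A minimal numerical sketch for one such problem from the box above, surface interpolation from sparse noisy samples (sampling pattern, noise level and lambda are arbitrary choices):

import numpy as np

n = 100
x = np.linspace(0, 1, n)
true_surface = np.sin(2 * np.pi * x)

# Sparse, noisy depth samples: the data alone underdetermine the surface
idx = np.arange(0, n, 10)
A = np.eye(n)[idx]                      # sampling operator
b = true_surface[idx] + 0.05 * np.random.default_rng(5).standard_normal(len(idx))

# Second-difference operator P: penalizes curvature (smoothness prior)
P = np.diff(np.eye(n), n=2, axis=0)

lam = 1e-3
# Regularized (well-posed) solution of the ill-posed interpolation problem
z = np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ b)
print("max reconstruction error:", np.abs(z - true_surface).max())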

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the ...


Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z

Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2

=sup2

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = \{(x_i, y_i)\}_{i=1}^{n}. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, \lim_{n\to\infty} |X_n - X| = 0 in probability) if and only if, for every \epsilon > 0,

\lim_{n\to\infty} \mathbb{P}\big( |X_n - X| > \epsilon \big) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of \mathbb{R}^k. There is an unknown probability distribution \mu(x, y) on the product space Z = X \times Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

S = (z_1, \ldots, z_n) = \big( (x_1, y_1), \ldots, (x_n, y_n) \big).

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: \cup_{n \geq 1} Z^n \to \mathcal{H}, where \mathcal{H}, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)^2.

Expected error. The expected error of a function f is defined as

I[f] = \int_Z V(f, z) \, d\mu(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution \mu.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution \mu,

\lim_{n\to\infty} \big| I[f_S] - I_S[f_S] \big| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution \mu and any \epsilon > 0,

\lim_{n\to\infty} \mathbb{P}\big( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon \big) = 0.
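The definitions in Box 1 are easy to exercise on a toy problem. The sketch below is my own construction (distribution, hypothesis space, and sample sizes are arbitrary choices): ERM over a finite hypothesis space of thresholds, with the empirical error I_S[f_S] compared against a Monte Carlo estimate of the expected error I[f_S] on a large fresh sample.

    import numpy as np

    rng = np.random.default_rng(1)

    def sample(n):
        """Draw n examples from a fixed distribution mu: threshold rule plus label noise."""
        x = rng.uniform(-1, 1, n)
        y = (x > 0.3).astype(float)
        flip = rng.random(n) < 0.1
        y[flip] = 1 - y[flip]
        return x, y

    H = np.linspace(-1, 1, 41)      # finite hypothesis space: f_t(x) = [x > t]

    def err(t, x, y):
        """0-1 loss of hypothesis f_t on a sample (empirical error on that sample)."""
        return np.mean((x > t).astype(float) != y)

    x_tr, y_tr = sample(30)
    t_erm = min(H, key=lambda t: err(t, x_tr, y_tr))   # ERM: minimize I_S[f]
    x_te, y_te = sample(100_000)                       # Monte Carlo stand-in for I[f]
    print("I_S[f_S] =", err(t_erm, x_tr, y_tr))
    print("I[f_S]  ~", err(t_erm, x_te, y_te))

The gap between the two printed numbers is exactly the quantity that the generalization definition requires to vanish as the training set grows.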

letters to nature
Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.

Kobatake & Tanaka 1994

Vision: ventral stream

Cognition in people

Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis*, Jon Pauls* and Tomaso Poggio†

*Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
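A common reading of this prediction is that a view-tuned unit behaves like a Gaussian radial basis function centred on a stored view, with a view-invariant unit pooling over several such units. The sketch below is my own toy rendering, with arbitrary tuning width and stored views:

    import numpy as np

    def view_tuned(theta, center, sigma=20.0):
        """Response of a unit tuned to one stored view (angles in degrees)."""
        d = (theta - center + 180) % 360 - 180     # wrapped angular distance
        return np.exp(-d**2 / (2 * sigma**2))

    stored_views = [0, 60, 120, 180, 240, 300]     # learned views of one object
    angles = np.arange(0, 360, 10)
    responses = np.array([[view_tuned(a, c) for a in angles]
                          for c in stored_views])  # view-tuned population
    invariant = responses.max(axis=0)              # pooling -> view invariance
    print(invariant.round(2))                      # roughly flat across views

Each row of `responses` mimics one view-tuned cell; the max over the population is approximately constant over rotation, the signature of a view-invariant ensemble code.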

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test the feedforward model): animal present or not? Image shown for 20 ms; image-mask interval (ISI) of 30 ms; mask (1/f noise) for 80 ms.

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
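A minimal sketch of the S/C motif such models alternate: an S-type layer that correlates the image with templates, followed by a C-type layer that max-pools locally to buy tolerance to position and scale. Image and template contents below are random placeholders, not the model's actual oriented filters:

    import numpy as np

    def s_layer(image, templates):
        """S layer: template matching (plain valid-mode correlation)."""
        H, W = image.shape
        k = templates.shape[1]                       # square k x k templates
        out = np.zeros((len(templates), H - k + 1, W - k + 1))
        for t, tpl in enumerate(templates):
            for i in range(H - k + 1):
                for j in range(W - k + 1):
                    out[t, i, j] = np.sum(image[i:i+k, j:j+k] * tpl)
        return out

    def c_layer(s_maps, pool=2):
        """C layer: local max pooling -> tolerance to small shifts."""
        T, H, W = s_maps.shape
        Hc, Wc = H // pool, W // pool
        return s_maps[:, :Hc*pool, :Wc*pool] \
            .reshape(T, Hc, pool, Wc, pool).max(axis=(2, 4))

    rng = np.random.default_rng(0)
    image = rng.random((16, 16))
    templates = rng.random((4, 3, 3))    # stand-ins for oriented filters
    c1 = c_layer(s_layer(image, templates))
    print(c1.shape)                      # (4, 7, 7)

Stacking several such S/C pairs yields units with progressively larger receptive fields, more complex preferred stimuli, and more invariance, mirroring the V1-to-IT progression described above.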

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio & DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
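The "matrix-like read-out" idea is: train a linear classifier on a recorded population response matrix and ask what it can decode. The sketch below is my own stand-in, with synthetic "neurons" and a least-squares readout rather than the classifiers used on the actual IT data:

    import numpy as np

    rng = np.random.default_rng(2)
    n_neurons, n_trials = 128, 400
    categories = rng.integers(0, 2, n_trials)       # e.g. animal vs non-animal
    axis = rng.normal(size=n_neurons)               # category direction in firing space
    R = rng.normal(size=(n_trials, n_neurons)) \
        + np.outer(categories - 0.5, axis)          # trials x neurons response matrix

    train, test = slice(0, 300), slice(300, None)
    w, *_ = np.linalg.lstsq(R[train], categories[train] - 0.5, rcond=None)
    pred = (R[test] @ w > 0).astype(int)            # linear readout on held-out trials
    print("readout accuracy:", (pred == categories[test]).mean())

The point of the exercise: if a simple linear readout recovers category from the population matrix across positions and scales, the invariant information is explicitly available in the code, which is what the readout experiments tested.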

… in 2013 …

Page 38: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF


Biophysics of computation (motion detection)

Biophysics of Computation

Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species, with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
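The two schemes contrasted in this excerpt can be sketched in a few lines. Below, a Hassenstein-Reichardt-style multiplicative correlator and a Barlow-Levick-style divisive "veto"; the signals, delay, and inhibition gain are illustrative choices of mine, not parameters from the paper:

    import numpy as np

    def reichardt(s1, s2, delay=3):
        """Correlation-type detector: each input is multiplied with a delayed
        copy of its neighbour; subtracting the two mirror-symmetric subunits
        gives a signed, direction-selective output."""
        d1 = np.roll(s1, delay); d1[:delay] = 0    # delayed channel 1
        d2 = np.roll(s2, delay); d2[:delay] = 0    # delayed channel 2
        return np.mean(d1 * s2 - d2 * s1)          # > 0 for motion 1 -> 2

    def veto(s1, s2, delay=3, g=10.0):
        """Inhibitory 'veto': the response to s2 is divisively suppressed by
        a delayed, rectified signal from the neighbouring receptor."""
        d1 = np.roll(s1, delay); d1[:delay] = 0
        return np.mean(s2 / (1.0 + g * np.maximum(d1, 0)))

    t = np.arange(200)
    s1 = (np.sin(0.2 * t) > 0.9).astype(float)     # pulse train at receptor 1
    s2 = np.roll(s1, 3)                            # same stimulus, later at receptor 2

    print(reichardt(s1, s2), reichardt(s2, s1))    # positive vs negative output
    print(veto(s1, s2), veto(s2, s1))              # vetoed (null) vs spared (preferred)

The correlator's output changes sign with direction, while the veto detector responds in one direction and is shunted in the other, matching the qualitative distinction drawn above.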

© 1985 Nature Publishing Group

Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
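A minimal example of the regularization recipe for an ill-posed problem, with my own choice of operator and weights: reconstructing a 1-D "surface" from a few noisy samples by minimizing a data term plus a smoothness (second-difference) stabilizer, i.e. |Af - d|^2 + lam |Df|^2:

    import numpy as np

    m = 100
    rng = np.random.default_rng(3)
    x = np.linspace(0, 1, m)
    truth = np.sin(2 * np.pi * x)                    # the unknown surface
    idx = rng.choice(m, size=12, replace=False)      # only 12 sparse measurements
    d = truth[idx] + 0.05 * rng.normal(size=12)      # noisy depth samples

    A = np.zeros((12, m)); A[np.arange(12), idx] = 1.0   # sampling operator
    D = np.diff(np.eye(m), n=2, axis=0)                  # second-difference stabilizer
    lam = 1e-2
    f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)  # normal equations
    print(np.abs(f - truth).max())                   # modest error despite missing data

Without the stabilizer the problem is underdetermined (12 equations, 100 unknowns); the smoothness constraint is exactly the kind of "natural constraint" that restores a unique, stable solution.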

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch, and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
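The decomposition just described (the aperture problem) is a two-line computation. A sketch, with arbitrary choices of velocity and contour orientation:

    import numpy as np

    # A local measurement along a contour recovers only the component of the
    # velocity V normal to the contour; the tangential part is invisible.
    V = np.array([1.0, 0.5])                       # true image velocity (assumed)
    theta = np.deg2rad(30)                         # local contour orientation (assumed)
    tangent = np.array([np.cos(theta), np.sin(theta)])
    normal = np.array([-np.sin(theta), np.cos(theta)])

    v_normal = (V @ normal) * normal               # what local measurements give
    v_tangent = (V @ tangent) * tangent            # lost to purely local measurements
    print(v_normal, v_normal + v_tangent)          # full V only with extra assumptions

Any choice of tangential component added to v_normal is consistent with the local data, which is precisely why extra constraints (for example, smoothness of the flow field) are needed to make the solution unique.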

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tübingen


Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287.
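An illustrative 1-D sketch in the spirit of the cooperative algorithm this paper describes: a binary "match" unit C[x, d] for each position x and disparity d, excitation between neighbouring units at the same disparity (continuity), inhibition among units competing for the same position (uniqueness), and a thresholded update. All weights and sizes below are my own illustrative choices, not the paper's parameters:

    import numpy as np

    rng = np.random.default_rng(4)
    n_x, n_d = 64, 9
    true_d = np.where(np.arange(n_x) < 32, 2, 6)        # two depth planes
    C = (rng.random((n_x, n_d)) < 0.3).astype(float)    # spurious initial matches
    C[np.arange(n_x), true_d] = 1.0                     # correct matches also present

    for _ in range(12):
        excit = np.zeros_like(C)
        for dx in (-2, -1, 1, 2):                       # same-disparity neighbours
            excit += np.roll(C, dx, axis=0)             # (wraps at the edges)
        inhib = C.sum(axis=1, keepdims=True) - C        # rivals at the same position
        C = ((excit - inhib + 1.5 * C) > 2.5).astype(float)

    print((C.argmax(axis=1) == true_d).mean())          # fraction of positions solved

Iterating the local excitatory/inhibitory rule suppresses most false matches and fills in the true disparity planes, the qualitative behaviour of the cooperative network on random-dot stereograms.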

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
  - computation
  - algorithms
  - biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \| f \|_K^2

Predictive regularization algorithms

Theorems on foundations of learning
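For the square loss, the minimizer of the regularized functional above admits the classical closed form f(x) = \sum_i c_i K(x_i, x) with (K + \mu n I) c = y. A sketch with a Gaussian kernel; the kernel width, \mu, and the data are my own choices:

    import numpy as np

    def gauss_kernel(A, B, sigma=0.5):
        """Gaussian kernel matrix K[i, j] = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))

    rng = np.random.default_rng(5)
    n = 100
    X = rng.uniform(-1, 1, (n, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)   # noisy regression data

    mu = 1e-3
    K = gauss_kernel(X, X)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)       # (K + mu*n*I) c = y

    X_test = rng.uniform(-1, 1, (5, 1))
    print(gauss_kernel(X_test, X) @ c)                   # predictions f(x) = K(x, X) c

Larger \mu yields smoother, more stable solutions at the cost of a larger training error, which is the stability/predictivity trade-off the surrounding slides refer to.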

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C. R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.


Page 39: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Biophysics of Computation

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
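The two schemes can be contrasted in a few lines of code. The following is a minimal sketch, not a biophysical model: two adjacent receptors see a moving bar, the Reichardt unit multiplies one channel by a delayed copy of the other, and the Barlow-Levick unit lets delayed inhibition veto excitation. The stimulus, the delay and the gating details are illustrative assumptions.

```python
import numpy as np

def moving_bar(direction, n=200, delay=5):
    # Two adjacent receptors; in the preferred direction the bar reaches
    # receptor 2 `delay` steps after receptor 1, in the null direction before it.
    r1 = np.zeros(n); r1[50:60] = 1.0
    r2 = np.roll(r1, delay if direction == "preferred" else -delay)
    return r1, r2

def reichardt(r1, r2, delay=5):
    # Hassenstein-Reichardt correlator: each subunit multiplies one channel by
    # the delayed other channel; output is the time-averaged difference of the
    # two mirror-symmetric subunits. Nonlinear and multiplicative.
    return np.mean(np.roll(r1, delay) * r2 - np.roll(r2, delay) * r1)

def barlow_levick(r1, r2, delay=5):
    # Barlow-Levick: excitation from receptor 1 gated by delayed inhibition
    # from receptor 2 (an AND-NOT 'veto'). In the null direction the delayed
    # inhibition coincides with the excitation and suppresses it.
    return np.mean(r1 * (1.0 - np.roll(r2, delay)))

for direction in ("preferred", "null"):
    r1, r2 = moving_bar(direction)
    print(direction, "Reichardt:", reichardt(r1, r2),
          "Barlow-Levick:", barlow_levick(r1, r2))
```

Run as written, the Reichardt unit gives opposite signs for the two directions, while the Barlow-Levick unit responds in the preferred direction and is vetoed to zero in the null direction.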


Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA

Istituto di Fisica, Università di Genova, Genova, Italy

Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2-D sketch and to Barrow and Tenenbaum's intrinsic images^5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman^6,

one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
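This "aperture problem" is exactly the kind of ill-posedness regularization is meant to cure. The sketch below is an illustrative toy, not the paper's algorithm: given only the normal components b_i = n_i . v_i measured along a discretized contour, it recovers a full velocity field by minimizing the data term plus a smoothness penalty; the contour, the penalty weight lam and the direct solver are assumptions for the demo.

```python
import numpy as np

def recover_flow(normals, b, lam=1.0):
    # Minimize  sum_i (n_i . v_i - b_i)^2 + lam * sum_i |v_{i+1} - v_i|^2
    # over velocities v_i in R^2: a quadratic problem, solved as A v = rhs.
    m = len(b)
    A = np.zeros((2 * m, 2 * m))
    rhs = np.zeros(2 * m)
    for i in range(m):
        nx, ny = normals[i]
        A[2*i:2*i+2, 2*i:2*i+2] += np.array([[nx*nx, nx*ny], [nx*ny, ny*ny]])
        rhs[2*i:2*i+2] += b[i] * normals[i]
    for i in range(m - 1):                       # smoothness along the contour
        for k in range(2):
            A[2*i+k, 2*i+k] += lam;  A[2*i+2+k, 2*i+2+k] += lam
            A[2*i+k, 2*i+2+k] -= lam;  A[2*i+2+k, 2*i+k] -= lam
    return np.linalg.solve(A, rhs).reshape(m, 2)

# A contour translating rigidly with velocity (1, 0): only n_i . v is observed,
# yet the smoothness constraint recovers the full field.
theta = np.linspace(0.1, np.pi - 0.1, 20)
normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)
b = normals @ np.array([1.0, 0.0])
print(recover_flow(normals, b)[0])   # ~ [1, 0] at every contour point
```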

The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the



Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Cooperative Computation of Stereo Disparity

D. Marr, T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
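The paper's cooperative algorithm updates a network of disparity hypotheses under two constraints: uniqueness (each point has one disparity, enforced by inhibition along the lines of sight) and continuity (disparity varies smoothly, enforced by excitation from like-disparity neighbours). Below is a minimal 1-D sketch of that update rule; the neighbourhood size, the inhibition weight eps and the threshold theta are illustrative choices, not the paper's values.

```python
import numpy as np

def cooperative_stereo(left, right, d_max=4, iters=10, eps=0.5, theta=3.5):
    # State C[x, d]: hypothesis "pixel x in the left image matches pixel
    # x + d in the right image". Start from the raw matches.
    n = len(left)
    C = np.zeros((n, d_max + 1))
    for x in range(n):
        for d in range(d_max + 1):
            if x + d < n and left[x] == right[x + d]:
                C[x, d] = 1.0
    C0 = C.copy()
    for _ in range(iters):
        S = np.zeros_like(C)
        for x in range(n):
            for d in range(d_max + 1):
                # Continuity: excitation from neighbours at the same disparity.
                exc = C[max(0, x - 2):x + 3, d].sum()
                # Uniqueness: inhibition from competing matches that share
                # either the same left pixel or the same right pixel x + d.
                inh = C[x, :].sum() - C[x, d]
                for d2 in range(d_max + 1):
                    x2 = x + d - d2
                    if d2 != d and 0 <= x2 < n:
                        inh += C[x2, d2]
                S[x, d] = exc - eps * inh + C0[x, d]
        C = (S >= theta).astype(float)        # threshold nonlinearity
    return C.argmax(axis=1)                   # disparity estimate per pixel

# Toy random-dot pair: the right image is the left shifted by 2 (disparity 2).
rng = np.random.default_rng(0)
left = rng.integers(0, 2, 60)
right = np.roll(left, 2)
print(cooperative_stereo(left, right))        # mostly 2, away from the borders
```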


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui

Vision: what is where

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels

- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…

• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \mu \|f\|_K^2 \right]$$
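For the square loss, the minimizer of this Tikhonov functional has the well-known closed form f(x) = sum_i c_i K(x, x_i), with c = (K + mu*l*I)^(-1) y, where l is the number of training examples. Below is a minimal sketch of that regularized least squares solution, assuming a Gaussian kernel; the toy data and the values of sigma and mu are arbitrary stand-ins.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix K[i, j] = exp(-|a_i - b_j|^2 / (2 sigma^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def train_rls(X, y, mu=1e-2, sigma=1.0):
    # Square loss makes the minimizer linear in the kernel:
    # f(x) = sum_i c_i K(x, x_i), with c = (K + mu * l * I)^(-1) y.
    l = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * l * np.eye(l), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Toy usage: regress a noisy sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = train_rls(X, y)
print(f(np.array([[0.0], [1.5]])))   # approximately sin(0) and sin(1.5)
```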

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In

Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.


General conditions for predictivity in learning theory
Tomaso Poggio^1, Ryan Rifkin^{1,4}, Sayan Mukherjee^{1,3} & Partha Niyogi^2

^1 Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ^2 Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ^3 Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. ^4 Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory^{1-5} was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications^6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
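As a concrete toy instance of ERM (an illustration, not the paper's example): take a one-dimensional hypothesis space of threshold classifiers and pick the threshold with the lowest training error. The data, the noise rate and the grid of thresholds are made up for the demo.

```python
import numpy as np

# Minimal ERM sketch: H is a small grid of threshold functions
# f_t(x) = sign(x - t); ERM selects the t with lowest empirical error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sign(x - 0.3)                      # true threshold at 0.3
y[rng.random(200) < 0.1] *= -1            # 10% label noise

thresholds = np.linspace(-1, 1, 101)      # the hypothesis space H
emp_err = [(np.sign(x - t) != y).mean() for t in thresholds]
t_erm = thresholds[int(np.argmin(emp_err))]
print("ERM threshold:", t_erm)            # close to 0.3
```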

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$

$$\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,

$$\lim_{n\to\infty} P\left( I[f_S] \le \inf_{f \in H} I[f] + \epsilon \right) = 1$$
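The stability property in the abstract can be probed numerically. Below is a minimal sketch (not the paper's procedure): it trains regularized least squares, deletes one example at a time, and measures how much the loss on that example changes. The regression data, the linear model and the parameter mu are arbitrary stand-ins.

```python
import numpy as np

# Empirically probe leave-one-out stability: remove one training example and
# check how much the learned hypothesis changes on that example.
def rls_fit(X, y, mu=0.1):
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + mu * n * np.eye(d), X.T @ y)
    return lambda Xq: Xq @ w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

f_all = rls_fit(X, y)
deltas = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    f_loo = rls_fit(X[mask], y[mask])        # hypothesis with example i deleted
    v_all = (f_all(X[i:i+1])[0] - y[i])**2   # square loss on the held-out point
    v_loo = (f_loo(X[i:i+1])[0] - y[i])**2
    deltas.append(abs(v_all - v_loo))
print("max leave-one-out change in loss:", max(deltas))
```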


Why do hierarchical architectures work?

• Training Database • 1000+ Real, 3000+ VIRTUAL • 500,000+ Non-Face Patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT. A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, in "invariance" to position and scale changes.

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a

complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 40: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain

A synaptic mechanism possibly underlying directionalselectivity to motion

B y V T O R R E f AND T P O G G IO j

f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany

(Communicated by B B Boycott FR8 - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process

Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina

Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay

Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)

14 L 409 ] Vol 202 B

Cooperativeneuralnetworkforstereo

~ 1979 T Poggio and D Marr MPI Tuebingen

)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz

1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg

nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2

sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2

0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2

3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2

6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2

sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2

=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2

U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2

$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2

$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2

$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2

$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2

$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2

$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2

$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2

$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2

$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2

sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2

$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2

sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2

sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2

13

13

4]D+ZdzXU4zX7zgD4zRXdgzdg]JNJUCz1J=4]t4U4dz4gp44Uz+z]+JUz+U1zgX1+rdzXRwZkg4]dz Jdz gD4z +RXjUgz X7zpJ]IUCz Uz +z1ICJg+PzXRZjg4]zgD4z]+gJXzX7zXUU4gIXUdzgXz XRZXU4Tgdz Idz +Xjgz z pD4]4+dz A]zgD4zR+RR+PI+UzX]g4qz JgzPI4dz4gp44Uz zz+U1z z

PgDXjCDz gDIdz gz ZXHTgdz gXz +z P4+]zdgajgj]+Pz 0Igt4]4U4z 4gp44Uz gD4z gpXzgFIdz1IdgJUgIXUzJdzUXgzBU1+R4Ug+PzgXzgD4zU+gj]4zX7zgD4zJUA]R+gJXUzZ]X4ddIUCzgD+gz4+Dz+XRZPKdG4dzR4]4PrzgXzgD4zZ+agJjuP+]dzX7zGXpz4+Dz1X4dz Jgz UzDXRdOrdzg4]Rdz zgDIdz1J4]4U4zgt4gdzgD4X]J4dzX7zZ4bX]R+U4zjgzUXgzgD4X]J4dzX7zXRuZ4g4U4z4+ld4zgD4zU+gj]4zX7z+zXRZjug+gJXUzgD+gzJdz+^]J42zXjgzrz+zR+DIU4zX]z+z U4]nXjdz drdg4Rz 04Z4U1dz XUPrz XUz +zZ]XP4Rz gXz 4z dXPn40z UXgz XUz gD4z +n+HP13$3sup2 ]Ig]HLcsup2 $6sup2

13

+P4zD+]1p+]6z4n4]gD4P4ddzXU4z+Uz4qZ4gz +z U4]nXjdz drdg4Rz +U0z +z 1JCJg+PzXRZjg4]z gXz jd4z 1J=4]4Ugz grZ4dz X7z +PuCX]JgDRz4n4UzpD4UzZ4]9X]RIUCzgD4zd+R4zjU14]PrHUCz XRZjg+gJXUz PCX]JgDRdzpJgDz+zZ+]+PP4Pzdgcgj]4z]4jJ]HUCzS+TrzdJRjPg+U4Xjdz PX+Pz XZ4]+gJXUdz XUz P+]C4z1+g+z +^]+rdz +]4z 4qZ4UdIn4z A]z gX1+rdzXRZjg4]dz jgz Z]X+Prz p4PP13djJg40z gXzgD4zDJCDPrzIUg4]+gIn4zX]C+UJs+gHXUzX7zU4]unXjdzdrdg4Rfz

D4z P+ddz X7z Z+]+PP4Pz +PCX]IgDRdz IUuPj14dz +Uz JUg4]4dgIUCz +U1z UXgz Z]4Jd4Prz14U+P5z djP+ddz pDJDz p4z R+rz +PPzXXZ4]+gJn4z +PCX]JgDRdz z (jDz +PuCYIgDRdz XZ4]+g4z XUzR+Urz JUZjgz 4P4uR4Ugdz+U0z]4+Dz+zCPX+PzX]C+UJs+gJXUzrzp+rzX7zPX+PzJUg4]+gIn4zXUdg]+IUgdzD4zg4]RzXXZ4]+gJn4z _4lt]dzgXzgD4zp+rzHUz

0sup2Xsup2Vsup2Xnsectnsup2_sup2Zsup2Jsectcurrencurrensup2gsup2Zsup2wcurrenmacrwsup2Vsup2dsup2Zwsup2_sup2[sup2cwcurrenmacrw sup2amp0(6amp(K ampsup2$$sup2$3sup2

3sup2[sup2Qsup2cqwshysup2[sup2[sup2kt sup2csup2Vsup2Itwsup2ksup2ksup2_n sup28ampK $D2Kamp$K amp0K Klt(sup23$sup2$3sup2

6sup2ksup2ksup2_n sup2csup2Xsup2Inshycurrensup2csup2Vsup2Itwsup208amp03K 08GAKampD$Kltsup263sup2$3sup2

sup2Xsup2Vsup2Ynsectynsup2Xsup2[sup2_wcurrencurrenshysup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup208amp(3K 08ltGAK (AK 83I3E4Kltsup2=sup2$6sup2

)=sup2_sup2Zsup2Jsectcurrencurrensup2Xsup2Vsup2Xnsectynsup2 Hsup2Inqwsup2_[sup2cwcurrenmacrwcentsup2K(DDK7sup23sup2$3sup2

sup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_sup2[sup2cwcurrenmacrw sup2sup2wnncurrensup2

sup2[sup2dwpwcurrensup2ntsup2csup2csup2Gsup208GAKK6sup2$0sup2

$sup2ksup2lsectsup2_sup2_sup2Rsup2csup2csup2Gsup2[sup2dwpwcurrensup208amp03K 08=GAKampD$K -gtltsup2$3sup2$3sup2

sup2dsup2Zsup2dnsup2jsup2Ssup2Xnsup2Gsup2Vsup2Insup213K(DDK3=sup2$3sup2

sup2dwwsup2sup2wnotnwsup2Vsup2Hsup2H sup28D8gtGA0ampAK8K83$D0ampK 82(ampE2(AK kwshyTcurrenwpoundqwqwsup2wlaquosup2lsup2$sup2sup2363sup2

0sup2Jsup2Scopywcurrensup2asup2[sup2cwcurrenmacrw sup2Qsup2gsup208Jamp03K08gtGAKampD$K1136sup2$6sup2

3sup2Gsup2Vsup2Insup2fsup2Zsup2dnsup2jsup2Ssup2Xnsup2 csup2ksup2csup2Isup2Sshywsup2 08GAK K $B6sup2

6sup2Xsup2Vsup2Xnsectznsup2_sup2[sup2cwcurrenmacrw sup2ksup2dcurrenwqwdegsect sup2Gsup2Zwlaquo sup208amp(3K 09GAK (AK 83I3E4Ksup2$$sup2$6sup2

sup2Jsup2Rsectwcurrensup2_sup2[sup2cwyenmacrw sup2Ksup2Xwsup28D8Iamp(3K8D8082Ksup2w sup2

=sup2Qsup2kntsup2amp0(4amp(K sup2$6=sup2Bsup2kwsup2 currennsup2 J sup2_sup2Zsup2Jsectcurrencurrensup2Jsup2Rsectwbrvbarsup2ntsup2

Xsup2Xnsectynsup2 ntsup2 Xsup2esup2Qwordfwsup2Qsup2] sup2ntsup2csup2Tcurrensup2ysup2 currenwsup2nshysup2wsup2qcurrenpsectcurren sup2ntsup2v qsect sup2

pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz

UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z

Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2

=sup2

Cooperative Computation of Stereo Disparity

D Marr T Poggio

Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287

Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1

Science is currently published by American Association for the Advancement of Science

Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use

Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission

JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg

httpwwwjstororgMon Jan 22 124953 2007

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…


General conditions for predictivity in learning theory

Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
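To make ERM and the stability property above concrete, the toy sketch below (an illustration under assumed data, not the paper's procedure) runs ERM over a small finite hypothesis space of one-dimensional threshold classifiers, then probes leave-one-out stability by deleting one example at a time and measuring how far the learned hypothesis moves.

```python
# Toy ERM over threshold classifiers f_h(x) = sign(x - h), plus a crude
# empirical check of leave-one-out stability: retrain with one example
# deleted and see how much the selected hypothesis changes.
import numpy as np

def erm_threshold(X, y, thresholds):
    # Pick the threshold minimizing the empirical (training) error.
    errs = [np.mean(np.sign(X - h) != y) for h in thresholds]
    return thresholds[int(np.argmin(errs))]

rng = np.random.default_rng(1)
n = 100
X = rng.uniform(-1, 1, n)
y = np.sign(X - 0.2)                  # true threshold at 0.2
y[rng.random(n) < 0.1] *= -1          # 10% label noise
H = np.linspace(-1, 1, 201)           # finite hypothesis space

h_full = erm_threshold(X, y, H)
deltas = []
for i in range(n):                    # leave-one-out perturbations
    keep = np.arange(n) != i
    deltas.append(abs(erm_threshold(X[keep], y[keep], H) - h_full))
print(f"ERM threshold: {h_full:.3f}, mean LOO change: {np.mean(deltas):.4f}")
```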

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left(|X_n - X| \geq \varepsilon\right) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z) \, d\mu(z)$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \left| I[f_S] - I_S[f_S] \right| = 0 \quad \text{in probability.}$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right) = 1.$$
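A quick numerical reading of these definitions, on assumed toy data: the empirical error $I_S[f]$ is a sample average of the loss, while the expected error $I[f]$ cannot be computed from $S$ and is approximated below by Monte Carlo from fresh samples.

```python
# Sketch: empirical error I_S[f] vs a Monte Carlo estimate of the
# expected error I[f], for the square loss and a fixed hypothesis f.
# The data distribution is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return 0.9 * x                    # an arbitrary fixed hypothesis

def draw(n):
    x = rng.uniform(-1, 1, n)         # assumed marginal on X
    return x, x + 0.1 * rng.standard_normal(n)

X_S, y_S = draw(50)                   # training set S
I_S = np.mean((f(X_S) - y_S) ** 2)    # empirical error I_S[f]
X_mc, y_mc = draw(1_000_000)          # fresh samples from the distribution
I = np.mean((f(X_mc) - y_mc) ** 2)    # ~ expected error I[f]
print(f"I_S[f] = {I_S:.4f}, I[f] ~ {I:.4f}")
```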

letters to nature

NATURE | Vol. 428 | 25 March 2004 | www.nature.com/nature | p. 419 | © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database: 1,000+ real, 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016

Moore-like law for ML (1995-2018)

• Human Brain
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
– ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1, V2, V4, IT

A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu

Current Biology 1995, Vol. 5, No. 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
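As a rough illustration of this model family (an HMAX-flavoured sketch under assumed parameters, not the published implementation), the code below alternates a template-matching "S" stage with Gaussian tuning and a "C" stage of local max pooling, the motif that trades selectivity against position invariance as it repeats up the hierarchy.

```python
# HMAX-flavoured sketch (illustrative, not the published model):
# an S-layer of Gaussian template matching followed by a C-layer of
# local max pooling, which keeps the S-units' selectivity while
# gaining tolerance to small shifts.
import numpy as np

def s_layer(image, templates, sigma=1.0):
    # Slide each stored template over the image; respond with a
    # Gaussian of the patch-to-template distance.
    th, tw = templates.shape[1:]
    H, W = image.shape[0] - th + 1, image.shape[1] - tw + 1
    out = np.empty((len(templates), H, W))
    for k, t in enumerate(templates):
        for i in range(H):
            for j in range(W):
                d2 = ((image[i:i+th, j:j+tw] - t) ** 2).sum()
                out[k, i, j] = np.exp(-d2 / (2 * sigma ** 2))
    return out

def c_layer(s_maps, pool=4):
    # Max over local neighborhoods -> position-tolerant C responses.
    K, H, W = s_maps.shape
    Hp, Wp = H // pool, W // pool
    cropped = s_maps[:, :Hp * pool, :Wp * pool]
    return cropped.reshape(K, Hp, pool, Wp, pool).max(axis=(2, 4))

rng = np.random.default_rng(3)
image = rng.random((32, 32))
templates = rng.random((8, 5, 5))        # 8 stored 5x5 patches
c1 = c_layer(s_layer(image, templates))  # 8 position-tolerant feature maps
```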

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model): animal present or not? A 20 ms image is followed by a 30 ms ISI (image-mask interval), then by a 1/f noise mask for 80 ms.

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT read-out data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
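The read-out result above rests on training linear classifiers on recorded IT population responses; the sketch below mimics the idea on synthetic data (population size, noise levels, and the least-squares decoder are all assumed stand-ins for the actual recordings and methods).

```python
# Sketch of a linear "read-out" decoder: train a one-vs-all linear
# classifier on (synthetic) population responses, then test on noisier
# "transformed" trials. Purely illustrative stand-in data.
import numpy as np

rng = np.random.default_rng(4)
n_units, n_trials, n_classes = 128, 200, 8

prototypes = rng.standard_normal((n_classes, n_units))  # class patterns
labels = rng.integers(0, n_classes, n_trials)
train = prototypes[labels] + 0.5 * rng.standard_normal((n_trials, n_units))
test = prototypes[labels] + 0.8 * rng.standard_normal((n_trials, n_units))

Y = np.eye(n_classes)[labels]                  # one-hot targets
W, *_ = np.linalg.lstsq(train, Y, rcond=None)  # least-squares readout
acc = np.mean((test @ W).argmax(1) == labels)
print(f"read-out accuracy on noisier trials: {acc:.2f}")
```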

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …


Cooperative neural network for stereo

~1979, T. Poggio and D. Marr, MPI Tuebingen


Cooperative Computation of Stereo Disparity

D. Marr and T. Poggio

Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
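The paper's cooperative algorithm iterates local excitation among nearby matches at the same disparity and inhibition among matches competing along the lines of sight. The sketch below is a loose one-dimensional rendering of that update rule (lines of sight simplified to a shared position, constants chosen for illustration rather than taken from the paper).

```python
# Loose 1-D sketch of the cooperative stereo update (constants assumed):
# state C[x, d] = 1 means "match at position x, disparity d". Each step
# sums excitatory support from same-disparity neighbors, subtracts
# inhibition from rival disparities at the same position (a simplified
# stand-in for the lines-of-sight inhibition), adds the initial data,
# and thresholds.
import numpy as np

def cooperative_stereo(C0, iters=10, epsilon=2.0, theta=3.0, radius=2):
    C = C0.copy()
    X, D = C.shape
    for _ in range(iters):
        new = np.zeros_like(C)
        for x in range(X):
            lo, hi = max(0, x - radius), min(X, x + radius + 1)
            for d in range(D):
                excite = C[lo:hi, d].sum() - C[x, d]   # same-disparity support
                inhibit = C[x, :].sum() - C[x, d]      # rival disparities at x
                s = excite - epsilon * inhibit + C0[x, d]
                new[x, d] = 1 if s >= theta else 0
        C = new
    return C

# Toy input: a true disparity plane at d=1 plus random spurious matches;
# iteration suppresses the spurious matches and keeps the plane.
rng = np.random.default_rng(5)
C0 = (rng.random((40, 5)) < 0.2).astype(int)
C0[:, 1] = 1
C = cooperative_stereo(C0)
```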


Vision: A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Shimon Ullman; Afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 42: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Vision A Computational Investigation into the Human Representation and Processing of Visual Information

Foreword by Afterword by Tomaso Poggio

David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists

In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level

Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui

Visionwhatiswhere

A complex system must be understood at several different levels

Werner Reichardtrsquos scientific legacy Integrative Neuroscience

bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels

mdash computation mdash algorithms mdash biophysics and circuits

bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip

bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip

bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994

Vision: ventral stream


Cognition in people

Shape representation in the inferior temporal cortex of monkeys

Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkeys' environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.

Current Biology 1995, 5:552-563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer …

Correspondence to: Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu

Current Biology 1995, Vol 5 No 5, p. 552

9.520, spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test the feedforward model): animal present or not? Trial sequence: image (20 ms), image-mask interval (30 ms ISI), then a 1/f-noise mask (80 ms).

Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
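The "1/f noise" mask refers to spatial noise whose amplitude spectrum falls off as 1/f, roughly matching the average spectrum of natural images. As an illustration, here is one conventional way to synthesize such a mask in the Fourier domain; the size, seed and normalization are illustrative assumptions, not the parameters of the cited studies.

```python
# Sketch: synthesize a 1/f spatial-noise mask via the Fourier domain.
# Illustrative parameters only; not the exact stimuli of the cited studies.
import numpy as np

def pink_noise_mask(size=256, seed=0):
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(size)[:, None]               # vertical spatial frequencies
    fx = np.fft.fftfreq(size)[None, :]               # horizontal spatial frequencies
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0                                    # avoid division by zero at DC
    amplitude = 1.0 / f                              # 1/f amplitude spectrum
    phase = rng.uniform(0, 2 * np.pi, (size, size))  # random phases
    spectrum = amplitude * np.exp(1j * phase)
    img = np.real(np.fft.ifft2(spectrum))
    img -= img.min()                                 # normalize to [0, 1] for display
    return img / img.max()

mask = pink_noise_mask()
print(mask.shape, mask.min(), mask.max())
```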

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT read-out data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
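The "matrix-like read-out" of Hung et al. (2005) amounts to training a simple linear classifier on short spike-count windows from a population of IT sites. A minimal sketch of that style of analysis on synthetic "population responses" follows; the dimensions, signal model and all names are illustrative assumptions, not the actual recordings.

```python
# Sketch of a linear read-out analysis in the style of Hung et al. (2005),
# on synthetic data; dimensions and signal model are illustrative only.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_sites, n_trials, n_categories = 256, 800, 8

# Each category evokes a noisy 'signature' pattern across recording sites.
signatures = rng.standard_normal((n_categories, n_sites))
labels = rng.integers(0, n_categories, n_trials)
responses = signatures[labels] + 2.0 * rng.standard_normal((n_trials, n_sites))

X_train, X_test, y_train, y_test = train_test_split(
    responses, labels, test_size=0.25, random_state=0)

clf = LinearSVC().fit(X_train, y_train)   # linear classifier = the 'read-out'
print("read-out accuracy:", clf.score(X_test, y_test))
```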

… in 2013 …

Page 43

A complex system must be understood at several different levels

Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels:
- computation
- algorithms
- biophysics and circuits

• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…

• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…

• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik

MIT (1981-)

43rd Stated Meeting of the NRP Associates, March 14-17, 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS

• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works, and how it may suggest better computer vision systems

$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \| f \|_K^2 \right]$$
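With the square loss, this Tikhonov regularization functional has a closed-form minimizer in a reproducing kernel Hilbert space: by the representer theorem, f(x) = Σ_i c_i K(x, x_i) with (K + nμI)c = y. A minimal sketch under those assumptions follows; the kernel width, μ and the toy data are illustrative choices.

```python
# Regularized least squares (Tikhonov regularization with square loss):
# minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over an RKHS.
# Representer theorem: f(x) = sum_i c_i K(x, x_i), with (K + n*mu*I) c = y.
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
n = 40
X = rng.uniform(0, 2 * np.pi, (n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

mu = 1e-3                                        # regularization parameter
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + n * mu * np.eye(n), y)   # solve (K + n*mu*I) c = y

X_new = np.linspace(0, 2 * np.pi, 5)[:, None]
f_new = gaussian_kernel(X_new, X) @ c            # f(x) = sum_i c_i K(x, x_i)
print(np.c_[np.sin(X_new[:, 0]), f_new])         # predictions track sin(x)
```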

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.


General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
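The stability property described in the abstract can be probed numerically: retrain with one example deleted and measure how much the learned hypothesis changes. The following rough sketch does exactly that for a linear regularized least squares learner; it is a hedged illustration of the idea, not the paper's formal leave-one-out stability definition, and all parameters are assumptions.

```python
# Numerical probe of leave-one-out stability: delete one training point,
# retrain, and measure the change in the learned function at a probe point.
# Illustrative sketch only, not the paper's formal stability definition.
import numpy as np

rng = np.random.default_rng(0)

def learn_ridge(X, y, lam=1e-2):
    # Linear regularized least squares: w = (X^T X + lam*I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w_full = learn_ridge(X, y)
x_probe = rng.standard_normal(d)

# Max change in prediction at the probe point over all single deletions.
changes = []
for i in range(n):
    keep = np.arange(n) != i
    w_i = learn_ridge(X[keep], y[keep])
    changes.append(abs(x_probe @ w_full - x_probe @ w_i))
print("max leave-one-out change:", max(changes))  # small => stable learning map
```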

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$. The basic requirement for any learning algorithm is generalization, as defined above (see Box 1).



Page 45: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

MIT (1981-)

43rd Stated Meeting of the NRP Associates March 14-17 1982

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995, 5:552–563

Background

Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…

Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu


9.520, Spring 2003

Model's early predictions: neurons become view-tuned during recognition

Poggio, Edelman, Riesenhuber (1990, 2000)

Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007

Database collected by Oliva & Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not?

[Stimulus sequence: image 20 ms → interstimulus interval (ISI) 30 ms → mask (1/f noise) 80 ms]

Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005

Feedforward models "predict" rapid categorization (82% model vs 80% humans)

Hierarchical feedforward models of the ventral stream
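The core computational motif of these models (in the line of Riesenhuber & Poggio 1999) is an alternation of template matching, which builds selectivity for more complex preferred stimuli, and max pooling, which builds invariance to position and scale. Below is a minimal NumPy sketch of that motif; the filter sizes, pooling ranges and Gaussian tuning width are illustrative assumptions, not the published parameters.

```python
import numpy as np

def s_layer(image, templates):
    """'S' units: Gaussian template matching at every image position."""
    th, tw = templates.shape[1:]
    H, W = image.shape
    out = np.zeros((len(templates), H - th + 1, W - tw + 1))
    for k, t in enumerate(templates):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = image[i:i + th, j:j + tw]
                out[k, i, j] = np.exp(-np.sum((patch - t) ** 2))  # tuning
    return out

def c_layer(s_maps, pool=2):
    """'C' units: max over a local pool of positions gives shift tolerance."""
    K, H, W = s_maps.shape
    trimmed = s_maps[:, :H - H % pool, :W - W % pool]
    blocks = trimmed.reshape(K, H // pool, pool, W // pool, pool)
    return blocks.max(axis=(2, 4))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
templates = rng.random((4, 3, 3))          # toy stored prototypes
c1 = c_layer(s_layer(img, templates))      # first S/C stage
# second stage: match templates on each pooled map, then pool again
c2 = np.stack([c_layer(s_layer(m, templates)) for m in c1]).max(axis=0)
print(c1.shape, c2.shape)                  # receptive fields grow per stage
```

Stacking such stages reproduces, in caricature, the gradual increase in receptive field size, stimulus complexity and invariance along V1 → V2 → V4 → IT described above.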

Decoding the neural code: matrix-like read-out from the brain

Agreement of model with IT readout data: reading out category and identity, invariant to position and scale

Hung, Kreiman, Poggio, DiCarlo 2005
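Operationally, the readout result is that a simple linear classifier trained on brief spike-count vectors from a population of IT sites can report object category, and transfers across position and scale. A hedged sketch of that style of analysis on synthetic data: the population model below is invented for illustration, and scikit-learn's logistic regression stands in for the paper's linear classifiers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # linear readout stand-in

rng = np.random.default_rng(1)
n_neurons, n_trials = 128, 600
categories = rng.integers(0, 8, size=n_trials)   # 8 object categories
positions = rng.integers(0, 3, size=n_trials)    # 3 retinal positions (nuisance)

# invented population model: category pattern x position-dependent gain + noise
pattern = rng.random((8, n_neurons))
gain = 0.5 + rng.random((3, 1))
X = pattern[categories] * gain[positions] + 0.3 * rng.standard_normal((n_trials, n_neurons))

# train the readout at two positions, test at the held-out third position
train, test = positions < 2, positions == 2
clf = LogisticRegression(max_iter=2000).fit(X[train], categories[train])
print("category accuracy at an untrained position:", clf.score(X[test], categories[test]))
```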

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

… in 2013 …

Page 46: CBMM: the Science and Engineering of Intelligence

43rd Stated Meeting of the NRP Associates, March 14–17, 1982

Learning theory + algorithms

Computational Neuroscience: models + experiments

ENGINEERING APPLICATIONS

• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works – and how it may suggest better computer vision systems

min_{f∈H} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ‖f‖_K² ]

Predictive regularization algorithms
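With the square loss, the objective above becomes kernel regularized least squares, and the representer theorem reduces it to a linear system in the expansion coefficients. A minimal sketch, assuming a Gaussian kernel and illustrative values of μ and σ:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu=0.1, sigma=1.0):
    """Minimize (1/n) sum_i V(y_i, f(x_i)) + mu ||f||_K^2 with square loss.
    By the representer theorem f(x) = sum_i c_i K(x, x_i), and the
    coefficients solve (K + mu * n * I) c = y."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# toy usage: regress a noisy sine
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
f = rls_fit(X, y)
print(f(np.array([[0.0], [1.5]])))  # predictions near sin(0) and sin(1.5)
```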

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:

(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.

(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).

(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful…


General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory[1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications[6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
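As a concrete (toy) instance of ERM: take a small finite hypothesis space of linear functions and return the one with the lowest training error. The hypothesis space and loss below are assumptions chosen purely for illustration.

```python
import numpy as np

def erm(S, hypotheses, loss=lambda yhat, y: (yhat - y) ** 2):
    """Empirical risk minimization: pick the hypothesis minimizing I_S[f]."""
    X, y = S
    empirical_errors = [np.mean(loss(h(X), y)) for h in hypotheses]
    return hypotheses[int(np.argmin(empirical_errors))]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
y = 2.0 * X + 0.1 * rng.standard_normal(200)                # true slope = 2
H = [lambda x, a=a: a * x for a in np.linspace(-3, 3, 25)]  # hypothesis space
f_S = erm((X, y), H)
print("selected hypothesis at x=1:", f_S(np.array([1.0])))  # close to 2
```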

Box 1: Formal definitions in supervised learning

Convergence in probability: A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data: The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a Euclidean space and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set

S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}

consists of n independent and identically drawn samples from the distribution on Z.

Learning algorithms: A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and the output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions: We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error: The expected error of a function f is defined as

I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error: The following quantity, called empirical error, can be computed given the training data S:

I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i).

Generalization and consistency: An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and for any ε > 0,

lim_{n→∞} P( I[f_S] − inf_{f∈H} I[f] ≤ ε ) = 1.
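The stability property can be probed numerically: delete one training example, re-run the learning map, and check how little the hypothesis moves as n grows. A toy sketch using ridge regression as the learning map; measuring the change in the prediction at each deleted point is a simplified stand-in for the paper's precise leave-one-out stability.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Learning map L: S -> f_S for linear ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
for n in (20, 200, 2000):
    X = rng.standard_normal((n, 5))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    w_full = ridge_fit(X, y)
    # leave-one-out perturbations: change in prediction at the deleted point
    deltas = [abs(X[i] @ (w_full - ridge_fit(np.delete(X, i, 0), np.delete(y, i, 0))))
              for i in range(n)]
    print(f"n={n:5d}  max leave-one-out change = {max(deltas):.5f}")
```

The perturbation shrinks as the training set grows, which is the qualitative behaviour the theorem ties to generalization.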

letters to nature. Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group

Why do hierarchical architectures work?

• Training database
  – 1,000+ real, 3,000+ virtual face patterns
  – 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995-2018)

• Human brain
  – 10^10–10^11 neurons (~1 million flies)
  – 10^14–10^15 synapses

Vision: what is where

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 47: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Learning theory + algorithms

Computational Neuroscience

models+experiments

ENGINEERING APPLICATIONS

bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor

How visual cortex works ndash and how it may suggest better computer vision

systems

2

1

1min ( ( ))i i Kf H i

V y f x fmicroisin

=

⎡ ⎤+⎢ ⎥

⎣ ⎦sum

Predictive regularization algorithms

Theorems on foundations of learning

MIT (1981-)

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 48: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001

ON THE MATHEMATICAL FOUNDATIONS OF LEARNING

FELIPE CUCKER AND STEVE SMALE

The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial

T Poggio and CR Shelton

Introduction

(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear

We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of

languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-

ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])

(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice

Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)

Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In

Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University

grant No 8780043

c2001 American Mathematical Society

1

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label

In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data

In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses

What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-

ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the

algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate

Box 1Formal definitions in supervised learning

Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example

n1lim jXn 2Xj 0 in probability) if and only if for every e 0

n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z

S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as

I$f

zVf zdmz

which is also the expected error of a new sample z drawn from thedistribution In the case of square loss

I$f

XYfx2 y2dmxy

We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S

IS$f 1

n

X

n

i1

Vf zi

Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m

n1lim jI$fS2 IS$fSj 0 in probability

An algorithm is (universally) consistent if uniformly for any distributionm and any e 0

n1lim P I$fSf2Hinf I$famp 1

0

letters to nature

NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group

Why do hierarchical architectures work

bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern

Sung amp Poggio 1995

~15 year old CBCL computer vision research face detection

since 2006 on the market (digital cameras)

Third Annual NSF Site Visit June 8 ndash 9 2016

Moore-like law for ML (1995-2018)

bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses

Visionwhatiswhere

bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)

ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex

Van Essen amp Anderson 1990

The ventral stream hierarchy V1 V2 V4 IT

A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to

position and scale changes

Kobatake amp Tanaka 1994

Visionventralstream

74

Cognition in people

Shape representation in the inferior temporalcortex of monkeys

Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA

Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during

view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view

Current Biology 1995 5552-563

Background

Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object

Most theories which postulate that transformations of animage representation precede matching assume either a

complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set

Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer

Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu

Current Biology 1995 Vol 5 No 5552

9520 spring 2003

Modelrsquos early predictions neurons become view-tuned during recognition

Poggio Edelman Riesenhuber (1990 2000)

Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995

A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)

Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007

Database collected by Oliva amp Torralba

Psychophysics of rapid categorization

Rapid categorization task (with mask to test feedforward model)

Animal present or not

30 ms ISI

20 ms

Image

Interval Image-Mask

Mask 1f noise

80 ms

Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005

Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)

Hierarchical feedforward models of the ventral stream

Decoding the neural code Matrix-like read-out from the brain

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 49: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2

1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA

Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering

One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.

Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}\{|X_n - X| \ge \varepsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$.

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$I[f] = \int_Z V(f, z)\, d\mu(z)$,

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$.

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$.

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0$ in probability.

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right\} = 1$.

Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419. www.nature.com/nature. © 2004 Nature Publishing Group.
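A quick numerical illustration of the Box 1 definitions, with polynomial least squares standing in as the ERM algorithm over hypothesis spaces of growing degree (the dataset and degrees are invented for illustration): the empirical error I_S[f_S] keeps falling as the hypothesis space grows, while a Monte Carlo estimate of the expected error I[f_S] exposes where generalization fails.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n i.i.d. samples z = (x, y) from a (here, known) distribution mu."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + 0.2 * rng.standard_normal(n)
    return x, y

x_train, y_train = sample(30)       # training set S with |S| = n = 30
x_test, y_test = sample(100_000)    # large fresh sample to approximate I[f]

for degree in (1, 3, 9, 15):
    # ERM over H_degree: polynomial least squares minimizes the empirical square loss.
    f_S = np.poly1d(np.polyfit(x_train, y_train, degree))
    emp_err = np.mean((f_S(x_train) - y_train) ** 2)   # I_S[f_S]
    exp_err = np.mean((f_S(x_test) - y_test) ** 2)     # Monte Carlo estimate of I[f_S]
    print(f"degree {degree:2d}: I_S[f_S] = {emp_err:.3f}, I[f_S] ~ {exp_err:.3f}")
```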

Why do hierarchical architectures work?

• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection, on the market since 2006 (digital cameras)
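The Sung & Poggio system is example-based: a classifier trained on face and non-face patches is scanned over every window of the image (and, in the full system, over an image pyramid for scale). A bare-bones sketch of the scanning loop; `classify_patch` is a hypothetical stand-in for the trained classifier, and the 19×19 window size is the one the original system used:

```python
import numpy as np

WINDOW = 19  # Sung & Poggio used 19x19 pixel patterns

def classify_patch(patch):
    """Hypothetical stand-in for the trained face/non-face classifier
    (originally, distances to face and non-face prototype clusters)."""
    return float(patch.std() > 0.5)  # placeholder decision rule

def detect_faces(image, stride=4, threshold=0.5):
    """Slide a WINDOW x WINDOW classifier over the image; return hit locations.
    A full system would also scan a scale pyramid of the image."""
    hits = []
    h, w = image.shape
    for y in range(0, h - WINDOW + 1, stride):
        for x in range(0, w - WINDOW + 1, stride):
            patch = image[y:y + WINDOW, x:x + WINDOW]
            if classify_patch(patch) > threshold:
                hits.append((y, x))
    return hits

image = np.random.default_rng(5).random((120, 160))
print(f"{len(detect_faces(image))} candidate windows above threshold")
```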

Third Annual NSF Site Visit, June 8–9, 2016

Moore-like law for ML (1995–2018)

• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (anterior inferotemporal) cortex

Van Essen & Anderson 1990

The ventral stream hierarchy: V1 → V2 → V4 → IT

A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes

Kobatake & Tanaka 1994
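The gradual growth in receptive field size is what stacking local pooling stages produces mechanically; the effective receptive field of each stage can be computed directly. A small sketch with assumed kernel sizes and strides (illustrative numbers, not measurements from the physiology):

```python
# Effective receptive field (RF) of stacked local layers:
# rf_out = rf_in + (kernel - 1) * jump_in,  jump_out = jump_in * stride.
layers = [  # (name, kernel, stride) -- illustrative values only
    ("V1-like", 7, 2),
    ("V2-like", 5, 2),
    ("V4-like", 5, 2),
    ("IT-like", 5, 2),
]

rf, jump = 1, 1
for name, kernel, stride in layers:
    rf += (kernel - 1) * jump
    jump *= stride
    print(f"{name}: receptive field {rf}x{rf} pixels")
```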

Vision: ventral stream


Cognition in people



Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 62: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale

Hung Kreiman Poggio DiCarlo 2005

Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005

helliphelliphellip in 2013helliphellip

Page 63: CBMM: the Science and Engineering of Intelligence...CBMM: the Science and Engineering of Intelligence. The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF

helliphelliphellip in 2013helliphellip