Output


Transcript of Output

Page 1: Output


To our parents:

Valerie and Patrick Hastie
Vera and Sami Tibshirani
Florence and Harry Friedman

and to our families:

Samantha, Timothy, and Lynda
Charlie, Ryan, Julie, and Cheryl
Melanie, Dora, Monika, and Ildiko


Preface to the Second Edition

In God we trust, all others bring data.

–William Edwards Deming (1900-1993)¹

We have been gratified by the popularity of the first edition of The

Page 2: Output

Elements of Statistical Learning. This, along with the fast pace of research in the statistical learning field, motivated us to update our book with a second edition. We have added four new chapters and updated some of the existing chapters. Because many readers are familiar with the layout of the first edition, we have tried to change it as little as possible. Here is a summary of the main changes:

¹On the Web, this quote has been widely attributed to both Deming and Robert W. Hayden; however Professor Hayden told us that he can claim no credit for this quote, and ironically we could find no "data" confirming that Deming actually said this.


Chapter                                                   What's new

1. Introduction
2. Overview of Supervised Learning
3. Linear Methods for Regression                          LAR algorithm and generalizations of the lasso
4. Linear Methods for Classification                      Lasso path for logistic regression
5. Basis Expansions and Regularization                    Additional illustrations of RKHS
6. Kernel Smoothing Methods
7. Model Assessment and Selection                         Strengths and pitfalls of cross-validation
8. Model Inference and Averaging
9. Additive Models, Trees, and Related Methods
10. Boosting and Additive Trees                           New example from ecology; some material split off to Chapter 16.
11. Neural Networks                                       Bayesian neural nets and the NIPS 2003 challenge
12. Support Vector Machines and Flexible Discriminants    Path algorithm for SVM classifier
13. Prototype Methods and Nearest-Neighbors
14. Unsupervised Learning                                 Spectral clustering, kernel PCA, sparse PCA, non-negative matrix
                                                          factorization, archetypal analysis, nonlinear dimension reduction,
                                                          Google page rank algorithm, a direct approach to ICA
15. Random Forests                                        New
16. Ensemble Learning                                     New
17. Undirected Graphical Models                           New
18. High-Dimensional Problems                             New

Page 3: Output

Some further notes:

• Our first edition was unfriendly to colorblind readers; in particular, we tended to favor red/green contrasts which are particularly troublesome. We have changed the color palette in this edition to a large extent, replacing the above with an orange/blue contrast.

• We have changed the name of Chapter 6 from "Kernel Methods" to "Kernel Smoothing Methods", to avoid confusion with the machine-learning kernel method that is discussed in the context of support vector machines (Chapter 11) and more generally in Chapters 5 and 14.

• In the first edition, the discussion of error-rate estimation in Chapter 7 was sloppy, as we did not clearly differentiate the notions of conditional error rates (conditional on the training set) and unconditional rates. We have fixed this in the new edition.


• Chapters 15 and 16 follow naturally from Chapter 10, and the chapters are probably best read in that order.

• In Chapter 17, we have not attempted a comprehensive treatment of graphical models, and discuss only undirected models and some new methods for their estimation. Due to a lack of space, we have specifically omitted coverage of directed graphical models.

• Chapter 18 explores the "p ≫ N" problem, which is learning in high-dimensional feature spaces. These problems arise in many areas, including genomic and proteomic studies, and document classification.

We thank the many readers who have found the (too numerous) errors in the first edition. We apologize for those and have done our best to avoid errors in this new edition. We thank Mark Segal, Bala Rajaratnam, and Larry Wasserman for comments on some of the new chapters, and many Stanford graduate and post-doctoral students who offered comments, in particular Mohammed AlQuraishi, John Boik, Holger Hoefling, Arian Maleki, Donal McMahon, Saharon Rosset, Babak Shababa, Daniela Witten, Ji Zhu and Hui Zou. We thank John Kimmel for his patience in guiding us through this new edition. RT dedicates this edition to the memory of Anna McPhee.

Trevor Hastie
Robert Tibshirani
Jerome Friedman

Stanford, California
August 2008


Preface to the First Edition

Page 4: Output

We are drowning in information and starving for knowledge.

–Rutherford D. Roger

The field of Statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of "data mining"; statistical and computational problems in biology and medicine have created "bioinformatics." Vast amounts of data are being generated in many fields, and the statistician's job is to make sense of it all: to extract important patterns and trends, and understand "what the data says." We call this learning from data.

The challenges in learning from data have led to a revolution in the statistical sciences. Since computation plays such a key role, it is not surprising that much of this new development has been done by researchers in other fields such as computer science and engineering.

The learning problems that we consider can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.


This book is our attempt to bring together many of the important new ideas in learning, and explain them in a statistical framework. While some mathematical details are needed, we emphasize the methods and their conceptual underpinnings rather than their theoretical properties. As a result, we hope that this book will appeal not just to statisticians but also to researchers and practitioners in a wide variety of fields.

Just as we have learned a great deal from researchers outside of the field of statistics, our statistical viewpoint may help others to better understand different aspects of learning:

There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea.

Page 5: Output

–Andreas Buja

We would like to acknowledge the contribution of many people to the conception and completion of this book. David Andrews, Leo Breiman, Andreas Buja, John Chambers, Bradley Efron, Geoffrey Hinton, Werner Stuetzle, and John Tukey have greatly influenced our careers. Balasubramanian Narasimhan gave us advice and help on many computational problems, and maintained an excellent computing environment. Shin-Ho Bang helped in the production of a number of the figures. Lee Wilkinson gave valuable tips on color production. Ilana Belitskaya, Eva Cantoni, Maya Gupta, Michael Jordan, Shanti Gopatam, Radford Neal, Jorge Picazo, Bogdan Popescu, Olivier Renaud, Saharon Rosset, John Storey, Ji Zhu, Mu Zhu, two reviewers and many students read parts of the manuscript and offered helpful suggestions. John Kimmel was supportive, patient and helpful at every phase; MaryAnn Brickner and Frank Ganz headed a superb production team at Springer. Trevor Hastie would like to thank the statistics department at the University of Cape Town for their hospitality during the final stages of this book. We gratefully acknowledge NSF and NIH for their support of this work. Finally, we would like to thank our families and our parents for their love and support.

Trevor Hastie
Robert Tibshirani
Jerome Friedman

Stanford, California
May 2001

The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions ....

–Ian Hacking

Contents

Preface to the Second Edition
Preface to the First Edition
1 Introduction
2 Overview of Supervised Learning
  2.1 Introduction
  2.2 Variable Types and Terminology
  2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
    2.3.1 Linear Models and Least Squares
    2.3.2 Nearest-Neighbor Methods
    2.3.3 From Least Squares to Nearest Neighbors
  2.4 Statistical Decision Theory

Page 6: Output

  2.5 Local Methods in High Dimensions
  2.6 Statistical Models, Supervised Learning and Function Approximation
    2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
    2.6.2 Supervised Learning
    2.6.3 Function Approximation
  2.7 Structured Regression Models
    2.7.1 Difficulty of the Problem
  2.8 Classes of Restricted Estimators
    2.8.1 Roughness Penalty and Bayesian Methods
    2.8.2 Kernel Methods and Local Regression
    2.8.3 Basis Functions and Dictionary Methods
  2.9 Model Selection and the Bias–Variance Tradeoff
  Bibliographic Notes
  Exercises
3 Linear Methods for Regression
  3.1 Introduction
  3.2 Linear Regression Models and Least Squares
    3.2.1 Example: Prostate Cancer
    3.2.2 The Gauss–Markov Theorem
    3.2.3 Multiple Regression from Simple Univariate Regression
    3.2.4 Multiple Outputs
  3.3 Subset Selection
    3.3.1 Best-Subset Selection
    3.3.2 Forward- and Backward-Stepwise Selection
    3.3.3 Forward-Stagewise Regression
    3.3.4 Prostate Cancer Data Example (Continued)
  3.4 Shrinkage Methods
    3.4.1 Ridge Regression
    3.4.2 The Lasso
    3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso
    3.4.4 Least Angle Regression
  3.5 Methods Using Derived Input Directions
    3.5.1 Principal Components Regression
    3.5.2 Partial Least Squares
  3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
  3.7 Multiple Outcome Shrinkage and Selection
  3.8 More on the Lasso and Related Path Algorithms
    3.8.1 Incremental Forward Stagewise Regression
    3.8.2 Piecewise-Linear Path Algorithms
    3.8.3 The Dantzig Selector
    3.8.4 The Grouped Lasso
    3.8.5 Further Properties of the Lasso
    3.8.6 Pathwise Coordinate Optimization

Page 7: Output

  3.9 Computational Considerations
  Bibliographic Notes
  Exercises
4 Linear Methods for Classification
  4.1 Introduction
  4.2 Linear Regression of an Indicator Matrix
  4.3 Linear Discriminant Analysis
    4.3.1 Regularized Discriminant Analysis
    4.3.2 Computations for LDA
    4.3.3 Reduced-Rank Linear Discriminant Analysis
  4.4 Logistic Regression
    4.4.1 Fitting Logistic Regression Models
    4.4.2 Example: South African Heart Disease
    4.4.3 Quadratic Approximations and Inference
    4.4.4 L1 Regularized Logistic Regression
    4.4.5 Logistic Regression or LDA?
  4.5 Separating Hyperplanes
    4.5.1 Rosenblatt's Perceptron Learning Algorithm
    4.5.2 Optimal Separating Hyperplanes
  Bibliographic Notes
  Exercises
5 Basis Expansions and Regularization
  5.1 Introduction
  5.2 Piecewise Polynomials and Splines
    5.2.1 Natural Cubic Splines
    5.2.2 Example: South African Heart Disease (Continued)
    5.2.3 Example: Phoneme Recognition
  5.3 Filtering and Feature Extraction
  5.4 Smoothing Splines
    5.4.1 Degrees of Freedom and Smoother Matrices
  5.5 Automatic Selection of the Smoothing Parameters
    5.5.1 Fixing the Degrees of Freedom
    5.5.2 The Bias–Variance Tradeoff
  5.6 Nonparametric Logistic Regression
  5.7 Multidimensional Splines
  5.8 Regularization and Reproducing Kernel Hilbert Spaces
    5.8.1 Spaces of Functions Generated by Kernels
    5.8.2 Examples of RKHS
  5.9 Wavelet Smoothing
    5.9.1 Wavelet Bases and the Wavelet Transform
    5.9.2 Adaptive Wavelet Filtering
  Bibliographic Notes
  Exercises
  Appendix: Computational Considerations for Splines
  Appendix: B-splines
  Appendix: Computations for Smoothing Splines
6 Kernel Smoothing Methods
  6.1 One-Dimensional Kernel Smoothers

Page 8: Output

    6.1.1 Local Linear Regression
    6.1.2 Local Polynomial Regression
  6.2 Selecting the Width of the Kernel
  6.3 Local Regression in IR^p
  6.4 Structured Local Regression Models in IR^p
    6.4.1 Structured Kernels
    6.4.2 Structured Regression Functions
  6.5 Local Likelihood and Other Models
  6.6 Kernel Density Estimation and Classification
    6.6.1 Kernel Density Estimation
    6.6.2 Kernel Density Classification
    6.6.3 The Naive Bayes Classifier
  6.7 Radial Basis Functions and Kernels
  6.8 Mixture Models for Density Estimation and Classification
  6.9 Computational Considerations
  Bibliographic Notes
  Exercises
7 Model Assessment and Selection
  7.1 Introduction
  7.2 Bias, Variance and Model Complexity
  7.3 The Bias–Variance Decomposition
    7.3.1 Example: Bias–Variance Tradeoff
  7.4 Optimism of the Training Error Rate
  7.5 Estimates of In-Sample Prediction Error
  7.6 The Effective Number of Parameters
  7.7 The Bayesian Approach and BIC
  7.8 Minimum Description Length
  7.9 Vapnik–Chervonenkis Dimension
    7.9.1 Example (Continued)
  7.10 Cross-Validation
    7.10.1 K-Fold Cross-Validation
    7.10.2 The Wrong and Right Way to Do Cross-validation
    7.10.3 Does Cross-Validation Really Work?
  7.11 Bootstrap Methods
    7.11.1 Example (Continued)
  7.12 Conditional or Expected Test Error?
  Bibliographic Notes
  Exercises
8 Model Inference and Averaging
  8.1 Introduction
  8.2 The Bootstrap and Maximum Likelihood Methods
    8.2.1 A Smoothing Example
    8.2.2 Maximum Likelihood Inference
    8.2.3 Bootstrap versus Maximum Likelihood
  8.3 Bayesian Methods
  8.4 Relationship Between the Bootstrap and Bayesian Inference
  8.5 The EM Algorithm

Page 9: Output

    8.5.1 Two-Component Mixture Model
    8.5.2 The EM Algorithm in General
    8.5.3 EM as a Maximization–Maximization Procedure
  8.6 MCMC for Sampling from the Posterior
  8.7 Bagging
    8.7.1 Example: Trees with Simulated Data
  8.8 Model Averaging and Stacking
  8.9 Stochastic Search: Bumping
  Bibliographic Notes
  Exercises
9 Additive Models, Trees, and Related Methods
  9.1 Generalized Additive Models
    9.1.1 Fitting Additive Models
    9.1.2 Example: Additive Logistic Regression
    9.1.3 Summary
  9.2 Tree-Based Methods
    9.2.1 Background
    9.2.2 Regression Trees
    9.2.3 Classification Trees
    9.2.4 Other Issues
    9.2.5 Spam Example (Continued)
  9.3 PRIM: Bump Hunting
    9.3.1 Spam Example (Continued)
  9.4 MARS: Multivariate Adaptive Regression Splines
    9.4.1 Spam Example (Continued)
    9.4.2 Example (Simulated Data)
    9.4.3 Other Issues
  9.5 Hierarchical Mixtures of Experts
  9.6 Missing Data
  9.7 Computational Considerations
  Bibliographic Notes
  Exercises
10 Boosting and Additive Trees
  10.1 Boosting Methods
    10.1.1 Outline of This Chapter
  10.2 Boosting Fits an Additive Model
  10.3 Forward Stagewise Additive Modeling
  10.4 Exponential Loss and AdaBoost
  10.5 Why Exponential Loss?
  10.6 Loss Functions and Robustness
  10.7 "Off-the-Shelf" Procedures for Data Mining
  10.8 Example: Spam Data
  10.9 Boosting Trees
  10.10 Numerical Optimization via Gradient Boosting
    10.10.1 Steepest Descent
    10.10.2 Gradient Boosting
    10.10.3 Implementations of Gradient Boosting
  10.11 Right-Sized Trees for Boosting
  10.12 Regularization

Page 10: Output

    10.12.1 Shrinkage
    10.12.2 Subsampling
  10.13 Interpretation
    10.13.1 Relative Importance of Predictor Variables
    10.13.2 Partial Dependence Plots
  10.14 Illustrations
    10.14.1 California Housing
    10.14.2 New Zealand Fish
    10.14.3 Demographics Data
  Bibliographic Notes
  Exercises
11 Neural Networks
  11.1 Introduction
  11.2 Projection Pursuit Regression
  11.3 Neural Networks
  11.4 Fitting Neural Networks
  11.5 Some Issues in Training Neural Networks
    11.5.1 Starting Values
    11.5.2 Overfitting
    11.5.3 Scaling of the Inputs
    11.5.4 Number of Hidden Units and Layers
    11.5.5 Multiple Minima
  11.6 Example: Simulated Data
  11.7 Example: ZIP Code Data
  11.8 Discussion
  11.9 Bayesian Neural Nets and the NIPS 2003 Challenge
    11.9.1 Bayes, Boosting and Bagging
    11.9.2 Performance Comparisons
  11.10 Computational Considerations
  Bibliographic Notes
  Exercises
12 Support Vector Machines and Flexible Discriminants
  12.1 Introduction
  12.2 The Support Vector Classifier
    12.2.1 Computing the Support Vector Classifier
    12.2.2 Mixture Example (Continued)
  12.3 Support Vector Machines and Kernels

Page 11: Output

    12.3.1 Computing the SVM for Classification
    12.3.2 The SVM as a Penalization Method
    12.3.3 Function Estimation and Reproducing Kernels
    12.3.4 SVMs and the Curse of Dimensionality
    12.3.5 A Path Algorithm for the SVM Classifier
    12.3.6 Support Vector Machines for Regression
    12.3.7 Regression and Kernels
    12.3.8 Discussion
  12.4 Generalizing Linear Discriminant Analysis
  12.5 Flexible Discriminant Analysis
    12.5.1 Computing the FDA Estimates
  12.6 Penalized Discriminant Analysis
  12.7 Mixture Discriminant Analysis
    12.7.1 Example: Waveform Data
  Bibliographic Notes
  Exercises

Page 12: Output

13 Prototype Methods and Nearest-Neighbors
  13.1 Introduction
  13.2 Prototype Methods
    13.2.1 K-means Clustering
    13.2.2 Learning Vector Quantization
    13.2.3 Gaussian Mixtures
  13.3 k-Nearest-Neighbor Classifiers
    13.3.1 Example: A Comparative Study
    13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification
    13.3.3 Invariant Metrics and Tangent Distance
  13.4 Adaptive Nearest-Neighbor Methods
    13.4.1 Example
    13.4.2 Global Dimension Reduction for Nearest-Neighbors
  13.5 Computational Considerations
  Bibliographic Notes

Page 13: Output

  Exercises
14 Unsupervised Learning
  14.1 Introduction
  14.2 Association Rules
    14.2.1 Market Basket Analysis
    14.2.2 The Apriori Algorithm
    14.2.3 Example: Market Basket Analysis
    14.2.4 Unsupervised as Supervised Learning
    14.2.5 Generalized Association Rules
    14.2.6 Choice of Supervised Learning Method
    14.2.7 Example: Market Basket Analysis (Continued)
  14.3 Cluster Analysis
    14.3.1 Proximity Matrices
    14.3.2 Dissimilarities Based on Attributes
    14.3.3 Object Dissimilarity

Page 14: Output

    14.3.4 Clustering Algorithms
    14.3.5 Combinatorial Algorithms
    14.3.6 K-means
    14.3.7 Gaussian Mixtures as Soft K-means Clustering
    14.3.8 Example: Human Tumor Microarray Data
    14.3.9 Vector Quantization
    14.3.10 K-medoids
    14.3.11 Practical Issues
    14.3.12 Hierarchical Clustering
  14.4 Self-Organizing Maps
  14.5 Principal Components, Curves and Surfaces
    14.5.1 Principal Components
    14.5.2 Principal Curves and Surfaces
    14.5.3 Spectral Clustering
    14.5.4 Kernel Principal Components
    14.5.5 Sparse Principal Components
  14.6 Non-negative Matrix Factorization

Page 15: Output

    14.6.1 Archetypal Analysis
  14.7 Independent Component Analysis and Exploratory Projection Pursuit
    14.7.1 Latent Variables and Factor Analysis
    14.7.2 Independent Component Analysis
    14.7.3 Exploratory Projection Pursuit
    14.7.4 A Direct Approach to ICA
  14.8 Multidimensional Scaling
  14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling
  14.10 The Google PageRank Algorithm
  Bibliographic Notes
  Exercises
15 Random Forests
  15.1 Introduction
  15.2 Definition of Random Forests
  15.3 Details of Random Forests
    15.3.1 Out of Bag Samples
    15.3.2 Variable Importance
    15.3.3 Proximity Plots
    15.3.4 Random Forests and Overfitting

Page 16: Output

  15.4 Analysis of Random Forests
    15.4.1 Variance and the De-Correlation Effect
    15.4.2 Bias
    15.4.3 Adaptive Nearest Neighbors
  Bibliographic Notes
  Exercises
16 Ensemble Learning
  16.1 Introduction
  16.2 Boosting and Regularization Paths
    16.2.1 Penalized Regression
    16.2.2 The "Bet on Sparsity" Principle
    16.2.3 Regularization Paths, Over-fitting and Margins
  16.3 Learning Ensembles
    16.3.1 Learning a Good Ensemble
    16.3.2 Rule Ensembles
  Bibliographic Notes
  Exercises
17 Undirected Graphical Models
  17.1 Introduction
  17.2 Markov Graphs and Their Properties
  17.3 Undirected Graphical Models for Continuous Variables
    17.3.1 Estimation of the Parameters when the Graph Structure is Known
    17.3.2 Estimation of the Graph Structure
  17.4 Undirected Graphical Models for Discrete Variables
    17.4.1 Estimation of the Parameters when the Graph Structure is Known
    17.4.2 Hidden Nodes
    17.4.3 Estimation of the Graph Structure
    17.4.4 Restricted Boltzmann Machines
  Exercises
18 High-Dimensional Problems: p ≫ N
  18.1 When p is Much Bigger than N
  18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
  18.3 Linear Classifiers with Quadratic Regularization
    18.3.1 Regularized Discriminant Analysis
    18.3.2 Logistic Regression with Quadratic Regularization
    18.3.3 The Support Vector Classifier
    18.3.4 Feature Selection
    18.3.5 Computational Shortcuts When p ≫ N
  18.4 Linear Classifiers with L1 Regularization
    18.4.1 Application of Lasso to Protein Mass Spectroscopy
    18.4.2 The Fused Lasso for Functional Data
  18.5 Classification When Features are Unavailable
    18.5.1 Example: String Kernels and Protein Classification

Page 17: Output

    18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
    18.5.3 Example: Abstracts Classification
  18.6 High-Dimensional Regression: Supervised Principal Components
    18.6.1 Connection to Latent-Variable Modeling
    18.6.2 Relationship with Partial Least Squares
    18.6.3 Pre-Conditioning for Feature Selection
  18.7 Feature Assessment and the Multiple-Testing Problem
    18.7.1 The False Discovery Rate
    18.7.2 Asymmetric Cutpoints and the SAM Procedure
    18.7.3 A Bayesian Interpretation of the FDR
  18.8 Bibliographic Notes
  Exercises
References
Author Index
Index

1 Introduction

Statistical learning plays a key role in many areas of science, finance and industry. Here are some examples of learning problems:

• Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.

• Predict the price of a stock in 6 months from now, on the basis of company performance measures and economic data.

• Identify the numbers in a handwritten ZIP code, from a digitized image.

• Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood.

• Identify the risk factors for prostate cancer, based on clinical and demographic variables.

Page 18: Output

The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines.

This book is about learning from data. In a typical scenario, we have an outcome measurement, usually quantitative (such as a stock price) or categorical (such as heart attack/no heart attack), that we wish to predict based on a set of features (such as diet and clinical measurements). We have a training set of data, in which we observe the outcome and feature


Table 1.1. Average percentage of words or characters in an email message equal to the indicated word or character. We have chosen the words and characters showing the largest difference between spam and email.

        george   you  your    hp  free   hpl     !   our    re   edu  remove
spam      0.00  2.26  1.38  0.02  0.52  0.01  0.51  0.51  0.13  0.01    0.28
email     1.27  1.27  0.44  0.90  0.07  0.43  0.11  0.18  0.42  0.29    0.01

Page 19: Output

measurements for a set of objects (such as people). Using this data we build a prediction model, or learner, which will enable us to predict the outcome for new unseen objects. A good learner is one that accurately predicts such an outcome.

The examples above describe what is called the supervised learning problem. It is called "supervised" because of the presence of the outcome variable to guide the learning process. In the unsupervised learning problem, we observe only the features and have no measurements of the outcome. Our task is rather to describe how the data are organized or clustered. We devote most of this book to supervised learning; the unsupervised problem is less developed in the literature, and is the focus of Chapter 14.

Here are some examples of real learning problems that are discussed in this book.

Example 1: Email Spam

The data for this example consists of information from 4601 email messages, in a study to try to predict whether the email was junk email, or "spam." The objective was to design an automatic spam detector that could filter out spam before clogging the users' mailboxes. For all 4601 email messages, the true outcome (email type) email or spam is available, along with the relative frequencies of 57 of the most commonly occurring words and punctuation marks in the email message. This is a supervised learning problem, with the outcome the class variable email/spam. It is also called a classification problem.

Table 1.1 lists the words and characters showing the largest average difference between spam and email.

Our learning method has to decide which features to use and how: for example, we might use a rule such as

    if (%george < 0.6) & (%you > 1.5) then spam else email.

Another form of a rule might be:

    if (0.2 · %you − 0.3 · %george) > 0 then spam else email.
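As a minimal sketch, the two example rules can be written as small Python functions. The function names and the idea of feeding in the class averages for george and you from Table 1.1 are illustrative assumptions, not something the text prescribes; real messages would of course vary around those averages.

```python
def rule_threshold(pct_george: float, pct_you: float) -> str:
    """First rule: spam if %george < 0.6 and %you > 1.5, else email."""
    return "spam" if (pct_george < 0.6 and pct_you > 1.5) else "email"


def rule_linear(pct_george: float, pct_you: float) -> str:
    """Second rule: spam if 0.2 * %you - 0.3 * %george > 0, else email."""
    return "spam" if (0.2 * pct_you - 0.3 * pct_george) > 0 else "email"


# Class averages (george, you) taken from Table 1.1; used here only as
# two convenient test points, not as real messages.
spam_average = (0.00, 2.26)
email_average = (1.27, 1.27)

for name, (george, you) in [("spam average", spam_average),
                            ("email average", email_average)]:
    print(name, rule_threshold(george, you), rule_linear(george, you))
```

Both rules label the spam-average point as spam and the email-average point as email, which is the least one would expect of a sensible rule; it says nothing about how well either rule does on individual messages.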


Figure 1.1. Scatterplot matrix of the prostate cancer data. The first row shows the response against each of the predictors in turn. Two of the predictors, svi and gleason, are categorical.

For this problem not all errors are equal; we want to avoid filtering out good email, while letting spam get through is not desirable but less serious in its consequences. We discuss a number of different methods for tackling this learning problem in the book.
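One way to make "not all errors are equal" concrete is to score a filter by an average cost rather than a plain error rate. The sketch below is only an illustration: the function name and the cost values (10 for a lost good email, 1 for a spam message that gets through) are hypothetical choices, not figures from the text.

```python
def average_cost(y_true, y_pred,
                 cost_email_as_spam=10.0,   # hypothetical: losing good email is costly
                 cost_spam_as_email=1.0):   # hypothetical: missed spam is a nuisance
    """Average misclassification cost with unequal costs per error type."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == "email" and pred == "spam":
            total += cost_email_as_spam
        elif truth == "spam" and pred == "email":
            total += cost_spam_as_email
    return total / len(y_true)


# Toy labels, invented for illustration only.
truth = ["email", "email", "spam", "spam"]
filter_a = ["spam", "email", "spam", "spam"]    # filters out one good email
filter_b = ["email", "email", "email", "spam"]  # lets one spam through

print(average_cost(truth, filter_a), average_cost(truth, filter_b))
```

Both toy filters make exactly one mistake in four messages, yet under these assumed costs the one that discards a good email scores ten times worse.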

Example 2: Prostate Cancer

The data for this example, displayed in Figure 1.1¹, come from a study by Stamey et al. (1989) that examined the correlation between the level of

¹There was an error in these data in the first edition of this book. Subject 32 had a value of 6.1 for lweight, which translates to a 449 gm prostate! The correct value is 44.9 gm. We are grateful to Prof. Stephen W. Link for alerting us to this error.


Figure 1.2. Examples of handwritten digits from U.S. postal envelopes.

prostate specific antigen (PSA) and a number of clinical measures, in

Page 29: Output

97 men who were about to receive a radical prostatectomy. The goal is to predict the log of PSA (lpsa) from a number of measurements including log cancer volume (lcavol), log prostate weight (lweight), age, log of benign prostatic hyperplasia amount (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45). Figure 1.1 is a scatterplot matrix of the variables. Some correlations with lpsa are evident, but a good predictive model is difficult to construct by eye. This is a supervised learning problem, known as a regression problem, because the outcome measurement is quantitative.

Example 3: Handwritten Digit Recognition

The data from this example come from the handwritten ZIP codes on envelopes from U.S. postal mail. Each image is a segment from a five digit ZIP code, isolating a single digit. The images are $16 \times 16$ eight-bit grayscale maps, with each pixel ranging in intensity from 0 to 255. Some sample images are shown in Figure 1.2. The images have been normalized to have approximately the same size and orientation. The task is to predict, from the $16 \times 16$ matrix of pixel intensities, the identity of each image $(0, 1, \ldots, 9)$ quickly and accurately. If it is accurate enough, the resulting algorithm would be used as part of an automatic sorting procedure for envelopes. This is a classification problem for which the error rate needs to be kept very low to avoid misdirection of


mail. In order to achieve this low error rate, some objects can be assigned to a "don't know" category, and sorted instead by hand.

Example 4: DNA Expression Microarrays

DNA stands for deoxyribonucleic acid, and is the basic material that makes up human chromosomes. DNA microarrays measure the expression of a gene in a cell by measuring the amount of mRNA (messenger ribonucleic acid) present for that gene. Microarrays are considered a breakthrough technology in biology, facilitating the quantitative study of thousands of genes simultaneously from a single sample of cells. Here is how a DNA microarray works. The nucleotide sequences for a few thousand genes are printed on a glass slide. A target sample and a reference sample are labeled with red and green dyes, and each is hybridized with the DNA on the slide. Through fluoroscopy, the log (red/green) intensities of RNA hybridizing at each site are measured. The result is a few thousand numbers, typically ranging from say -6 to 6, measuring the expression level of each gene in the target relative to the reference sample. Positive values indicate higher expression in the target versus the reference,

Page 30: Output

and vice versa for negative values. A gene expression dataset collects together the expression values from a series of DNA microarray experiments, with each column representing an experiment. There are therefore several thousand rows representing individual genes, and tens of columns representing samples: in the particular example of Figure 1.3 there are 6830 genes (rows) and 64 samples (columns), although for clarity only a random sample of 100 rows are shown. The figure displays the data set as a heat map, ranging from green (negative) to red (positive). The samples are 64 cancer tumors from different patients. The challenge here is to understand how the genes and samples are organized. Typical questions include the following:

(a) which samples are most similar to each other, in terms of their expression profiles across genes?
(b) which genes are most similar to each other, in terms of their expression profiles across samples?
(c) do certain genes show very high (or low) expression for certain cancer samples?

We could view this task as a regression problem, with two categorical predictor variables (genes and samples) and with the response variable being the level of expression. However, it is probably more useful to view it as an unsupervised learning problem. For example, for question (a) above, we think of the samples as points in 6830-dimensional space, which we want to cluster together in some way.



Page 31: Output

Figure 1.3. DNA microarray data: expression matrix of 6830 genes (rows) and 64 samples (columns), for the human tumor data. Only a random sample of 100 rows are shown. The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are gray. The rows and columns are displayed in a randomly chosen order.


Who Should Read this Book

This book is designed for researchers and students in a broad variety of fields: statistics, artificial intelligence, engineering, finance and others. We expect that the reader will have had at least one elementary course in statistics, covering basic topics including linear regression. We have not attempted to write a comprehensive catalog of learning methods, but rather to describe some of the most important techniques. Equally notable, we describe the underlying concepts and considerations by which a researcher can judge a learning method. We have tried to write this book in an intuitive fashion, emphasizing concepts rather than mathematical details. As statisticians, our exposition will naturally reflect our backgrounds and areas of expertise. However, in the past eight years we have been attending conferences in neural networks, data mining and machine learning, and our thinking has been heavily influenced by these exciting fields. This influence is evident in our current research, and in this book.

How This Book is Organized

Our view is that one must understand simple methods before trying to grasp more complex ones. Hence, after giving an overview of the supervised learning problem in Chapter 2, we discuss linear methods for regression and classification in Chapters 3 and 4. In Chapter 5 we describe splines, wavelets and regularization/penalization methods for a single predictor, while Chapter 6 covers kernel methods and local regression. Both of these sets of methods are important building blocks for high-dimensional learning techniques. Model assessment and selection is the topic of Chapter 7, covering the concepts of bias and variance, overfitting and methods such as cross-validation for choosing models. Chapter 8 discusses model inference and averaging, including an overview of maximum likelihood, Bayesian inference and the bootstrap, the EM algorithm, Gibbs sampling and bagging. A related procedure called boosting is the focus of Chapter 10. In Chapters 9-13 we describe a series of structured methods for supervised learning, with Chapters 9 and 11 covering regression and Chapters 12 and 13 focusing on classification.

Page 32: Output

Chapter 14 describes methods for unsupervised learning. Two recently proposed techniques, random forests and ensemble learning, are discussed in Chapters 15 and 16. We describe undirected graphical models in Chapter 17 and finally we study high-dimensional problems in Chapter 18. At the end of each chapter we discuss computational considerations important for data mining applications, including how the computations scale with the number of observations and predictors. Each chapter ends with Bibliographic Notes giving background references for the material.


We recommend that Chapters 1-4 be first read in sequence. Chapter 7 should also be considered mandatory, as it covers central concepts that pertain to all learning methods. With this in mind, the rest of the book can be read sequentially, or sampled, depending on the reader's interest. A special symbol marks a technically difficult section, one that can be skipped without interrupting the flow of the discussion.

Book Website

The website for this book is located at

http://www-stat.stanford.edu/ElemStatLearn

It contains a number of resources, including many of the datasets used in this book.

Note for Instructors

We have successively used the first edition of this book as the basis for a two-quarter course, and with the additional materials in this second edition, it could even be used for a three-quarter sequence. Exercises are provided at the end of each chapter. It is important for students to have access to good software tools for these topics. We used the R and S-PLUS programming languages in our courses.

2 Overview of Supervised Learning

2.1 Introduction

The first three examples described in Chapter 1 have several

Page 33: Output

components in common. For each there is a set of variables that might be denoted as inputs, which are measured or preset. These have some influence on one or more outputs. For each example the goal is to use the inputs to predict the values of the outputs. This exercise is called supervised learning. We have used the more modern language of machine learning. In the statistical literature the inputs are often called the predictors, a term we will use interchangeably with inputs, and more classically the independent variables. In the pattern recognition literature the term features is preferred, which we use as well. The outputs are called the responses, or classically the dependent variables.

2.2 Variable Types and Terminology

The outputs vary in nature among the examples. In the glucose prediction example, the output is a quantitative measurement, where some measurements are bigger than others, and measurements close in value are close in nature. In the famous Iris discrimination example due to R. A. Fisher, the output is qualitative (species of Iris) and assumes values in a finite set $\mathcal{G} = \{\text{Virginica}, \text{Setosa}, \text{Versicolor}\}$. In the handwritten digit example the output is one of 10 different digit classes: $\mathcal{G} = \{0, 1, \ldots, 9\}$. In both of


these there is no explicit ordering in the classes, and in fact often descriptive labels rather than numbers are used to denote the classes. Qualitative variables are also referred to as categorical or discrete variables as well as factors.

For both types of outputs it makes sense to think of using the inputs to predict the output. Given some specific atmospheric measurements today and yesterday, we want to predict the ozone level tomorrow. Given the grayscale values for the pixels of the digitized image of the handwritten digit, we want to predict its class label.

This distinction in output type has led to a naming convention for the prediction tasks: regression when we predict quantitative outputs, and classification when we predict qualitative outputs. We will see that these two tasks have a lot in common, and in particular both can be viewed as a task in function approximation.

Inputs also vary in measurement type; we can have some of each of qualitative and quantitative input variables. These have also led to distinctions in the types of methods that are used for prediction: some methods are defined most naturally for quantitative inputs, some most naturally for qualitative and some for both.

A third variable type is ordered categorical, such as small, medium and large, where there is an ordering between the values, but no metric notion is appropriate (the difference between medium and small need not be the same as that between large and medium). These are discussed further in Chapter 4.

Page 34: Output

Qualitative variables are typically represented numerically by codes. The easiest case is when there are only two classes or categories, such as "success" or "failure," "survived" or "died." These are often represented by a single binary digit or bit as 0 or 1, or else by -1 and 1. For reasons that will become apparent, such numeric codes are sometimes referred to as targets. When there are more than two categories, several alternatives are available. The most useful and commonly used coding is via dummy variables. Here a $K$-level qualitative variable is represented by a vector of $K$ binary variables or bits, only one of which is "on" at a time. Although more compact coding schemes are possible, dummy variables are symmetric in the levels of the factor.

We will typically denote an input variable by the symbol $X$. If $X$ is a vector, its components can be accessed by subscripts $X_j$. Quantitative outputs will be denoted by $Y$, and qualitative outputs by $G$ (for group). We use uppercase letters such as $X$, $Y$ or $G$ when referring to the generic aspects of a variable. Observed values are written in lowercase; hence the $i$th observed value of $X$ is written as $x_i$ (where $x_i$ is again a scalar or vector). Matrices are represented by bold uppercase letters; for example, a set of $N$ input $p$-vectors $x_i$, $i = 1, \ldots, N$ would be represented by the $N \times p$ matrix $\mathbf{X}$. In general, vectors will not be bold, except when they have $N$ components; this convention distinguishes a $p$-vector of inputs $x_i$ for the


$i$th observation from the $N$-vector $\mathbf{x}_j$ consisting of all the observations on variable $X_j$. Since all vectors are assumed to be column vectors, the $i$th row of $\mathbf{X}$ is $x_i^T$, the vector transpose of $x_i$.

For the moment we can loosely state the learning task as follows: given the value of an input vector $X$, make a good prediction of the output $Y$, denoted by $\hat{Y}$ (pronounced "y-hat"). If $Y$ takes values in $\mathbb{R}$ then so should $\hat{Y}$; likewise for categorical outputs, $\hat{G}$ should take values in the same set $\mathcal{G}$ associated with $G$.

For a two-class $G$, one approach is to denote the binary coded target as $Y$, and then treat it as a quantitative output. The predictions $\hat{Y}$ will typically lie in $[0, 1]$, and we can assign to $\hat{G}$ the class label according to whether $\hat{y} > 0.5$. This approach generalizes to $K$-level qualitative outputs as well.

We need data to construct prediction rules, often a lot of it. We thus suppose we have available a set of measurements $(x_i, y_i)$ or $(x_i, g_i)$, $i = 1, \ldots, N$, known as the training data, with which to

Page 35: Output

construct our prediction rule.

2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors

In this section we develop two simple but powerful prediction methods: the linear model fit by least squares and the $k$-nearest-neighbor prediction rule. The linear model makes huge assumptions about structure and yields stable but possibly inaccurate predictions. The method of $k$-nearest neighbors makes very mild structural assumptions: its predictions are often accurate but can be unstable.

2.3.1 Linear Models and Least Squares

The linear model has been a mainstay of statistics for the past 30 years and remains one of our most important tools. Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$, we predict the output $Y$ via the model

$$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j. \tag{2.1}$$

The term $\hat{\beta}_0$ is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in $X$, include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$, and then write the linear model in vector form as an inner product

$$\hat{Y} = X^T \hat{\beta}, \tag{2.2}$$


where $X^T$ denotes vector or matrix transpose ($X$ being a column vector). Here we are modeling a single output, so $\hat{Y}$ is a scalar; in general $\hat{Y}$ can be a $K$-vector, in which case $\beta$ would be a $p \times K$ matrix of coefficients. In the $(p + 1)$-dimensional input-output space, $(X, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$. From now on we assume that the intercept is included in $\hat{\beta}$.

Viewed as a function over the $p$-dimensional input space, $f(X) = X^T \beta$ is linear, and the gradient $f'(X) = \beta$ is a vector in input space that points in the steepest uphill direction.

How do we fit the linear model to a set of training data? There are many different methods, but by far the most popular is the method of least squares. In this approach, we pick the coefficients $\beta$ to minimize the residual sum of squares

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2. \tag{2.3}$$

Page 36: Output

$\mathrm{RSS}(\beta)$ is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique. The solution is easiest to characterize in matrix notation. We can write

$$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta), \tag{2.4}$$

where $\mathbf{X}$ is an $N \times p$ matrix with each row an input vector, and $\mathbf{y}$ is an $N$-vector of the outputs in the training set. Differentiating w.r.t. $\beta$ we get the normal equations

$$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta) = 0. \tag{2.5}$$

If $\mathbf{X}^T \mathbf{X}$ is nonsingular, then the unique solution is given by

$$\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}, \tag{2.6}$$

and the fitted value at the $i$th input $x_i$ is $\hat{y}_i = \hat{y}(x_i) = x_i^T \hat{\beta}$. At an arbitrary input $x_0$ the prediction is $\hat{y}(x_0) = x_0^T \hat{\beta}$. The entire fitted surface is characterized by the $p$ parameters $\hat{\beta}$. Intuitively, it seems that we do not need a very large data set to fit such a model.

Let's look at an example of the linear model in a classification context. Figure 2.1 shows a scatterplot of training data on a pair of inputs $X_1$ and $X_2$. The data are simulated, and for the present the simulation model is not important. The output class variable $G$ has the values BLUE or ORANGE, and is represented as such in the scatterplot. There are 100 points in each of the two classes. The linear regression model was fit to these data, with the response $Y$ coded as 0 for BLUE and 1 for ORANGE. The fitted values $\hat{Y}$ are converted to a fitted class variable $\hat{G}$ according to the rule

$$\hat{G} = \begin{cases} \text{ORANGE} & \text{if } \hat{Y} > 0.5, \\ \text{BLUE} & \text{if } \hat{Y} \le 0.5. \end{cases} \tag{2.7}$$
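The fitting and thresholding steps in (2.5) through (2.7) can be sketched in a few lines of NumPy. This is only an illustration: the two Gaussian clouds below are hypothetical stand-in data, not the simulated data behind Figure 2.1, and the helper names `fit_least_squares` and `classify` are assumptions made for this sketch.

```python
import numpy as np

def add_intercept(X):
    # Prepend the constant variable 1 so the intercept is part of beta.
    return np.column_stack([np.ones(len(X)), X])

def fit_least_squares(X, y):
    # Minimize RSS(beta); lstsq is numerically safer than forming the inverse in (2.6).
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return beta

def classify(X, beta, threshold=0.5):
    # Rule (2.7): 1 (ORANGE) if the fitted value exceeds 0.5, else 0 (BLUE).
    return (add_intercept(X) @ beta > threshold).astype(int)

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 100 points per class, coded BLUE = 0, ORANGE = 1.
X_blue = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X_orange = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X_blue, X_orange])
y = np.concatenate([np.zeros(100), np.ones(100)])

beta_hat = fit_least_squares(X, y)

# Closed form (2.6), valid here because X^T X is nonsingular.
X1 = add_intercept(X)
beta_closed = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Normal equations (2.5): X^T (y - X beta) should vanish at the solution.
normal_eq_residual = X1.T @ (y - X1 @ beta_hat)

g_hat = classify(X, beta_hat)
train_error = np.mean(g_hat != y)
print(beta_hat, train_error)
```

Using `np.linalg.lstsq` rather than an explicit inverse is a design choice of the sketch; both routes give the same coefficients when $\mathbf{X}^T \mathbf{X}$ is nonsingular.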

[Figure 2.1: Linear Regression of 0/1 Response]

Figure 2.1. A classification example in two dimensions. The classes are coded as a binary variable (BLUE = 0, ORANGE = 1), and then fit by linear regression. The line is the decision boundary defined by $x^T \hat{\beta} = 0.5$. The orange shaded region denotes that part of input space classified as ORANGE, while the blue region is classified as BLUE.

The set of points in $\mathbb{R}^2$ classified as ORANGE corresponds to $\{x : x^T \hat{\beta} > 0.5\}$, indicated in Figure 2.1, and the two predicted classes are separated by the decision boundary $\{x : x^T \hat{\beta} = 0.5\}$, which is linear in this case. We see that for these data there are several misclassifications on both sides of the decision boundary. Perhaps our

Page 37: Output

linear model is too rigid, or are such errors unavoidable? Remember that these are errors on the training data itself, and we have not said where the constructed data came from. Consider the two possible scenarios:

Scenario 1: The training data in each class were generated from bivariate Gaussian distributions with uncorrelated components and different means.

Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.

A mixture of Gaussians is best described in terms of the generative model. One first generates a discrete variable that determines which of


the component Gaussians to use, and then generates an observation from the chosen density. In the case of one Gaussian per class, we will see in Chapter 4 that a linear decision boundary is the best one can do, and that our estimate is almost optimal. The region of overlap is inevitable, and future data to be predicted will be plagued by this overlap as well.

In the case of mixtures of tightly clustered Gaussians the story is different. A linear decision boundary is unlikely to be optimal, and in fact is not. The optimal decision boundary is nonlinear and disjoint, and as such will be much more difficult to obtain.

We now look at another classification and regression procedure that is in some sense at the opposite end of the spectrum to the linear model, and far better suited to the second scenario.

2.3.2 Nearest-Neighbor Methods

Nearest-neighbor methods use those observations in the training set $\mathcal{T}$ closest in input space to $x$ to form $\hat{Y}$. Specifically, the $k$-nearest neighbor fit for $\hat{Y}$ is defined as follows:

$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i, \tag{2.8}$$

where $N_k(x)$ is the neighborhood of $x$ defined by the $k$ closest points $x_i$ in the training sample. Closeness implies a metric, which for the moment we assume is Euclidean distance. So, in words, we find the $k$ observations with $x_i$ closest to $x$ in input space, and average their responses.
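As a minimal sketch of the averaging rule (2.8) with Euclidean distance, the function below finds the $k$ closest training points to each query point and averages their responses. The function name `knn_predict` and the four-point toy data set are hypothetical, chosen only so the result can be checked by hand.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    # Squared Euclidean distance between every query point and every training point.
    diffs = X_query[:, None, :] - X_train[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)
    # Indices of the k closest training points: the neighborhood N_k(x).
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Average the responses over the neighborhood, as in (2.8).
    return y_train[nearest].mean(axis=1)

# Hypothetical toy data: three points near the origin and one far away.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y_train = np.array([0.0, 0.0, 1.0, 1.0])
x0 = np.array([[0.1, 0.1]])

y_hat_3 = knn_predict(X_train, y_train, x0, k=3)  # averages responses 0, 0 and 1
y_hat_1 = knn_predict(X_train, y_train, x0, k=1)  # response of the single closest point
print(y_hat_3, y_hat_1)
```

Ranking by squared distance gives the same neighbors as ranking by distance, so the square root is skipped. The brute-force distance matrix is fine for small samples; larger problems would normally use a spatial index instead.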

Page 38: Output

In Figure 2.2 we use the same training data as in Figure 2.1, and use 15-nearest-neighbor averaging of the binary coded response as the method of fitting. Thus $\hat{Y}$ is the proportion of ORANGE's in the neighborhood, and so assigning class ORANGE to $\hat{G}$ if $\hat{Y} > 0.5$ amounts to a majority vote in the neighborhood. The colored regions indicate all those points in input space classified as BLUE or ORANGE by such a rule, in this case found by evaluating the procedure on a fine grid in input space. We see that the decision boundaries that separate the BLUE from the ORANGE regions are far more irregular, and respond to local clusters where one class dominates.

Figure 2.3 shows the results for 1-nearest-neighbor classification: $\hat{Y}$ is assigned the value $y_\ell$ of the closest point $x_\ell$ to $x$ in the training data. In this case the regions of classification can be computed relatively easily, and correspond to a Voronoi tessellation of the training data. Each point $x_i$ has an associated tile bounding the region for which it is the closest input point. For all points $x$ in the tile, $\hat{G}(x) = g_i$. The decision boundary is even more irregular than before.

The method of $k$-nearest-neighbor averaging is defined in exactly the same way for regression of a quantitative output $Y$, although $k = 1$ would be an unlikely choice.

[Figure 2.2: 15-Nearest Neighbor Classifier]

Figure 2.2. The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (BLUE = 0, ORANGE = 1) and then fit by 15-nearest-neighbor averaging as in (2.8). The predicted class is hence chosen by majority vote amongst the 15-nearest neighbors.

In Figure 2.2 we see that far fewer training observations are misclassified than in Figure 2.1. This should not give us too much comfort, though, since in Figure 2.3 none of the training data are misclassified. A little thought suggests that for $k$-nearest-neighbor fits, the error on the training data should be approximately an increasing function of $k$, and will always be 0 for $k = 1$. An independent test set would give us a more satisfactory means for comparing the different methods.

It appears that $k$-nearest-neighbor fits have a single parameter, the number of neighbors $k$, compared to the $p$ parameters in least-squares fits. Although this is the case, we will see that the effective number of parameters of $k$-nearest neighbors is $N/k$ and is generally bigger than $p$, and decreases with increasing $k$. To get an idea of why,

Page 39: Output

note that if the neighborhoods were nonoverlapping, there would be $N/k$ neighborhoods and we would fit one parameter (a mean) in each neighborhood.

It is also clear that we cannot use sum-of-squared errors on the training set as a criterion for picking $k$, since we would always pick $k = 1$! It would seem that $k$-nearest-neighbor methods would be more appropriate for the mixture Scenario 2 described above, while for Gaussian data the decision boundaries of $k$-nearest neighbors would be unnecessarily noisy.
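The point about training error can be checked numerically. The sketch below uses hypothetical noisy two-class data (not the data of Figures 2.1 to 2.3) and reports the training misclassification rate of $k$-nearest-neighbor majority voting for a few values of $k$, next to the effective number of parameters $N/k$. With distinct input points, each training point is its own nearest neighbor, so the training error at $k = 1$ is exactly zero.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    # Average of the responses of the k nearest training points (Euclidean distance).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

rng = np.random.default_rng(1)
N = 200
# Hypothetical data: the label depends on the first input plus noise.
X = rng.normal(size=(N, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=N) > 0).astype(float)

train_err = {}
for k in (1, 5, 15, 51):  # odd k avoids ties in the majority vote
    g_hat = (knn_predict(X, y, X, k) > 0.5).astype(float)
    train_err[k] = float(np.mean(g_hat != y))
    print(f"k={k:3d}  effective parameters N/k={N / k:6.1f}  training error={train_err[k]:.3f}")
```

Because the training error is minimized trivially at $k = 1$, it carries no information about which $k$ predicts best on new data.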

[Figure 2.3: 1-Nearest Neighbor Classifier]

Figure 2.3. The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (BLUE = 0, ORANGE = 1), and then predicted by 1-nearest-neighbor classification.

2.3.3 From Least Squares to Nearest Neighbors

The linear decision boundary from least squares is very smooth, and apparently stable to fit. It does appear to rely heavily on the assumption that a linear decision boundary is appropriate. In language we will develop shortly, it has low variance and potentially high bias.

On the other hand, the $k$-nearest-neighbor procedures do not appear to rely on any stringent assumptions about the underlying data, and can adapt to any situation. However, any particular subregion of the decision boundary depends on a handful of input points and their particular positions, and is thus wiggly and unstable (high variance and low bias).

Each method has its own situations for which it works best; in particular linear regression is more appropriate for Scenario 1 above, while nearest neighbors are more suitable for Scenario 2. The time has come to expose the oracle! The data in fact were simulated from a model somewhere between the two, but closer to Scenario 2. First we generated 10 means $m_k$ from a bivariate Gaussian distribution $N((1, 0)^T, \mathbf{I})$ and labeled this class BLUE. Similarly, 10 more were drawn from $N((0, 1)^T, \mathbf{I})$ and labeled class ORANGE. Then for each class we generated 100 observations as follows: for each observation, we picked an $m_k$ at random with probability 1/10, and

[Figure 2.4 axis: k, Number of Nearest Neighbors]

Page 40: Output

[Figure 2.4 axis: Degrees of Freedom, N/k]

Figure 2.4. Misclassification curves for the simulation example used in Figures 2.1, 2.2 and 2.3. A single training sample of size 200 was used, and a test sample of size 10,000. The orange curves are test and the blue are training error for $k$-nearest-neighbor classification. The results for linear regression are the bigger orange and blue squares at three degrees of freedom. The purple line is the optimal Bayes error rate.

then generated a $N(m_k, \mathbf{I}/5)$, thus leading to a mixture of Gaussian clusters for each class. Figure 2.4 shows the results of classifying 10,000 new observations generated from the model. We compare the results for least squares and those for $k$-nearest neighbors for a range of values of $k$.

A large subset of the most popular techniques in use today are variants of these two simple procedures. In fact 1-nearest-neighbor, the simplest of all, captures a large percentage of the market for low-dimensional problems. The following list describes some ways in which these simple procedures have been enhanced:

• Kernel methods use weights that decrease smoothly to zero with distance from the target point, rather than the effective 0/1 weights used by $k$-nearest neighbors.
• In high-dimensional spaces the distance kernels are modified to emphasize some variable more than others.


• Local regression fits linear models by locally weighted least squares, rather than fitting constants locally.
• Linear models fit to a basis expansion of the original inputs allow arbitrarily complex models.
• Projection pursuit and neural network models consist of sums of nonlinearly transformed linear models.
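The simulation recipe described earlier in this section (10 component means per class, one mean picked with probability 1/10, then Gaussian noise with covariance $\mathbf{I}/5$) can be sketched as follows, together with a test-error comparison in the spirit of Figure 2.4. Several details are assumptions of this sketch rather than facts from the text: the random seed, the even 5,000/5,000 split of the 10,000 test points between classes, the values of $k$ tried, and all function names. The numbers it prints will therefore not reproduce the figure.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility

def draw_means(center, n_means=10):
    # Component means m_k ~ N(center, I).
    return rng.multivariate_normal(center, np.eye(2), size=n_means)

def draw_class(means, n):
    # Pick one m_k per observation with probability 1/10 each, then add N(0, I/5) noise.
    picks = rng.integers(0, len(means), size=n)
    return means[picks] + rng.normal(scale=np.sqrt(1 / 5), size=(n, 2))

def draw_sample(means_blue, means_orange, n_per_class):
    X = np.vstack([draw_class(means_blue, n_per_class),
                   draw_class(means_orange, n_per_class)])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])  # BLUE=0, ORANGE=1
    return X, y

def linear_classify(X_train, y_train, X_new):
    # Least squares fit of the 0/1 response, thresholded at 0.5.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return (np.column_stack([np.ones(len(X_new)), X_new]) @ beta > 0.5).astype(float)

def knn_classify(X_train, y_train, X_new, k):
    # Majority vote among the k nearest training points.
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(float)

means_blue = draw_means([1.0, 0.0])
means_orange = draw_means([0.0, 1.0])
X_train, y_train = draw_sample(means_blue, means_orange, 100)   # 200 training points
X_test, y_test = draw_sample(means_blue, means_orange, 5000)    # 10,000 test points

test_err_linear = float(np.mean(linear_classify(X_train, y_train, X_test) != y_test))
test_err_knn = {k: float(np.mean(knn_classify(X_train, y_train, X_test, k) != y_test))
                for k in (1, 15, 151)}
print("linear:", test_err_linear, "kNN:", test_err_knn)
```

The same component means are reused for the training and test samples, since both are described as coming from one fixed model.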

2.4 Statistical Decision Theory

In this section we develop a small amount of theory that provides a framework for developing models such as those discussed informally so far. We first consider the case of a quantitative output, and place ourselves in the world of random variables and probability spaces. Let $X \in \mathbb{R}^p$ denote a real valued random input vector, and $Y \in \mathbb{R}$ a real valued random output variable, with joint distribution $\Pr(X, Y)$. We seek a function $f(X)$ for predicting $Y$ given values of the input $X$. This theory requires a loss function $L(Y, f(X))$ for penalizing errors in prediction, and by far the most common and convenient is squared error loss: $L(Y, f(X)) = (Y - f(X))^2$. This leads us to a criterion for choosing $f$,

Page 41: Output

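$$\mathrm{EPE}(f) = \mathrm{E}(Y - f(X))^2 \tag{2.9}$$

$$= \int [y - f(x)]^2 \Pr(dx, dy), \tag{2.10}$$

the expected (squared) prediction error. By conditioning¹ on $X$, we can write EPE as

$$\mathrm{EPE}(f) = \mathrm{E}_X \mathrm{E}_{Y|X}\left([Y - f(X)]^2 \mid X\right) \tag{2.11}$$

and we see that it suffices to minimize EPE pointwise:

$$f(x) = \operatorname{argmin}_c \mathrm{E}_{Y|X}\left([Y - c]^2 \mid X = x\right). \tag{2.12}$$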

The solution is

$$f(x) = \mathrm{E}(Y \mid X = x), \tag{2.13}$$

the conditional expectation, also known as the regression function. Thus the best prediction of $Y$ at any point $X = x$ is the conditional mean, when best is measured by average squared error.

The nearest-neighbor methods attempt to directly implement this recipe using the training data. At each point $x$, we might ask for the average of all

¹ Conditioning here amounts to factoring the joint density $\Pr(X, Y) = \Pr(Y \mid X)\Pr(X)$ where $\Pr(Y \mid X) = \Pr(Y, X)/\Pr(X)$, and splitting up the bivariate integral accordingly.


those $y_i$'s with input $x_i = x$. Since there is typically at most one observation at any point $x$, we settle for

$$\hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x)), \tag{2.14}$$

where "Ave" denotes average, and $N_k(x)$ is the neighborhood containing the $k$ points in $\mathcal{T}$ closest to $x$. Two approximations are happening here:

• expectation is approximated by averaging over sample data;
• conditioning at a point is relaxed to conditioning on some region "close" to the target point.

For large training sample size $N$, the points in the neighborhood are likely to be close to $x$, and as $k$ gets large the average will get more stable. In fact, under mild regularity conditions on the joint probability distribution $\Pr(X, Y)$, one can show that as $N, k \to \infty$ such that $k/N \to 0$, $\hat{f}(x) \to \mathrm{E}(Y \mid X = x)$. In light of this, why look further, since it seems we have a universal approximator? We often do not have very large samples. If the linear or some more structured model is appropriate, then we can usually get a more stable estimate than $k$-nearest neighbors, although such knowledge has to be learned from the data as well. There are other problems though, sometimes disastrous. In Section 2.5 we see that as the dimension $p$ gets large, so does the metric size of the $k$-nearest neighborhood. So settling for nearest neighborhood as a surrogate for conditioning will fail us miserably. The convergence above still holds, but the rate of convergence

Page 42: Output

decreases as the dimension increases.

How does linear regression fit into this framework? The simplest explanation is that one assumes that the regression function $f(x)$ is approximately linear in its arguments:

$$f(x) \approx x^T \beta. \tag{2.15}$$

This is a model-based approach: we specify a model for the regression function. Plugging this linear model for $f(x)$ into EPE (2.9) and differentiating we can solve for $\beta$ theoretically:

$$\beta = [\mathrm{E}(XX^T)]^{-1} \mathrm{E}(XY). \tag{2.16}$$

Note we have not conditioned on $X$; rather we have used our knowledge of the functional relationship to pool over values of $X$. The least squares solution (2.6) amounts to replacing the expectation in (2.16) by averages over the training data.

So both $k$-nearest neighbors and least squares end up approximating conditional expectations by averages. But they differ dramatically in terms of model assumptions:

• Least squares assumes $f(x)$ is well approximated by a globally linear function.


• $k$-nearest neighbors assumes $f(x)$ is well approximated by a locally constant function.

Although the latter seems more palatable, we have already seen that we may pay a price for this flexibility.

Many of the more modern techniques described in this book are model based, although far more flexible than the rigid linear model. For example, additive models assume that

$$f(X) = \sum_{j=1}^{p} f_j(X_j). \tag{2.17}$$

This retains the additivity of the linear model, but each coordinate function $f_j$ is arbitrary. It turns out that the optimal estimate for the additive model uses techniques such as k-nearest neighbors to approximate univariate conditional expectations simultaneously for each of the coordinate functions. Thus the problems of estimating a conditional expectation in high dimensions are swept away in this case by imposing some (often unrealistic) model assumptions, in this case additivity.

Are we happy with the criterion (2.11)? What happens if we replace the $L_2$ loss function with the $L_1$: $E|Y - f(X)|$? The solution in this case is the conditional median,

$$\hat f(x) = \mathrm{median}(Y \mid X = x), \tag{2.18}$$


which is a different measure of location, and its estimates are more robust than those for the conditional mean. $L_1$ criteria have discontinuities in their derivatives, which have hindered their widespread use. Other more resistant loss functions will be mentioned in later chapters, but squared error is analytically convenient and the most popular.

What do we do when the output is a categorical variable $G$? The same paradigm works here, except we need a different loss function for penalizing prediction errors. An estimate $\hat G$ will assume values in $\mathcal{G}$, the set of possible classes. Our loss function can be represented by a $K \times K$ matrix $L$, where $K = \mathrm{card}(\mathcal{G})$. $L$ will be zero on the diagonal and nonnegative elsewhere, where $L(k, \ell)$ is the price paid for classifying an observation belonging to class $\mathcal{G}_k$ as $\mathcal{G}_\ell$. Most often we use the zero-one loss function, where all misclassifications are charged a single unit. The expected prediction error is

$$\mathrm{EPE} = E[L(G, \hat G(X))], \tag{2.19}$$

where again the expectation is taken with respect to the joint distribution $\Pr(G, X)$. Again we condition, and can write EPE as

$$\mathrm{EPE} = E_X \sum_{k=1}^{K} L[\mathcal{G}_k, \hat G(X)]\,\Pr(\mathcal{G}_k \mid X) \tag{2.20}$$

[Figure 2.5: panel titled "Bayes Optimal Classifier".]

FIGURE 2.5. The optimal Bayes decision boundary for the simulation example of Figures 2.1, 2.2 and 2.3. Since the generating density is known for each class, this boundary can be calculated exactly (Exercise 2.2).

and again it suffices to minimize EPE pointwise:

$$\hat G(x) = \operatorname{argmin}_{g \in \mathcal{G}} \sum_{k=1}^{K} L(\mathcal{G}_k, g)\,\Pr(\mathcal{G}_k \mid X = x). \tag{2.21}$$

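With the 0-1 loss function this simplifies to

$$\hat G(x) = \operatorname{argmin}_{g \in \mathcal{G}} [1 - \Pr(g \mid X = x)] \tag{2.22}$$

or simply

$$\hat G(x) = \mathcal{G}_k \ \text{ if } \ \Pr(\mathcal{G}_k \mid X = x) = \max_{g \in \mathcal{G}} \Pr(g \mid X = x). \tag{2.23}$$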

This reasonable solution is known as the Bayes classifier, and says that we classify to the most probable class, using the conditional (discrete) distribution $\Pr(G \mid X)$. Figure 2.5 shows the Bayes-optimal decision boundary for our simulation example. The error rate of the Bayes classifier is called the Bayes rate.
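The pointwise rule (2.21) and its 0-1 special case (2.23) can be sketched in a few lines. This is only an illustration under stated assumptions: the posterior probabilities are taken as given (in practice they are unknown and must be estimated), and the function name `bayes_classify`, the example posteriors and the asymmetric loss matrix are all hypothetical, not taken from the text.

```python
import numpy as np

def bayes_classify(posterior, loss=None):
    """Return, for each row, the class index minimizing expected loss as in (2.21).

    posterior: array (n, K); row i holds Pr(G_k | X = x_i) for k = 0..K-1.
    loss:      array (K, K); loss[k, l] is the price of classifying class k as l.
               Defaults to the 0-1 loss (zero diagonal, ones elsewhere).
    """
    posterior = np.asarray(posterior, dtype=float)
    K = posterior.shape[1]
    if loss is None:
        loss = 1.0 - np.eye(K)
    # expected_loss[i, g] = sum_k loss[k, g] * Pr(G_k | x_i)
    expected_loss = posterior @ np.asarray(loss, dtype=float)
    return np.argmin(expected_loss, axis=1)

# Hypothetical posteriors for three query points and K = 3 classes.
posterior = np.array([[0.70, 0.20, 0.10],
                      [0.30, 0.30, 0.40],
                      [0.45, 0.55, 0.00]])

# Under 0-1 loss the rule reduces to picking the most probable class.
labels_01 = bayes_classify(posterior)

# Hypothetical asymmetric loss: misclassifying class 0 costs ten units.
asym = 1.0 - np.eye(3)
asym[0, :] = [0.0, 10.0, 10.0]
labels_asym = bayes_classify(posterior, asym)
```

With 0-1 loss the result coincides with `np.argmax(posterior, axis=1)`; a non-uniform loss matrix can move the decision away from the most probable class.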


Again we see that the k-nearest neighbor classifier directly approximates this solution: a majority vote in a nearest neighborhood amounts to exactly this, except that conditional probability at a point is relaxed to conditional probability within a neighborhood of a point, and probabilities are estimated by training-sample proportions.

Suppose for a two-class problem we had taken the dummy-variable approach and coded $G$ via a binary $Y$, followed by squared error loss estimation. Then $\hat f(X) = E(Y \mid X) = \Pr(G = \mathcal{G}_1 \mid X)$ if $\mathcal{G}_1$ corresponded to $Y = 1$. Likewise for a K-class problem, $E(Y_k \mid X) = \Pr(G = \mathcal{G}_k \mid X)$. This shows that our dummy-variable regression procedure, followed by classification to the largest fitted value, is another way of representing the Bayes classifier. Although this theory is exact, in practice problems can occur, depending on the regression model used. For example, when linear regression is used, $\hat f(X)$ need not be positive, and we might be suspicious about using it as an estimate of a probability. We will discuss a variety of approaches to modeling $\Pr(G \mid X)$ in Chapter 4.

2.5 Local Methods in High Dimensions

We have examined two learning techniques for prediction so far: the stable but biased linear model and the less stable but apparently less biased class of k-nearest-neighbor estimates. It would seem that with a reasonably large set of training data, we could always approximate the theoretically optimal conditional expectation by k-nearest-neighbor averaging, since we should be able to find a fairly large neighborhood of observations close to any $x$ and average them. This approach and our intuition break down in high dimensions, and the phenomenon is commonly referred to as the curse of dimensionality (Bellman, 1961). There are many manifestations of this problem, and we will examine a few here.


Consider the nearest-neighbor procedure for inputs uniformly distributed in a p-dimensional unit hypercube, as in Figure 2.6. Suppose we send out a hypercubical neighborhood about a target point to capture a fraction $r$ of the observations. Since this corresponds to a fraction $r$ of the unit volume, the expected edge length will be $e_p(r) = r^{1/p}$. In ten dimensions $e_{10}(0.01) = 0.63$ and $e_{10}(0.1) = 0.80$, while the entire range for each input is only 1.0. So to capture 1% or 10% of the data to form a local average, we must cover 63% or 80% of the range of each input variable. Such neighborhoods are no longer "local." Reducing $r$ dramatically does not help much either, since the fewer observations we average, the higher is the variance of our fit.

Another consequence of the sparse sampling in high dimensions is that all sample points are close to an edge of the sample. Consider $N$ data points uniformly distributed in a p-dimensional unit ball centered at the origin. Suppose we consider a nearest-neighbor estimate at the origin. The median

[Figure 2.6: left panel, "Unit Cube" containing a subcubical "Neighborhood"; right panel, horizontal axis "Fraction of Volume".]

FIGURE 2.6. The curse of dimensionality is well illustrated by a subcubical neighborhood for uniform data in a unit cube. The figure on the right shows the side-length of the subcube needed to capture a fraction $r$ of the volume of the data, for different dimensions $p$. In ten dimensions we need to cover 80% of the range of each coordinate to capture 10% of the data.

distance from the origin to the closest data point is given by the expression

$$d(p, N) = \left(1 - \left(\tfrac{1}{2}\right)^{1/N}\right)^{1/p} \tag{2.24}$$

(Exercise 2.3). A more complicated expression exists for the mean distance to the closest point. For $N = 500$, $p = 10$, $d(p, N) \approx 0.52$, more than halfway to the boundary. Hence most data points are closer to the boundary of the sample space than to any other data point. The reason that this presents a problem is that prediction is much more difficult near the edges of the training sample. One must extrapolate from neighboring sample points rather than interpolate between them.

Another manifestation of the curse is that the sampling density is proportional to $N^{1/p}$, where $p$ is the dimension of the input space and $N$ is the sample size. Thus, if $N_1 = 100$ represents a dense sample for a single input problem, then $N_{10} = 100^{10}$ is the sample size required for the same sampling density with 10 inputs. Thus in high dimensions all feasible training samples sparsely populate the input space.

Let us construct another uniform example. Suppose we have 1000 training examples $x_i$ generated uniformly on $[-1, 1]^p$. Assume that the true relationship between $X$ and $Y$ is

$$Y = f(X) = e^{-8\|X\|^2},$$

without any measurement error. We use the 1-nearest-neighbor rule to predict $y_0$ at the test-point $x_0 = 0$. Denote the training set by $\mathcal{T}$. We can


compute the expected prediction error at $x_0$ for our procedure, averaging over all such samples of size 1000. Since the problem is deterministic, this is the mean squared error (MSE) for estimating $f(0)$:

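$$\begin{gathered}
\mathrm{MSE}(x_0) = E_{\mathcal{T}}[f(x_0) - \hat y_0]^2 \\
= E_{\mathcal{T}}[\hat y_0 - E_{\mathcal{T}}(\hat y_0)]^2 + [E_{\mathcal{T}}(\hat y_0) - f(x_0)]^2 \\
= \mathrm{Var}_{\mathcal{T}}(\hat y_0) + \mathrm{Bias}^2(\hat y_0).
\end{gathered} \tag{2.25}$$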
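The quantities introduced so far in this section can be checked numerically. The sketch below evaluates the edge length $e_p(r)$ and the median distance (2.24), then approximates the decomposition (2.25) by Monte Carlo for the example $f(X) = e^{-8\|X\|^2}$ with 1000 training points and the 1-nearest-neighbor rule at $x_0 = 0$. The function names, the number of replications and the random seed are assumptions made for illustration only.

```python
import numpy as np

def edge_length(r, p):
    """Edge of a subcube holding a fraction r of the unit cube's volume."""
    return r ** (1.0 / p)

def median_nn_distance(p, N):
    """Median distance (2.24) from the origin to the nearest of N points
    distributed uniformly in the p-dimensional unit ball."""
    return (1.0 - 0.5 ** (1.0 / N)) ** (1.0 / p)

def one_nn_mse(p, N=1000, reps=200, seed=0):
    """Monte Carlo estimate of MSE, variance and squared bias of the
    1-nearest-neighbor prediction of f(0), f(x) = exp(-8 ||x||^2)."""
    rng = np.random.default_rng(seed)
    y_hat = np.empty(reps)
    for b in range(reps):
        X = rng.uniform(-1.0, 1.0, size=(N, p))   # one training set
        sq_dist = np.sum(X ** 2, axis=1)          # squared distances to x0 = 0
        y_hat[b] = np.exp(-8.0 * sq_dist.min())   # noiseless response of nearest point
    f0 = 1.0                                      # true value f(0)
    mse = np.mean((f0 - y_hat) ** 2)
    var = np.var(y_hat)                           # population variance (ddof=0)
    bias2 = (np.mean(y_hat) - f0) ** 2
    return mse, var, bias2

e_small = edge_length(0.01, 10)
e_large = edge_length(0.10, 10)
d_median = median_nn_distance(10, 500)
mse1, var1, bias2_1 = one_nn_mse(p=1)
mse10, var10, bias2_10 = one_nn_mse(p=10)
```

Because the variance is computed with `ddof=0`, the sample versions of the three terms satisfy the decomposition exactly (up to floating-point error), which makes the identity easy to verify for any choice of `p`.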

Figure 2.7 illustrates the setup. We have broken down the MSE into two components that will become familiar as we proceed: variance and squared bias. Such a decomposition is always possible and often useful, and is known as the bias-variance decomposition. Unless the nearest neighbor is at 0, $\hat y_0$ will be smaller than $f(0)$ in this example, and so the average estimate will be biased downward. The variance is due to the sampling variance of the 1-nearest neighbor. In low dimensions and with $N = 1000$, the nearest neighbor is very close to 0, and so both the bias and variance are small. As the dimension increases, the nearest neighbor tends to stray further from the target point, and both bias and variance are incurred. By $p = 10$, for more than 99% of the samples the nearest neighbor is a distance greater


than 0.5 from the origin. Thus as $p$ increases, the estimate tends to be 0 more often than not, and hence the MSE levels off at 1.0, as does the bias, and the variance starts dropping (an artifact of this example).

Although this is a highly contrived example, similar phenomena occur more generally. The complexity of functions of many variables can grow exponentially with the dimension, and if we wish to be able to estimate such functions with the same accuracy as functions in low dimensions, then we need the size of our training set to grow exponentially as well. In this example, the function is a complex interaction of all $p$ variables involved.

The dependence of the bias term on distance depends on the truth, and it need not always dominate with 1-nearest neighbor. For example, if the function always involves only a few dimensions as in Figure 2.8, then the variance can dominate instead.

Suppose, on the other hand, that we know that the relationship between $Y$ and $X$ is linear,

$$Y = X^T\beta + \varepsilon, \tag{2.26}$$

where $\varepsilon \sim N(0, \sigma^2)$ and we fit the model by least squares to the training data. For an arbitrary test point $x_0$, we have $\hat y_0 = x_0^T\hat\beta$, which can be written as $\hat y_0 = x_0^T\beta + \sum_{i=1}^{N} \ell_i(x_0)\varepsilon_i$, where $\ell_i(x_0)$ is the $i$th element of $\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0$. Since under this model the least squares estimates are

[Figure 2.7: four panels titled "1-NN in One Dimension", "1-NN in One vs. Two Dimensions", "Distance to 1-NN vs. Dimension" and "MSE vs. Dimension"; horizontal axes X, X1 and Dimension.]

FIGURE 2.7. A simulation example, demonstrating the curse of dimensionality and its effect on MSE, bias and variance. The input features are uniformly distributed in $[-1, 1]^p$ for $p = 1, \ldots, 10$. The top left panel shows the target function (no noise) in $\mathbb{R}$: $f(X) = e^{-8\|X\|^2}$, and demonstrates the error that 1-nearest neighbor makes in estimating $f(0)$. The training point is indicated by the blue tick mark. The top right panel illustrates why the radius of the 1-nearest neighborhood increases with dimension $p$. The lower left panel shows the average radius of the 1-nearest neighborhoods. The lower-right panel shows the MSE, squared bias and variance curves as a function of dimension $p$.

[Figure 2.8: two panels titled "1-NN in One Dimension" and "MSE vs. Dimension"; horizontal axes X and Dimension.]

FIGURE 2.8. A simulation example with the same setup as in Figure 2.7. Here the function is constant in all but one dimension: $F(X) = \frac{1}{2}(X_1 + 1)^3$. The variance dominates.

unbiased, we find that

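$$\begin{gathered}
\mathrm{EPE}(x_0) = E_{y_0 \mid x_0} E_{\mathcal{T}} (y_0 - \hat y_0)^2 \\
= \mathrm{Var}(y_0 \mid x_0) + E_{\mathcal{T}}[\hat y_0 - E_{\mathcal{T}}\hat y_0]^2 + [E_{\mathcal{T}}\hat y_0 - x_0^T\beta]^2 \\
= \mathrm{Var}(y_0 \mid x_0) + \mathrm{Var}_{\mathcal{T}}(\hat y_0) + \mathrm{Bias}^2(\hat y_0) \\
= \sigma^2 + E_{\mathcal{T}}\, x_0^T(\mathbf{X}^T\mathbf{X})^{-1}x_0\,\sigma^2 + 0^2.
\end{gathered} \tag{2.27}$$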


Here we have incurred an additional variance $\sigma^2$ in the prediction error, since our target is not deterministic. There is no bias, and the variance depends on $x_0$. If $N$ is large and $\mathcal{T}$ were selected at random, and assuming $E(X) = 0$, then $\mathbf{X}^T\mathbf{X} \to N\,\mathrm{Cov}(X)$ and

$$\begin{gathered}
E_{x_0}\mathrm{EPE}(x_0) \sim E_{x_0}\, x_0^T\,\mathrm{Cov}(X)^{-1}x_0\,\sigma^2/N + \sigma^2 \\
= \mathrm{trace}[\mathrm{Cov}(X)^{-1}\mathrm{Cov}(x_0)]\,\sigma^2/N + \sigma^2 \\
= \sigma^2(p/N) + \sigma^2.
\end{gathered} \tag{2.28}$$

Here we see that the expected EPE increases linearly as a function of $p$, with slope $\sigma^2/N$. If $N$ is large and/or $\sigma^2$ is small, this growth in variance is negligible (0 in the deterministic case). By imposing some heavy restrictions on the class of models being fitted, we have avoided the curse of dimensionality. Some of the technical details in (2.27) and (2.28) are derived in Exercise 2.5.

Figure 2.9 compares 1-nearest neighbor vs. least squares in two situations, both of which have the form $Y = f(X) + \varepsilon$, $X$ uniform as before, and $\varepsilon \sim N(0, 1)$. The sample size is $N = 500$. For the orange curve, $f(x)$

[Figure 2.9: panel titled "Expected Prediction Error of 1NN vs. OLS"; horizontal axis Dimension.]

FIGURE 2.9. The curves show the expected prediction error (at $x_0 = 0$) for 1-nearest neighbor relative to least squares for the model $Y = f(X) + \varepsilon$. For the orange curve, $f(x) = x_1$, while for the blue curve $f(x) = \frac{1}{2}(x_1 + 1)^3$.

is linear in the first coordinate, for the blue curve, cubic as in Figure 2.8. Shown is the relative EPE of 1-nearest neighbor to least squares, which appears to start at around 2 for the linear case. Least squares is unbiased in this case, and as discussed above the EPE is slightly above $\sigma^2 = 1$. The EPE for 1-nearest neighbor is always above 2, since the variance of $\hat f(x_0)$ in this case is at least $\sigma^2$, and the ratio increases with dimension as the nearest neighbor strays from the target point. For the cubic case, least squares is biased, which moderates the ratio. Clearly we could manufacture examples where the bias of least squares would dominate the variance, and the 1-nearest neighbor would come out the winner.
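The comparison for the linear case can be approximated by simulation. The following is a rough sketch rather than a reproduction of Figure 2.9: it assumes that least squares is fitted with an intercept, that the inputs are uniform on $[-1, 1]^p$, and it uses a hypothetical function name, replication count and seed.

```python
import numpy as np

def epe_at_origin(p, N=500, reps=2000, seed=0):
    """Monte Carlo EPE at x0 = 0 for least squares and 1-NN
    when Y = X_1 + eps with eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    err_ols = np.empty(reps)
    err_nn = np.empty(reps)
    for b in range(reps):
        X = rng.uniform(-1.0, 1.0, size=(N, p))
        y = X[:, 0] + rng.standard_normal(N)
        y0 = rng.standard_normal()            # new response at x0 = 0, where f(0) = 0
        # Least squares with an intercept (an assumption of this sketch);
        # the prediction at x0 = 0 is the fitted intercept.
        A = np.column_stack([np.ones(N), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err_ols[b] = (y0 - coef[0]) ** 2
        # 1-NN: response of the training point closest to the origin.
        nearest = np.argmin(np.sum(X ** 2, axis=1))
        err_nn[b] = (y0 - y[nearest]) ** 2
    return err_ols.mean(), err_nn.mean()

epe_ols, epe_nn = epe_at_origin(p=1)
ratio = epe_nn / epe_ols
```

In one dimension the least squares estimate should land close to $\sigma^2 = 1$ and the 1-nearest-neighbor estimate close to 2, up to Monte Carlo error; calling `epe_at_origin` with larger `p` lets one watch the ratio change.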


By relying on rigid assumptions, the linear model has no bias at all and negligible variance, while the error in 1-nearest neighbor is substantially larger. However, if the assumptions are wrong, all bets are off and the 1-nearest neighbor may dominate. We will see that there is a whole spectrum of models between the rigid linear models and the extremely flexible 1-nearest-neighbor models, each with their own assumptions and biases, which have been proposed specifically to avoid the exponential growth in complexity of functions in high dimensions by drawing heavily on these assumptions.

Before we delve more deeply, let us elaborate a bit on the concept of statistical models and see how they fit into the prediction framework.


2.6 Statistical Models, Supervised Learning and Function Approximation

Our goal is to find a useful approximation $\hat f(x)$ to the function $f(x)$ that underlies the predictive relationship between the inputs and outputs. In the theoretical setting of Section 2.4, we saw that squared error loss led us to the regression function $f(x) = E(Y \mid X = x)$ for a quantitative response. The class of nearest-neighbor methods can be viewed as direct estimates of this conditional expectation, but we have seen that they can fail in at least two ways:

• if the dimension of the input space is high, the nearest neighbors need not be close to the target point, and can result in large errors;
• if special structure is known to exist, this can be used to reduce both the bias and the variance of the estimates.

We anticipate using other classes of models for $f(x)$, in many cases specifically designed to overcome the dimensionality problems, and here we discuss a framework for incorporating them into the prediction problem.

2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)

Suppose in fact that our data arose from a statistical model

$$Y = f(X) + \varepsilon, \tag{2.29}$$

where the random error $\varepsilon$ has $E(\varepsilon) = 0$ and is independent of $X$. Note that for this model, $f(x) = E(Y \mid X = x)$, and in fact the conditional distribution $\Pr(Y \mid X)$ depends on $X$ only through the conditional mean $f(x)$.

The additive error model is a useful approximation to the truth. For most systems the input-output pairs $(X, Y)$ will not have a deterministic relationship $Y = f(X)$. Generally there will be other unmeasured variables that also contribute to $Y$, including measurement error. The additive model assumes that we can capture all these departures from a deterministic relationship via the error $\varepsilon$.

For some problems a deterministic relationship does hold. Many of the classification problems studied in machine learning are of this form, where the response surface can be thought of as a colored map defined in


$\mathbb{R}^p$. The training data consist of colored examples from the map $\{x_i, g_i\}$, and the goal is to be able to color any point. Here the function is deterministic, and the randomness enters through the $x$ location of the training points. For the moment we will not pursue such problems, but will see that they can be handled by techniques appropriate for the error-based models.

The assumption in (2.29) that the errors are independent and identically distributed is not strictly necessary, but seems to be at the back of our mind


when we average squared errors uniformly in our EPE criterion. With such a model it becomes natural to use least squares as a data criterion for model estimation as in (2.1). Simple modifications can be made to avoid the independence assumption; for example, we can have $\mathrm{Var}(Y \mid X = x) = \sigma(x)$, and now both the mean and variance depend on $X$. In general the conditional distribution $\Pr(Y \mid X)$ can depend on $X$ in complicated ways, but the additive error model precludes these.

So far we have concentrated on the quantitative response. Additive error models are typically not used for qualitative outputs $G$; in this case the target function $p(X)$ is the conditional density $\Pr(G \mid X)$, and this is modeled directly. For example, for two-class data, it is often reasonable to assume that the data arise from independent binary trials, with the probability of one particular outcome being $p(X)$, and the other $1 - p(X)$. Thus if $Y$ is the 0-1 coded version of $G$, then $E(Y \mid X = x) = p(x)$, but the variance depends on $x$ as well: $\mathrm{Var}(Y \mid X = x) = p(x)[1 - p(x)]$.
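The variance formula for the 0-1 coded response follows in one step from the fact that $Y$ takes only the values 0 and 1, so that $Y^2 = Y$. A short worked version, using only the symbols already defined above:

```latex
\begin{gathered}
E(Y^2 \mid X = x) = E(Y \mid X = x) = p(x), \\
\mathrm{Var}(Y \mid X = x) = E(Y^2 \mid X = x) - [E(Y \mid X = x)]^2
  = p(x) - p(x)^2 = p(x)[1 - p(x)].
\end{gathered}
```

The variance is therefore largest where $p(x) = 1/2$ and vanishes where the class is certain, which is one reason a constant-variance additive error model is a poor fit for this setting.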

2.6.2 Supervised Learning

Before we launch into more statistically oriented jargon, we present the function-fitting paradigm from a machine learning point of view. Suppose for simplicity that the errors are additive and that the model $Y = f(X) + \varepsilon$ is a reasonable assumption. Supervised learning attempts to learn $f$ by example through a teacher. One observes the system under study, both the inputs and outputs, and assembles a training set of observations $\mathcal{T} = (x_i, y_i)$, $i = 1, \ldots, N$. The observed input values to the system $x_i$ are also fed into an artificial system, known as a learning algorithm (usually a computer program), which also produces outputs $\hat f(x_i)$ in response to the inputs. The learning algorithm has the property that it can modify its input/output relationship $\hat f$ in response to differences $y_i - \hat f(x_i)$ between the original and generated outputs. This process is known as learning by example. Upon completion of the learning process the hope is that the artificial and real outputs will be close enough to be useful for


all sets of inputs likely to be encountered in practice.

2.6.3 Function Approximation

The learning paradigm of the previous section has been the motivation for research into the supervised learning problem in the fields of machine learning (with analogies to human reasoning) and neural networks (with biological analogies to the brain). The approach taken in applied mathematics and statistics has been from the perspective of function approximation and estimation. Here the data pairs $\{x_i, y_i\}$ are viewed as points in a $(p + 1)$-dimensional Euclidean space. The function $f(x)$ has domain equal to the p-dimensional input subspace, and is related to the data via a model


such as $y_i = f(x_i) + \varepsilon_i$. For convenience in this chapter we will assume the domain is $\mathbb{R}^p$, a p-dimensional Euclidean space, although in general the inputs can be of mixed type. The goal is to obtain a useful approximation to $f(x)$ for all $x$ in some region of $\mathbb{R}^p$, given the representations in $\mathcal{T}$. Although somewhat less glamorous than the learning paradigm, treating supervised learning as a problem in function approximation encourages the geometrical concepts of Euclidean spaces and mathematical concepts of probabilistic inference to be applied to the problem. This is the approach taken in this book.

Many of the approximations we will encounter have associated a set of parameters $\theta$ that can be modified to suit the data at hand. For example, the linear model $f(x) = x^T\beta$ has $\theta = \beta$. Another class of useful approximators can be expressed as linear basis expansions

$$f_\theta(x) = \sum_{k=1}^{K} h_k(x)\theta_k, \tag{2.30}$$

where the $h_k$ are a suitable set of functions or transformations of the input vector $x$. Traditional examples are polynomial and trigonometric expansions, where for example $h_k$ might be $x_1^2$, $x_1 x_2^2$, $\cos(x_1)$ and so on. We also encounter nonlinear expansions, such as the sigmoid transformation common to neural network models,

$$h_k(x) = \frac{1}{1 + \exp(-x^T\beta_k)}. \tag{2.31}$$


We can use least squares to estimate the parameters $\theta$ in $f_\theta$ as we did for the linear model, by minimizing the residual sum-of-squares

$$\mathrm{RSS}(\theta) = \sum_{i=1}^{N} (y_i - f_\theta(x_i))^2 \tag{2.32}$$

as a function of $\theta$. This seems a reasonable criterion for an additive error model. In terms of function approximation, we imagine our parameterized function as a surface in $p + 1$ space, and what we observe are noisy realizations from it. This is easy to visualize when $p = 2$ and the vertical coordinate is the output $y$, as in Figure 2.10. The noise is in the output coordinate, so we find the set of parameters such that the fitted surface gets as close to the observed points as possible, where close is measured by the sum of squared vertical errors in $\mathrm{RSS}(\theta)$.

For the linear model we get a simple closed form solution to the minimization problem. This is also true for the basis function methods, if the basis functions themselves do not have any hidden parameters. Otherwise the solution requires either iterative methods or numerical optimization.

While least squares is generally very convenient, it is not the only criterion used and in some cases would not make much sense. A more general

FIGURE 2.10. Least squares fitting of a function of two inputs. The parameters of $f_\theta(x)$ are chosen so as to minimize the sum-of-squared vertical errors.

principle for estimation is maximum likelihood estimation. Suppose we have a random sample $y_i$, $i = 1, \ldots, N$ from a density $\Pr_\theta(y)$ indexed by some parameters $\theta$. The log-probability of the observed sample is

$$L(\theta) = \sum_{i=1}^{N} \log \Pr{}_\theta(y_i). \tag{2.33}$$

The principle of maximum likelihood assumes that the most reasonable values for $\theta$ are those for which the probability of the observed sample is largest. Least squares for the additive error model $Y = f_\theta(X) + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$, is equivalent to maximum likelihood using the conditional likelihood

$$\Pr(Y \mid X, \theta) = N(f_\theta(X), \sigma^2). \tag{2.34}$$

So although the additional assumption of normality seems more restrictive, the results are the same. The log-likelihood of the data is

$$L(\theta) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N} (y_i - f_\theta(x_i))^2, \tag{2.35}$$

and the only term involving $\theta$ is the last, which is $\mathrm{RSS}(\theta)$ up to a scalar negative multiplier.

A more interesting example is the multinomial likelihood for the regression function $\Pr(G \mid X)$ for a qualitative output $G$. Suppose we have a model $\Pr(G = \mathcal{G}_k \mid X = x) = p_{k,\theta}(x)$, $k = 1, \ldots, K$ for the conditional probability of each class given $X$, indexed by the parameter vector $\theta$. Then the


log-likelihood (also referred to as the cross-entropy) is

$$L(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i), \tag{2.36}$$

and when maximized it delivers values of $\theta$ that best conform with the data in this likelihood sense.
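The equivalence between least squares and Gaussian maximum likelihood, and the form of the multinomial log-likelihood (2.36), can be illustrated with a small sketch. The one-parameter model $f_\theta(x) = \theta x$, the simulated data, the grid of candidate values and all function names are assumptions chosen only to keep the example short.

```python
import numpy as np

def rss(y, fitted):
    """Residual sum-of-squares, as in (2.32)."""
    return float(np.sum((y - fitted) ** 2))

def gaussian_loglik(y, fitted, sigma):
    """Log-likelihood (2.35) under Y = f_theta(X) + eps, eps ~ N(0, sigma^2)."""
    N = len(y)
    return (-0.5 * N * np.log(2 * np.pi) - N * np.log(sigma)
            - rss(y, fitted) / (2 * sigma ** 2))

def multinomial_loglik(probs, g):
    """Log-likelihood (2.36): sum of log-probabilities of the observed classes.
    probs: (N, K) modeled class probabilities; g: integer labels in 0..K-1."""
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(np.log(probs[np.arange(len(g)), g])))

# Simulated data from a hypothetical model y = 2 x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)

# Scan f_theta(x) = theta * x over a grid of candidate values.
thetas = np.linspace(0.0, 4.0, 401)
rss_vals = np.array([rss(y, t * x) for t in thetas])
ll_vals = np.array([gaussian_loglik(y, t * x, sigma=0.5) for t in thetas])

best_by_rss = thetas[np.argmin(rss_vals)]
best_by_loglik = thetas[np.argmax(ll_vals)]
```

Since the log-likelihood is a constant minus a positive multiple of the RSS, the two criteria pick the same grid point whatever value of `sigma` is used.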

2.7 Structured Regression Models

We have seen that although nearest-neighbor and other local methods focus directly on estimating the function at a point, they face problems in high dimensions. They may also be inappropriate even in low dimensions in cases where more structured approaches can make more efficient use of the data. This section introduces classes of such structured approaches. Before we proceed, though, we discuss further the need for such classes.

2.7.1 Difficulty of the Problem

Consider the RSS criterion for an arbitrary function $f$,

$$\mathrm{RSS}(f) = \sum_{i=1}^{N} (y_i - f(x_i))^2. \tag{2.37}$$

Minimizing (2.37) leads to infinitely many solutions: any function $\hat f$ passing through the training points $(x_i, y_i)$ is a solution. Any particular solution chosen might be a poor predictor at test points different from the training points. If there are multiple observation pairs $x_i, y_{i\ell}$, $\ell = 1, \ldots, N_i$ at each value of $x_i$, the risk is limited. In this case, the solutions pass through the average values of the $y_{i\ell}$ at each $x_i$; see Exercise 2.6. The situation is similar to the one we have already visited in Section 2.4; indeed, (2.37) is the finite sample version of (2.11) on page 18. If the sample size $N$ were sufficiently large such that repeats were guaranteed and densely arranged, it would seem that these solutions might all tend to the limiting conditional expectation.

In order to obtain useful results for finite $N$, we must restrict the eligible solutions to (2.37) to a smaller set of functions. How to decide on the nature of the restrictions is based on considerations outside of the data. These restrictions are sometimes encoded via the parametric representation of $f_\theta$, or may be built into the learning method itself, either implicitly or explicitly. These restricted classes of solutions are the major topic of this book. One thing should be clear, though. Any restrictions imposed on $f$ that lead to a unique solution to (2.37) do not really remove the ambiguity


caused by the multiplicity of solutions. There are infinitely many possible restrictions, each leading to a unique solution, so the ambiguity has simply been transferred to the choice of constraint.

In general the constraints imposed by most learning methods can be described as complexity restrictions of one kind or another. This usually means some kind of regular behavior in small neighborhoods of the input space. That is, for all input points $x$ sufficiently close to each other in some metric, $\hat f$ exhibits some special structure such as nearly constant, linear or low-order polynomial behavior. The estimator is then obtained by averaging or polynomial fitting in that neighborhood.

The strength of the constraint is dictated by the neighborhood size. The larger the size of the neighborhood, the stronger the constraint, and the more sensitive the solution is to the particular choice of constraint. For example, local constant fits in infinitesimally small neighborhoods is no constraint at all; local linear fits in very large neighborhoods is almost a globally linear model, and is very


restrictive.

The nature of the constraint depends on the metric used. Some methods, such as kernel and local regression and tree-based methods, directly specify the metric and size of the neighborhood. The nearest-neighbor methods discussed so far are based on the assumption that locally the function is constant; close to a target input $x_0$, the function does not change much, and so close outputs can be averaged to produce $\hat f(x_0)$. Other methods such as splines, neural networks and basis-function methods implicitly define neighborhoods of local behavior. In Section 5.4.1 we discuss the concept of an equivalent kernel (see Figure 5.8 on page 157), which describes this local dependence for any method linear in the outputs. These equivalent kernels in many cases look just like the explicitly defined weighting kernels discussed above: peaked at the target point and falling smoothly away from it.

One fact should be clear by now. Any method that attempts to produce locally varying functions in small isotropic neighborhoods will run into problems in high dimensions, again the curse of dimensionality. And conversely, all methods that overcome the dimensionality problems have an associated (and often implicit or adaptive) metric for measuring neighborhoods, which basically does not allow the neighborhood to be simultaneously small in all directions.

2.8 Classes of Restricted Estimators

The variety of nonparametric regression techniques or learning methods fall into a number of different classes depending on the nature of the restrictions imposed. These classes are not distinct, and indeed some methods fall in several classes. Here we give a brief summary, since detailed descriptions


are given in later chapters. Each of the classes has associated with it one or more parameters, sometimes appropriately called smoothing parameters, that control the effective size of the local neighborhood. Here we describe three broad classes.

2.8.1 Roughness Penalty and Bayesian Methods

Here the class of functions is controlled by explicitly penalizing $\mathrm{RSS}(f)$ with a roughness penalty

$$\mathrm{PRSS}(f; \lambda) = \mathrm{RSS}(f) + \lambda J(f). \tag{2.38}$$

The user-selected functional $J(f)$ will be large for functions $f$ that vary too rapidly over small regions of input space. For example, the popular cubic smoothing spline for one-dimensional inputs is the solution to the penalized least-squares criterion

$$\mathrm{PRSS}(f; \lambda) = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \int [f''(x)]^2\,dx. \tag{2.39}$$

The roughness penalty here controls large values of the second derivative of $f$, and the amount of penalty is dictated by $\lambda \ge 0$. For $\lambda = 0$ no penalty is imposed, and any interpolating function will do, while for $\lambda = \infty$ only functions linear in $x$ are permitted.

Penalty functionals $J$ can be constructed for functions in any dimension, and special versions can be created to impose special structure. For example, additive penalties $J(f) = \sum_{j=1}^{p} J(f_j)$ are used in conjunction with additive functions $f(X) = \sum_{j=1}^{p} f_j(X_j)$ to create additive models with smooth coordinate functions. Similarly, projection pursuit regression models have $f(X) = \sum_{m=1}^{M} g_m(\alpha_m^T X)$ for adaptively chosen directions $\alpha_m$, and the functions $g_m$ can each have an associated roughness penalty.

Penalty function, or regularization methods, express our prior belief that the type of functions we seek exhibit a certain type of smooth behavior, and indeed can usually be cast in a Bayesian framework. The penalty $J$ corresponds to a log-prior, and $\mathrm{PRSS}(f; \lambda)$ the log-posterior distribution, and minimizing $\mathrm{PRSS}(f; \lambda)$ amounts to finding the posterior mode. We discuss roughness-penalty approaches in Chapter 5 and the Bayesian paradigm in Chapter 8.
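The effect of $\lambda$ in a criterion of the form (2.38) can be seen in a discrete analogue. The sketch below is not the cubic smoothing spline of (2.39): it swaps in a simpler technique, replacing the integral of $[f''(x)]^2$ by the sum of squared second differences of the fitted values on an equally spaced grid, so that the minimizer solves a linear system. The function name, the simulated data and the particular values of $\lambda$ are assumptions for illustration.

```python
import numpy as np

def penalized_fit(y, lam):
    """Minimize sum_i (y_i - f_i)^2 + lam * sum_i (f_{i+2} - 2 f_{i+1} + f_i)^2.

    A discrete stand-in for the roughness penalty; the minimizer solves
    (I + lam * D^T D) f = y, where D is the second-difference matrix.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)       # shape (n - 2, n), rows [1, -2, 1]
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def roughness(f):
    """Sum of squared second differences of a fitted vector."""
    return float(np.sum(np.diff(f, n=2) ** 2))

# Hypothetical noisy data on an equally spaced grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

f_interp = penalized_fit(y, 0.0)     # lam = 0: no penalty, the fit reproduces y
f_smooth = penalized_fit(y, 1.0)     # moderate penalty
f_linear = penalized_fit(y, 1e9)     # very large lam: nearly a straight line

# Ordinary least squares line, for comparison with the large-lam fit.
line = np.polyval(np.polyfit(x, y, 1), x)
```

The two extremes mirror the statement above: with no penalty the fit interpolates the data, and as $\lambda$ grows the fit is pushed toward the functions with zero penalty, which for second differences on an equally spaced grid are the straight lines.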

2.8.2 Kernel Methods and Local Regression

These methods can be thought of as explicitly providing estimates of the regression function or conditional expectation by specifying the nature of the local neighborhood, and of the class of regular functions fitted locally. The local neighborhood is specified by a kernel function $K_\lambda(x_0, x)$ which assigns weights to points $x$ in a region around $x_0$ (see Figure 6.1 on page 192). For example, the Gaussian kernel has a weight function based on the Gaussian density function

$$K_\lambda(x_0, x) = \frac{1}{\lambda}\exp\left[-\frac{\|x - x_0\|^2}{2\lambda}\right] \tag{2.40}$$

and assigns weights to points that die exponentially with their squared Euclidean distance from $x_0$. The parameter $\lambda$ corresponds to the variance of the Gaussian density, and controls the width of the neighborhood. The simplest form of kernel estimate is the Nadaraya-Watson weighted average

$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\,y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}. \tag{2.41}$$

In general we can define a local regression estimate of $f(x_0)$ as $f_{\hat\theta}(x_0)$, where $\hat\theta$ minimizes

$$\mathrm{RSS}(f_\theta, x_0) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)(y_i - f_\theta(x_i))^2, \tag{2.42}$$

and $f_\theta$ is some parameterized function, such as a low-order polynomial. Some examples are:

• $f_\theta(x) = \theta_0$, the constant function; this results in the Nadaraya-Watson estimate in (2.41) above.
• $f_\theta(x) = \theta_0 + \theta_1 x$ gives the popular local linear regression model.

Nearest-neighbor methods can be thought of as kernel methods having a more data-dependent metric. Indeed, the metric for k-nearest neighbors is

$$K_k(x, x_0) = I(\|x - x_0\| \le \|x_{(k)} - x_0\|),$$

where $x_{(k)}$ is the training observation ranked $k$th in distance from $x_0$, and $I(S)$ is the indicator of the set $S$.

These methods of course need to be modified in high dimensions, to avoid the curse of dimensionality. Various adaptations are discussed in Chapter 6.
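The two local estimates listed above can be sketched for one-dimensional inputs. This is a minimal illustration, assuming the Gaussian kernel (2.40) and solving the weighted criterion (2.42) by ordinary least squares on rows rescaled by the square roots of the kernel weights; the function names, the noiseless test data and the bandwidth value are hypothetical.

```python
import numpy as np

def gaussian_kernel(x0, x, lam):
    """Gaussian kernel weights (2.40) for one-dimensional inputs."""
    return np.exp(-(x - x0) ** 2 / (2.0 * lam)) / lam

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average (2.41): the local constant fit."""
    w = gaussian_kernel(x0, x, lam)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, lam):
    """Local linear fit: minimize (2.42) with f_theta(x) = theta_0 + theta_1 x."""
    w = gaussian_kernel(x0, x, lam)
    A = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)                            # weighted LS via rescaled rows
    theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return theta[0] + theta[1] * x0

# Noiseless linear data, so any error is due to the method alone.
x = np.linspace(0.0, 1.0, 30)
y = 3.0 + 2.0 * x

nw_interior = nadaraya_watson(0.5, x, y, lam=0.01)
nw_boundary = nadaraya_watson(0.0, x, y, lam=0.01)
ll_boundary = local_linear(0.0, x, y, lam=0.01)
```

On this example the local linear fit reproduces the line at the left boundary, while the kernel-weighted average is pulled upward there because all of its neighbors lie to one side of the target point.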

2.8.3 Basis Functions and Dictionary Methods

This class of methods includes the familiar linear and polynomial expansions, but more importantly a wide variety of more flexible models. The model for $f$ is a linear expansion of basis functions

$$f_\theta(x) = \sum_{m=1}^{M} \theta_m h_m(x), \tag{2.43}$$


where each of the $h_m$ is a function of the input $x$, and the term linear here refers to the action of the parameters $\theta$. This class covers a wide variety of methods. In some cases the sequence of basis functions is prescribed, such as a basis for polynomials in $x$ of total degree $M$.

For one-dimensional $x$, polynomial splines of degree $K$ can be represented by an appropriate sequence of $M$ spline basis functions, determined in turn by $M - K$ knots. These produce functions that are piecewise polynomials of degree $K$ between the knots, and joined up with continuity of degree $K - 1$ at the knots. As an example consider linear splines, or piecewise linear functions. One intuitively satisfying basis consists of the functions $b_1(x) = 1$, $b_2(x) = x$, and $b_{m+2}(x) = (x - t_m)_+$, $m = 1, \ldots, M - 2$, where $t_m$ is the $m$th knot, and $z_+$ denotes positive part. Tensor products of spline bases can be used for inputs with dimensions larger than one (see Section 5.2, and the CART and MARS models in Chapter 9). The parameter $\theta$ can be the total degree of the polynomial or the number of knots in the case of splines.

Radial basis functions are symmetric $p$-dimensional kernels located at particular centroids,

$$f_\theta(x) = \sum_{m=1}^{M} K_{\lambda_m}(\mu_m, x)\,\theta_m; \tag{2.44}$$

for example, the Gaussian kernel $K_\lambda(\mu, x) = e^{-\|x - \mu\|^2 / 2\lambda}$ is popular.

Radial basis functions have centroids $\mu_m$ and scales $\lambda_m$ that have to be determined. The spline basis functions have knots. In general we would like the data to dictate them as well. Including these as parameters changes the regression problem from a straightforward linear problem to a combinatorially hard nonlinear problem. In practice, shortcuts such as greedy algorithms or two-stage processes are used. Section 6.7 describes some such approaches.

A single-layer feed-forward neural network model with linear output weights can be thought of as an adaptive basis function method. The model has the form

$$f_\theta(x) = \sum_{m=1}^{M} \beta_m \sigma(\alpha_m^T x + b_m), \tag{2.45}$$

where $\sigma(x) = 1/(1 + e^{-x})$ is known as the activation function. Here, as in the projection pursuit model, the directions $\alpha_m$ and the bias terms $b_m$ have to be determined, and their estimation is the meat of the computation. Details are given in Chapter 11.

These adaptively chosen basis function methods are also known as dictionary methods, where one has available a possibly infinite set or dictionary $\mathcal{D}$ of candidate basis functions from which to choose, and models are built up by employing some kind of search mechanism.
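To make the linear expansion (2.43) concrete, the sketch below evaluates the piecewise-linear spline basis $b_1(x) = 1$, $b_2(x) = x$, $b_{m+2}(x) = (x - t_m)_+$ described in this section. The knot locations and coefficients are hypothetical values chosen only so that the result is easy to check by hand.

```python
def linear_spline_basis(x, knots):
    """Evaluate b_1(x) = 1, b_2(x) = x, b_{m+2}(x) = (x - t_m)_+ at a scalar x."""
    return [1.0, x] + [max(x - t, 0.0) for t in knots]

def expansion(x, theta, knots):
    """Linear basis expansion (2.43): the sum over m of theta_m * h_m(x)."""
    return sum(th * h for th, h in zip(theta, linear_spline_basis(x, knots)))

knots = [1.0, 2.0]               # hypothetical knot locations
theta = [0.0, 1.0, -2.0, 2.0]    # hypothetical coefficients, so M = 4

# The slope is +1 before the first knot, -1 between the knots, +1 after the second.
values = [expansion(x, theta, knots) for x in (0.0, 1.0, 1.5, 2.0, 3.0)]
```

With the knots fixed in advance the model stays linear in `theta`, so it could be fit by ordinary least squares; letting the data choose the knots is what makes the problem combinatorially hard.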


2.9 Model Selection and the Bias-Variance Tradeoff

All the models described above and many others discussed in later chapters have a smoothing or complexity parameter that has to be determined:

• the multiplier of the penalty term;
• the width of the kernel;
• or the number of basis functions.

In the case of the smoothing spline, the parameter $\lambda$ indexes models ranging from a straight line fit to the interpolating model. Similarly a local degree-$m$ polynomial model ranges between a degree-$m$ global polynomial when the window size is infinitely large, to an interpolating fit when the window size shrinks to zero. This means that we cannot use residual sum-of-squares on the training data to determine these parameters as well, since we would always pick those that gave interpolating fits and hence zero residuals. Such a model is unlikely to predict future data well at all.

The $k$-nearest-neighbor regression fit $\hat{f}_k(x_0)$ usefully illustrates the competing forces that affect the predictive ability of such approximations. Suppose the data arise from a model $Y = f(X) + \varepsilon$, with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$. For simplicity here we assume that the values of $x_i$ in the sample are fixed in advance (nonrandom). The expected prediction error at $x_0$, also known as test or generalization error, can be decomposed:

$$\mathrm{EPE}_k(x_0) = E[(Y - \hat{f}_k(x_0))^2 \mid X = x_0]$$
$$= \sigma^2 + [\mathrm{Bias}^2(\hat{f}_k(x_0)) + \mathrm{Var}_{\mathcal{T}}(\hat{f}_k(x_0))] \tag{2.46}$$
$$= \sigma^2 + \left[f(x_0) - \frac{1}{k} \sum_{\ell=1}^{k} f(x_{(\ell)})\right]^2 + \frac{\sigma^2}{k}. \tag{2.47}$$

The subscripts in parentheses $(\ell)$ indicate the sequence of nearest neighbors to $x_0$.

There are three terms in this expression. The first term $\sigma^2$ is the irreducible error (the variance of the new test target) and is beyond our control, even if we know the true $f(x_0)$.

The second and third terms are under our control, and make up the mean squared error of $\hat{f}_k(x_0)$ in estimating $f(x_0)$, which is broken down into a bias component and a variance component. The bias term is the squared difference between the true mean $f(x_0)$ and the expected value of the estimate, $[E_{\mathcal{T}}(\hat{f}_k(x_0)) - f(x_0)]^2$, where the expectation averages the randomness in the training data. This term will most likely increase with $k$, if the true function is reasonably smooth. For small $k$ the few closest neighbors will have values $f(x_{(\ell)})$ close to $f(x_0)$, so their average should be close to $f(x_0)$. As $k$ grows, the neighbors are further away, and then anything can happen.

The variance term is simply the variance of an average here, and decreases as the inverse of $k$. So as $k$ varies, there is a bias-variance tradeoff.

More generally, as the model complexity of our procedure is increased, the variance tends to increase and the squared bias tends to decrease. The opposite behavior occurs as the model complexity is decreased. For $k$-nearest neighbors, the model complexity is controlled by $k$.

Typically we would like to choose our model complexity to trade bias off with variance in such a way as to minimize the test error. An obvious estimate of test error is the training error $\frac{1}{N} \sum_i (y_i - \hat{y}_i)^2$. Unfortunately training error is not a good estimate of test error, as it does not properly account for model complexity.

Figure 2.11. Test and training error as a function of model complexity. (Horizontal axis: model complexity, from low to high.)

Figure 2.11 shows the typical behavior of the test and training error, as model complexity is varied. The training error tends to decrease whenever we increase the model complexity, that is, whenever we fit the data harder. However with too much fitting, the model adapts itself too closely to the training data, and will not generalize well (i.e., have large test error). In that case the predictions $\hat{f}(x_0)$ will have large variance, as reflected in the last term of expression (2.46). In contrast, if the model is not complex enough, it will underfit and may have large bias, again resulting in poor generalization. In Chapter 7 we discuss methods for estimating the test error of a prediction method, and hence estimating the optimal amount of model complexity for a given prediction method and training set.
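The three terms of the decomposition (2.46)-(2.47) can be evaluated directly when the true function is known. The sketch below does this for $k$-nearest-neighbor regression at a single point; the quadratic true function, the input grid and the noise variance are assumptions chosen for illustration.

```python
def knn_epe_terms(f, x0, xs, k, sigma2):
    """Return (irreducible error, squared bias, variance) from (2.46)-(2.47)
    for k-nearest-neighbor regression at x0 with fixed training inputs xs."""
    neighbors = sorted(xs, key=lambda x: abs(x - x0))[:k]
    mean_f = sum(f(x) for x in neighbors) / k
    bias2 = (f(x0) - mean_f) ** 2
    variance = sigma2 / k
    return sigma2, bias2, variance

# Hypothetical setup: true function f(x) = x^2, inputs on a grid, noise variance 1.
def f(x):
    return x * x

xs = [i / 10.0 for i in range(-20, 21)]   # -2.0, -1.9, ..., 2.0
sigma2 = 1.0

# Odd values of k keep the neighborhoods symmetric around x0 = 0.
table = {k: knn_epe_terms(f, 0.0, xs, k, sigma2) for k in (1, 5, 21, 41)}
```

In this setup the squared bias grows with $k$ while the variance shrinks as $\sigma^2/k$, so the sum of the two is smallest at an intermediate neighborhood size.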

Bibliographic Notes

Some good general books on the learning problem are Duda et al. (2000), Bishop (1995), Bishop (2006), Ripley (1996), Cherkassky and Mulier (2007) and Vapnik (1996). Parts of this chapter are based on Friedman (1994b).

Exercises

Ex. 2.1 Suppose each of $K$ classes has an associated target $t_k$, which is a vector of all zeros, except a one in the $k$th position. Show that classifying to the largest element of $\hat{y}$ amounts to choosing the closest target, $\min_k \|t_k - \hat{y}\|$, if the elements of $\hat{y}$ sum to one.

Ex. 2.2 Show how to compute the Bayes decision boundary for the simulation example in Figure 2.5.

Ex. 2.3 Derive equation (2.24).

Ex. 2.4 The edge effect problem discussed on page 23 is not peculiar to uniform sampling from bounded domains. Consider inputs drawn from a spherical multinormal distribution $X \sim N(0, \mathbf{I}_p)$. The squared distance from any sample point to the origin has a $\chi^2_p$ distribution with mean $p$. Consider a prediction point $x_0$ drawn from this distribution, and let $a = x_0 / \|x_0\|$ be an associated unit vector. Let $z_i = a^T x_i$ be the projection of each of the training points on this direction.

Show that the $z_i$ are distributed $N(0, 1)$ with expected squared distance from the origin 1, while the target point has expected squared distance $p$ from the origin.

Hence for $p = 10$, a randomly drawn test point is about 3.1 standard deviations from the origin, while all the training points are on average one standard deviation along direction $a$. So most prediction points see themselves as lying on the edge of the training set.

Ex. 2.5
(a) Derive equation (2.27). The last line makes use of (3.8) through a conditioning argument.
(b) Derive equation (2.28), making use of the cyclic property of the trace operator [$\mathrm{trace}(AB) = \mathrm{trace}(BA)$], and its linearity (which allows us to interchange the order of trace and expectation).

Ex. 2.6 Consider a regression problem with inputs $x_i$ and outputs $y_i$, and a parameterized model $f_\theta(x)$ to be fit by least squares. Show that if there are observations with tied or identical values of $x$, then the fit can be obtained from a reduced weighted least squares problem.


Ex. 2.7 Suppose we have a sample of $N$ pairs $x_i, y_i$ drawn i.i.d. from the distribution characterized as follows:

$x_i \sim h(x)$, the design density
$y_i = f(x_i) + \varepsilon_i$, $f$ is the regression function
$\varepsilon_i \sim (0, \sigma^2)$ (mean zero, variance $\sigma^2$)

We construct an estimator for $f$ linear in the $y_i$,

$$\hat{f}(x_0) = \sum_{i=1}^{N} \ell_i(x_0; \mathcal{X})\, y_i,$$

where the weights $\ell_i(x_0; \mathcal{X})$ do not depend on the $y_i$, but do depend on the entire training sequence of $x_i$, denoted here by $\mathcal{X}$.

(a) Show that linear regression and $k$-nearest-neighbor regression are members of this class of estimators. Describe explicitly the weights $\ell_i(x_0; \mathcal{X})$ in each of these cases.
(b) Decompose the conditional mean-squared error $E_{\mathcal{Y}|\mathcal{X}}(f(x_0) - \hat{f}(x_0))^2$ into a conditional squared bias and a conditional variance component. Like $\mathcal{X}$, $\mathcal{Y}$ represents the entire training sequence of $y_i$.
(c) Decompose the (unconditional) mean-squared error $E_{\mathcal{Y},\mathcal{X}}(f(x_0) - \hat{f}(x_0))^2$ into a squared bias and a variance component.
(d) Establish a relationship between the squared biases and variances in the above two cases.

Ex. 2.8 Compare the classification performance of linear regression and $k$-nearest neighbor classification on the zipcode data. In particular, consider only the 2's and 3's, and $k = 1, 3, 5, 7$ and 15. Show both the training and test error for each choice. The zipcode data are available from the book website www-stat.stanford.edu/ElemStatLearn.

Ex. 2.9 Consider a linear regression model with $p$ parameters, fit by least squares to a set of training data $(x_1, y_1), \ldots, (x_N, y_N)$ drawn at random from a population. Let $\hat\beta$ be the least squares estimate. Suppose we have some test data $(\tilde{x}_1, \tilde{y}_1), \ldots, (\tilde{x}_M, \tilde{y}_M)$ drawn at random from the same population as the training data. If $R_{tr}(\beta) = \frac{1}{N} \sum_{1}^{N} (y_i - \beta^T x_i)^2$ and $R_{te}(\beta) = \frac{1}{M} \sum_{1}^{M} (\tilde{y}_i - \beta^T \tilde{x}_i)^2$, prove that

$$E[R_{tr}(\hat\beta)] \le E[R_{te}(\hat\beta)],$$

where the expectations are over all that is random in each expression. [This exercise was brought to our attention by Ryan Tibshirani, from a homework assignment given by Andrew Ng.]

3 Linear Methods for Regression

3.1 Introduction

A linear regression model assumes that the regression function $E(Y|X)$ is linear in the inputs $X_1, \ldots, X_p$. Linear models were largely developed in the precomputer age of statistics, but even in today's computer era there are still good reasons to study and use them. They are simple and often provide an adequate and interpretable description of how the inputs affect the output. For prediction purposes they can sometimes outperform fancier nonlinear models, especially in situations with small numbers of training cases, low signal-to-noise ratio or sparse data. Finally, linear methods can be applied to transformations of the inputs and this considerably expands their scope. These generalizations are sometimes called basis-function methods, and are discussed in Chapter 5.

In this chapter we describe linear methods for regression, while in the next chapter we discuss linear methods for classification. On some topics we go into considerable detail, as it is our firm belief that an understanding of linear methods is essential for understanding nonlinear ones. In fact, many nonlinear techniques are direct generalizations of the linear methods discussed here.


3.2 Linear Regression Models and Least Squares

As introduced in Chapter 2, we have an input vector $X^T = (X_1, X_2, \ldots, X_p)$, and want to predict a real-valued output $Y$. The linear regression model has the form

$$f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j. \tag{3.1}$$

The linear model either assumes that the regression function $E(Y|X)$ is linear, or that the linear model is a reasonable approximation. Here the $\beta_j$'s are unknown parameters or coefficients, and the variables $X_j$ can come from different sources:

• quantitative inputs;
• transformations of quantitative inputs, such as log, square-root or square;
• basis expansions, such as $X_2 = X_1^2$, $X_3 = X_1^3$, leading to a polynomial representation;
• numeric or "dummy" coding of the levels of qualitative inputs. For example, if $G$ is a five-level factor input, we might create $X_j$, $j = 1, \ldots, 5$, such that $X_j = I(G = j)$. Together this group of $X_j$ represents the effect of $G$ by a set of level-dependent constants, since in $\sum_{j=1}^{5} X_j \beta_j$, one of the $X_j$s is one, and the others are zero.
• interactions between variables, for example, $X_3 = X_1 \cdot X_2$.

No matter the source of the $X_j$, the model is linear in the parameters.

Typically we have a set of training data $(x_1, y_1), \ldots, (x_N, y_N)$ from which to estimate the parameters $\beta$. Each $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ is a vector of feature measurements for the $i$th case. The most popular estimation method is least squares, in which we pick the coefficients $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$ to minimize the residual sum of squares

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2 = \sum_{i=1}^{N} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\Big)^2. \tag{3.2}$$

From a statistical point of view, this criterion is reasonable if the training observations $(x_i, y_i)$ represent independent random draws from their population. Even if the $x_i$'s were not drawn randomly, the criterion is still valid if the $y_i$'s are conditionally independent given the inputs $x_i$. Figure 3.1 illustrates the geometry of least-squares fitting in the $\mathbb{R}^{p+1}$-dimensional space occupied by the pairs $(X, Y)$. Note that (3.2) makes no assumptions about the validity of model (3.1); it simply finds the best linear fit to the data. Least squares fitting is intuitively satisfying no matter how the data arise; the criterion measures the average lack of fit.

Figure 3.1. Linear least squares fitting with $X \in \mathbb{R}^2$. We seek the linear function of $X$ that minimizes the sum of squared residuals from $Y$.

How do we minimize (3.2)? Denote by $\mathbf{X}$ the $N \times (p + 1)$ matrix with each row an input vector (with a 1 in the first position), and similarly let $\mathbf{y}$ be the $N$-vector of outputs in the training set. Then we can write the residual sum-of-squares as

$$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta). \tag{3.3}$$

This is a quadratic function in the $p + 1$ parameters. Differentiating with respect to $\beta$ we obtain

$$\frac{\partial\, \mathrm{RSS}}{\partial \beta} = -2\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta), \qquad \frac{\partial^2\, \mathrm{RSS}}{\partial \beta\, \partial \beta^T} = 2\mathbf{X}^T \mathbf{X}. \tag{3.4}$$

Assuming (for the moment) that $\mathbf{X}$ has full column rank, and hence $\mathbf{X}^T \mathbf{X}$ is positive definite, we set the first derivative to zero

$$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta) = 0 \tag{3.5}$$

to obtain the unique solution

$$\hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}. \tag{3.6}$$
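The normal equations (3.5) and their solution (3.6) can be checked numerically on a small problem. The sketch below uses only the standard library and a hand-written Gaussian elimination, which is fine for a toy system but is an assumption of this example rather than how regression software usually solves the problem (a QR decomposition is the more common choice). The data are made up and noise-free so the exact coefficients are known.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def least_squares(inputs, y):
    """Return beta-hat from (3.6) and the design matrix X with a leading column of ones."""
    X = [[1.0] + list(row) for row in inputs]
    p1 = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p1)] for a in range(p1)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p1)]
    return solve(XtX, Xty), X

# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
inputs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0 + 2.0 * u - 3.0 * v for u, v in inputs]

beta_hat, X = least_squares(inputs, y)
residuals = [yi - sum(bj * xj for bj, xj in zip(beta_hat, row)) for row, yi in zip(X, y)]
# Normal equations (3.5): X^T (y - X beta-hat) should vanish up to rounding.
gradient = [sum(row[a] * r for row, r in zip(X, residuals)) for a in range(len(beta_hat))]
```

Because the responses lie exactly on a plane, `beta_hat` recovers the generating coefficients and every entry of `gradient` is zero up to floating-point error.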

Figure 3.2. The $N$-dimensional geometry of least squares regression with two predictors. The outcome vector $\mathbf{y}$ is orthogonally projected onto the hyperplane spanned by the input vectors $\mathbf{x}_1$ and $\mathbf{x}_2$. The projection $\hat{\mathbf{y}}$ represents the vector of the least squares predictions.

The predicted values at an input vector $x_0$ are given by $\hat{f}(x_0) = (1 : x_0)^T \hat\beta$; the fitted values at the training inputs are

$$\hat{\mathbf{y}} = \mathbf{X}\hat\beta = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}, \tag{3.7}$$

where $\hat{y}_i = \hat{f}(x_i)$. The matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$ appearing in equation (3.7) is sometimes called the "hat" matrix because it puts the hat on $\mathbf{y}$.

Figure 3.2 shows a different geometrical representation of the least squares estimate, this time in $\mathbb{R}^N$. We denote the column vectors of $\mathbf{X}$ by $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_p$, with $\mathbf{x}_0 \equiv \mathbf{1}$. For much of what follows, this first column is treated like any other. These vectors span a subspace of $\mathbb{R}^N$, also referred to as the column space of $\mathbf{X}$. We minimize $\mathrm{RSS}(\beta) = \|\mathbf{y} - \mathbf{X}\beta\|^2$ by choosing $\hat\beta$ so that the residual vector $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to this subspace. This orthogonality is expressed in (3.5), and the resulting estimate $\hat{\mathbf{y}}$ is hence the orthogonal projection of $\mathbf{y}$ onto this subspace. The hat matrix $\mathbf{H}$ computes the orthogonal projection, and hence it is also known as a projection matrix.

It might happen that the columns of $\mathbf{X}$ are not linearly independent, so that $\mathbf{X}$ is not of full rank. This would occur, for example, if two of the inputs were perfectly correlated (e.g., $\mathbf{x}_2 = 3\mathbf{x}_1$). Then $\mathbf{X}^T \mathbf{X}$ is singular and the least squares coefficients $\hat\beta$ are not uniquely defined. However, the fitted values $\hat{\mathbf{y}} = \mathbf{X}\hat\beta$ are still the projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$; there is just more than one way to express that projection in terms of the column vectors of $\mathbf{X}$. The non-full-rank case occurs most often when one or more qualitative inputs are coded in a redundant fashion. There is usually a natural way to resolve the non-unique representation, by recoding and/or dropping redundant columns in $\mathbf{X}$. Most regression software packages detect these redundancies and automatically implement some strategy for removing them. Rank deficiencies can also occur in signal and image analysis, where the number of inputs $p$ can exceed the number of training cases $N$. In this case, the features are typically reduced by filtering or else the fitting is controlled by regularization (Section 5.2.3 and Chapter 18).

Up to now we have made minimal assumptions about the true distribution of the data. In order to pin down the sampling properties of $\hat\beta$, we now assume that the observations $y_i$ are uncorrelated and have constant variance $\sigma^2$, and that the $x_i$ are fixed (non random). The variance-covariance matrix of the least squares parameter estimates is easily derived from (3.6) and is given by

$$\mathrm{Var}(\hat\beta) = (\mathbf{X}^T \mathbf{X})^{-1} \sigma^2. \tag{3.8}$$

Typically one estimates the variance $\sigma^2$ by

$$\hat\sigma^2 = \frac{1}{N - p - 1} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2.$$

The $N - p - 1$ rather than $N$ in the denominator makes $\hat\sigma^2$ an unbiased estimate of $\sigma^2$: $E(\hat\sigma^2) = \sigma^2$.

To draw inferences about the parameters and the model, additional assumptions are needed. We now assume that (3.1) is the correct model for the mean; that is, the conditional expectation of $Y$ is linear in $X_1, \ldots, X_p$. We also assume that the deviations of $Y$ around its expectation are additive and Gaussian. Hence

$$Y = E(Y \mid X_1, \ldots, X_p) + \varepsilon = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon, \tag{3.9}$$

where the error $\varepsilon$ is a Gaussian random variable with expectation zero and variance $\sigma^2$, written $\varepsilon \sim N(0, \sigma^2)$.

Under (3.9), it is easy to show that

$$\hat\beta \sim N(\beta, (\mathbf{X}^T \mathbf{X})^{-1} \sigma^2). \tag{3.10}$$

This is a multivariate normal distribution with mean vector and variance-covariance matrix as shown. Also

$$(N - p - 1)\,\hat\sigma^2 \sim \sigma^2 \chi^2_{N-p-1}, \tag{3.11}$$

a chi-squared distribution with $N - p - 1$ degrees of freedom. In addition $\hat\beta$ and $\hat\sigma^2$ are statistically independent. We use these distributional properties to form tests of hypothesis and confidence intervals for the parameters $\beta_j$.

Figure 3.3. The tail probabilities $\Pr(|Z| > z)$ for three distributions, $t_{30}$, $t_{100}$ and standard normal. Shown are the appropriate quantiles for testing significance at the $p = 0.05$ and $0.01$ levels. The difference between $t$ and the standard normal becomes negligible for $N$ bigger than about 100. (Horizontal axis: $Z$, from 2.0 to 3.0.)

To test the hypothesis that a particular coefficient $\beta_j = 0$, we form the standardized coefficient or Z-score

$$z_j = \frac{\hat\beta_j}{\hat\sigma \sqrt{v_j}}, \tag{3.12}$$

where $v_j$ is the $j$th diagonal element of $(\mathbf{X}^T \mathbf{X})^{-1}$. Under the null hypothesis that $\beta_j = 0$, $z_j$ is distributed as $t_{N-p-1}$ (a $t$ distribution with $N - p - 1$ degrees of freedom), and hence a large (absolute) value of $z_j$ will lead to rejection of this null hypothesis. If $\hat\sigma$ is replaced by a known value $\sigma$, then $z_j$ would have a standard normal distribution. The difference between the tail quantiles of a $t$-distribution and a standard normal becomes negligible as the sample size increases, and so we typically use the normal quantiles (see Figure 3.3).

Often we need to test for the significance of groups of coefficients simultaneously. For example, to test if a categorical variable with $k$ levels can be excluded from a model, we need to test whether the coefficients of the dummy variables used to represent the levels can all be set to zero. Here we use the $F$ statistic,

$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(N - p_1 - 1)}, \tag{3.13}$$

where $\mathrm{RSS}_1$ is the residual sum-of-squares for the least squares fit of the bigger model with $p_1 + 1$ parameters, and $\mathrm{RSS}_0$ the same for the nested smaller model with $p_0 + 1$ parameters, having $p_1 - p_0$ parameters constrained to be zero. The $F$ statistic measures the change in residual sum-of-squares per additional parameter in the bigger model, and it is normalized by an estimate of $\sigma^2$. Under the Gaussian assumptions, and the null hypothesis that the smaller model is correct, the $F$ statistic will have an $F_{p_1 - p_0,\, N - p_1 - 1}$ distribution. It can be shown (Exercise 3.1) that the $z_j$ in (3.12) are equivalent to the $F$ statistic for dropping the single coefficient $\beta_j$ from the model. For large $N$, the quantiles of $F_{p_1 - p_0,\, N - p_1 - 1}$ approach those of $\chi^2_{p_1 - p_0}/(p_1 - p_0)$.

Similarly, we can isolate $\beta_j$ in (3.10) to obtain a $1 - 2\alpha$ confidence interval for $\beta_j$:

$$\big(\hat\beta_j - z^{(1-\alpha)} v_j^{1/2} \hat\sigma,\;\; \hat\beta_j + z^{(1-\alpha)} v_j^{1/2} \hat\sigma\big). \tag{3.14}$$

Here $z^{(1-\alpha)}$ is the $1 - \alpha$ percentile of the normal distribution:

$z^{(1-0.025)} = 1.96$,
$z^{(1-0.05)} = 1.645$, etc.

Hence the standard practice of reporting $\hat\beta \pm 2 \cdot \mathrm{se}(\hat\beta)$ amounts to an approximate 95% confidence interval. Even if the Gaussian error assumption does not hold, this interval will be approximately correct, with its coverage approaching $1 - 2\alpha$ as the sample size $N \to \infty$.

In a similar fashion we can obtain an approximate confidence set for the entire parameter vector $\beta$, namely

$$C_\beta = \{\beta \mid (\hat\beta - \beta)^T \mathbf{X}^T \mathbf{X} (\hat\beta - \beta) \le \hat\sigma^2 \chi^{2\,(1-\alpha)}_{p+1}\}, \tag{3.15}$$

where $\chi^{2\,(1-\alpha)}_{\ell}$ is the $1 - \alpha$ percentile of the chi-squared distribution on $\ell$ degrees of freedom: for example, $\chi^{2\,(1-0.05)}_{5} = 11.1$, $\chi^{2\,(1-0.1)}_{5} = 9.2$. This confidence set for $\beta$ generates a corresponding confidence set for the true function $f(x) = x^T \beta$, namely $\{x^T \beta \mid \beta \in C_\beta\}$ (Exercise 3.2; see also Figure 5.4 in Section 5.2.2 for examples of confidence bands for functions).
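As a small worked instance of the Z-score (3.12) and the interval (3.14), the sketch below treats the case of one input plus an intercept, where the diagonal element of $(\mathbf{X}^T \mathbf{X})^{-1}$ for the slope has the closed form $v_1 = 1/\sum_i (x_i - \bar{x})^2$. The data are invented for illustration, and with so few observations the normal quantile 1.96 is only a rough stand-in for the $t_{N-p-1}$ quantile.

```python
import math

def simple_regression_inference(x, y, z_quantile=1.96):
    """Slope estimate, standard error, Z-score (3.12) and interval (3.14)
    for the model y = b0 + b1 * x + error (so p = 1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 1 - 1))    # N - p - 1 with p = 1
    se = sigma_hat * math.sqrt(1.0 / sxx)       # sigma-hat * sqrt(v_1)
    z = b1 / se
    interval = (b1 - z_quantile * se, b1 + z_quantile * se)
    return b1, se, z, interval

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
b1, se, z, interval = simple_regression_inference(x, y)
```

Here the slope's Z-score is far above 2 in absolute value, so the hypothesis $\beta_1 = 0$ would be rejected, and the interval is centered on the estimate with half-width $1.96 \cdot \mathrm{se}$.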

3.2.1 Example: Prostate Cancer

The data for this example come from a study by Stamey et al. (1989). They examined the correlation between the level of prostate-specific antigen and a number of clinical measures in men who were about to receive a radical prostatectomy. The variables are log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45). The correlation matrix of the predictors given in Table 3.1 shows many strong correlations. Figure 1.1 (page 3) of Chapter 1 is a scatterplot matrix showing every pairwise plot between the variables. We see that svi is a binary variable, and gleason is an ordered categorical variable. We see, for example, that both lcavol and lcp show a strong relationship with the response lpsa, and with each other. We need to fit the effects jointly to untangle the relationships between the predictors and the response.

Table 3.1. Correlations of predictors in the prostate cancer data.

          lcavol  lweight     age    lbph     svi     lcp  gleason
lweight    0.300
age        0.286    0.317
lbph       0.063    0.437   0.287
svi        0.593    0.181   0.129  -0.139
lcp        0.692    0.157   0.173  -0.089   0.671
gleason    0.426    0.024   0.366   0.033   0.307   0.476
pgg45      0.483    0.074   0.276  -0.030   0.481   0.663    0.757

Table 3.2. Linear model fit to the prostate cancer data. The Z score is the coefficient divided by its standard error (3.12). Roughly a Z score larger than two in absolute value is significantly nonzero at the p = 0.05 level.

Term        Coefficient   Std. Error   Z Score
Intercept          2.46         0.09     27.60
lcavol             0.68         0.13      5.37
lweight            0.26         0.10      2.75
age               -0.14         0.10     -1.40
lbph               0.21         0.10      2.06
svi                0.31         0.12      2.47
lcp               -0.29         0.15     -1.87
gleason           -0.02         0.15     -0.15
pgg45              0.27         0.15      1.74

We fit a linear model to the log of prostate-specific antigen, lpsa, after first standardizing the predictors to have unit variance. We randomly split the dataset into a training set of size 67 and a test set of size 30. We applied least squares estimation to the training set, producing the estimates, standard errors and Z-scores shown in Table 3.2. The Z-scores are defined in (3.12), and measure the effect of dropping that variable from the model. A Z-score greater than 2 in absolute value is approximately significant at the 5% level. (For our example, we have nine parameters, and the 0.025 tail quantiles of the $t_{67-9}$ distribution are $\pm 2.002$!) The predictor lcavol shows the strongest effect, with lweight and svi also strong. Notice that lcp is not significant, once lcavol is in the model (when used in a model without lcavol, lcp is strongly significant). We can also test for the exclusion of a number of terms at once, using the $F$-statistic (3.13). For example, we consider dropping all the non-significant terms in Table 3.2, namely age, lcp, gleason, and pgg45. We get

$$F = \frac{(32.81 - 29.43)/(9 - 5)}{29.43/(67 - 9)} = 1.67, \tag{3.16}$$

which has a p-value of 0.17 ($\Pr(F_{4,58} > 1.67) = 0.17$), and hence is not significant.

The mean prediction error on the test data is 0.521. In contrast, prediction using the mean training value of lpsa has a test error of 1.057, which is called the "base error rate." Hence the linear model reduces the base error rate by about 50%. We will return to this example later to compare various selection and shrinkage methods.
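The arithmetic behind (3.16) and the "about 50%" reduction can be reproduced directly from the numbers quoted in the text. The helper name `f_statistic` is an assumption of this sketch; the p-value itself would need an $F$ distribution function from a statistics library and is not recomputed here.

```python
def f_statistic(rss0, rss1, p0, p1, n):
    """F statistic (3.13) comparing a nested smaller model (p0 + 1 parameters)
    with a bigger model (p1 + 1 parameters), both fit to n observations."""
    return ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1 - 1))

# Numbers quoted in (3.16): nine parameters in the full model (p1 = 8),
# five in the reduced model (p0 = 4), and 67 training cases.
F = f_statistic(rss0=32.81, rss1=29.43, p0=4, p1=8, n=67)

# Relative reduction of the test error against the base error rate.
reduction = 1.0 - 0.521 / 1.057
```

The degrees of freedom implied by these inputs are $p_1 - p_0 = 4$ and $N - p_1 - 1 = 58$, matching the $F_{4,58}$ reference distribution used in the text.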

3.2.2 The Gauss-Markov Theorem

One of the most famous results in statistics asserts that the least squares estimates of the parameters $\beta$ have the smallest variance among all linear unbiased estimates. We will make this precise here, and also make clear that the restriction to unbiased estimates is not necessarily a wise one. This observation will lead us to consider biased estimates such as ridge regression later in the chapter. We focus on estimation of any linear combination of the parameters $\theta = a^T \beta$; for example, predictions $f(x_0) = x_0^T \beta$ are of this form. The least squares estimate of $a^T \beta$ is

$$\hat\theta = a^T \hat\beta = a^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}. \tag{3.17}$$

Considering $\mathbf{X}$ to be fixed, this is a linear function $\mathbf{c}_0^T \mathbf{y}$ of the response vector $\mathbf{y}$. If we assume that the linear model is correct, $a^T \hat\beta$ is unbiased since

$$E(a^T \hat\beta) = E(a^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}) = a^T (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{X} \beta = a^T \beta. \tag{3.18}$$

The Gauss-Markov theorem states that if we have any other linear estimator $\tilde\theta = \mathbf{c}^T \mathbf{y}$ that is unbiased for $a^T \beta$, that is, $E(\mathbf{c}^T \mathbf{y}) = a^T \beta$, then

$$\mathrm{Var}(a^T \hat\beta) \le \mathrm{Var}(\mathbf{c}^T \mathbf{y}). \tag{3.19}$$

The proof (Exercise 3.3) uses the triangle inequality. For simplicity we have stated the result in terms of estimation of a single parameter $a^T \beta$, but with a few more definitions one can state it in terms of the entire parameter vector $\beta$ (Exercise 3.3).

Consider the mean squared error of an estimator $\tilde\theta$ in estimating $\theta$:

$$\mathrm{MSE}(\tilde\theta) = E(\tilde\theta - \theta)^2 = \mathrm{Var}(\tilde\theta) + [E(\tilde\theta) - \theta]^2. \tag{3.20}$$

The first term is the variance, while the second term is the squared bias. The Gauss-Markov theorem implies that the least squares estimator has the smallest mean squared error of all linear estimators with no bias. However, there may well exist a biased estimator with smaller mean squared error. Such an estimator would trade a little bias for a larger reduction in variance. Biased estimates are commonly used. Any method that shrinks or sets to zero some of the least squares coefficients may result in a biased estimate. We discuss many examples, including variable subset selection and ridge regression, later in this chapter. From a more pragmatic point of view, most models are distortions of the truth, and hence are biased; picking the right model amounts to creating the right balance between bias and variance. We go into these issues in more detail in Chapter 7.

Mean squared error is intimately related to prediction accuracy, as discussed in Chapter 2. Consider the prediction of the new response at input $x_0$,

$$Y_0 = f(x_0) + \varepsilon_0. \tag{3.21}$$

Then the expected prediction error of an estimate $\tilde{f}(x_0) = x_0^T \tilde\beta$ is

$$E(Y_0 - \tilde{f}(x_0))^2 = \sigma^2 + E(x_0^T \tilde\beta - f(x_0))^2 = \sigma^2 + \mathrm{MSE}(\tilde{f}(x_0)). \tag{3.22}$$

Therefore, expected prediction error and mean squared error differ only by the constant $\sigma^2$, representing the variance of the new observation $y_0$.
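The claim that a biased estimator can beat an unbiased one in mean squared error is easy to see in the simplest possible case. The sketch below is an illustration constructed for this purpose, not an example from the text: it scales an unbiased estimator $\hat\theta$ by a constant $c$ and evaluates the two terms of (3.20) in closed form. The true value and the variance are hypothetical numbers.

```python
def mse_of_scaled_estimator(c, theta, variance):
    """MSE (3.20) of c * theta_hat when theta_hat is unbiased with the given variance:
    the variance term is c^2 * variance and the squared bias is ((c - 1) * theta)^2."""
    return c * c * variance + ((c - 1.0) * theta) ** 2

theta, variance = 1.0, 0.5      # hypothetical true value and estimator variance

mse_unbiased = mse_of_scaled_estimator(1.0, theta, variance)   # no shrinkage
c_best = theta ** 2 / (theta ** 2 + variance)                  # minimizer of the quadratic in c
mse_shrunk = mse_of_scaled_estimator(c_best, theta, variance)
```

The best scaling factor depends on the unknown $\theta$, so it cannot be used as is in practice; the point is only that some shrinkage lowers the mean squared error, which is the motivation for the subset selection and ridge methods mentioned above.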

3.2.3 Multiple Regression from Simple Univariate Regression

The linear model (3.1) with $p > 1$ inputs is called the multiple linear regression model. The least squares estimates (3.6) for this model are best understood in terms of the estimates for the univariate ($p = 1$) linear model, as we indicate in this section.

Suppose first that we have a univariate model with no intercept, that is,

$$Y = X\beta + \varepsilon. \tag{3.23}$$

The least squares estimate and residuals are

$$\hat\beta = \frac{\sum_{1}^{N} x_i y_i}{\sum_{1}^{N} x_i^2}, \qquad r_i = y_i - x_i \hat\beta. \tag{3.24}$$

In convenient vector notation, we let $\mathbf{y} = (y_1, \ldots, y_N)^T$, $\mathbf{x} = (x_1, \ldots, x_N)^T$ and define

$$\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{N} x_i y_i = \mathbf{x}^T \mathbf{y}, \tag{3.25}$$

the inner product between $\mathbf{x}$ and $\mathbf{y}$.¹ Then we can write

$$\hat\beta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\langle \mathbf{x}, \mathbf{x} \rangle}, \qquad \mathbf{r} = \mathbf{y} - \mathbf{x}\hat\beta. \tag{3.26}$$

As we will see, this simple univariate regression provides the building block for multiple linear regression. Suppose next that the inputs $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p$ (the columns of the data matrix $\mathbf{X}$) are orthogonal; that is, $\langle \mathbf{x}_j, \mathbf{x}_k \rangle = 0$ for all $j \ne k$. Then it is easy to check that the multiple least squares estimates $\hat\beta_j$ are equal to $\langle \mathbf{x}_j, \mathbf{y} \rangle / \langle \mathbf{x}_j, \mathbf{x}_j \rangle$, the univariate estimates. In other words, when the inputs are orthogonal, they have no effect on each other's parameter estimates in the model.

Orthogonal inputs occur most often with balanced, designed experiments (where orthogonality is enforced), but almost never with observational data. Hence we will have to orthogonalize them in order to carry this idea further. Suppose next that we have an intercept and a single input $\mathbf{x}$. Then the least squares coefficient of $\mathbf{x}$ has the form

$$\hat\beta_1 = \frac{\langle \mathbf{x} - \bar{x}\mathbf{1}, \mathbf{y} \rangle}{\langle \mathbf{x} - \bar{x}\mathbf{1}, \mathbf{x} - \bar{x}\mathbf{1} \rangle}, \tag{3.27}$$

where $\bar{x} = \sum_i x_i / N$, and $\mathbf{1} = \mathbf{x}_0$, the vector of $N$ ones. We can view the estimate (3.27) as the result of two applications of the simple regression (3.26). The steps are:

1. regress $\mathbf{x}$ on $\mathbf{1}$ to produce the residual $\mathbf{z} = \mathbf{x} - \bar{x}\mathbf{1}$;
2. regress $\mathbf{y}$ on the residual $\mathbf{z}$ to give the coefficient $\hat\beta_1$.

In this procedure, "regress $\mathbf{b}$ on $\mathbf{a}$" means a simple univariate regression of $\mathbf{b}$ on $\mathbf{a}$ with no intercept, producing coefficient $\hat\gamma = \langle \mathbf{a}, \mathbf{b} \rangle / \langle \mathbf{a}, \mathbf{a} \rangle$ and residual vector $\mathbf{b} - \hat\gamma \mathbf{a}$. We say that $\mathbf{b}$ is adjusted for $\mathbf{a}$, or is "orthogonalized" with respect to $\mathbf{a}$.

Step 1 orthogonalizes $\mathbf{x}$ with respect to $\mathbf{x}_0 = \mathbf{1}$. Step 2 is just a simple univariate regression, using the orthogonal predictors $\mathbf{1}$ and $\mathbf{z}$. Figure 3.4 shows this process for two general inputs $\mathbf{x}_1$ and $\mathbf{x}_2$. The orthogonalization does not change the subspace spanned by $\mathbf{x}_1$ and $\mathbf{x}_2$, it simply produces an orthogonal basis for representing it.

This recipe generalizes to the case of $p$ inputs, as shown in Algorithm 3.1. Note that the inputs $\mathbf{z}_0, \ldots, \mathbf{z}_{j-1}$ in step 2 are orthogonal, hence the simple regression coefficients computed there are in fact also the multiple regression coefficients.

¹ The inner-product notation is suggestive of generalizations of linear regression to different metric spaces, as well as to probability spaces.


Figure 3.4. Least squares regression by orthogonalization of the inputs. The vector $\mathbf{x}_2$ is regressed on the vector $\mathbf{x}_1$, leaving the residual vector $\mathbf{z}$. The regression of $\mathbf{y}$ on $\mathbf{z}$ gives the multiple regression coefficient of $\mathbf{x}_2$. Adding together the projections of $\mathbf{y}$ on each of $\mathbf{x}_1$ and $\mathbf{z}$ gives the least squares fit $\hat{\mathbf{y}}$.

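Algorithm 3.1 Regression by Successive Orthogonalization.

1. Initialize $\mathbf{z}_0 = \mathbf{x}_0 = \mathbf{1}$.
2. For $j = 1, 2, \ldots, p$:
   Regress $\mathbf{x}_j$ on $\mathbf{z}_0, \mathbf{z}_1, \ldots, \mathbf{z}_{j-1}$ to produce coefficients $\hat\gamma_{\ell j} = \langle \mathbf{z}_\ell, \mathbf{x}_j \rangle / \langle \mathbf{z}_\ell, \mathbf{z}_\ell \rangle$, $\ell = 0, \ldots, j-1$, and residual vector $\mathbf{z}_j = \mathbf{x}_j - \sum_{k=0}^{j-1} \hat\gamma_{kj}\mathbf{z}_k$.
3. Regress $\mathbf{y}$ on the residual $\mathbf{z}_p$ to give the estimate $\hat\beta_p$.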

The result of this algorithm is

$$\hat\beta_p = \frac{\langle \mathbf{z}_p, \mathbf{y} \rangle}{\langle \mathbf{z}_p, \mathbf{z}_p \rangle}. \tag{3.28}$$

Re-arranging the residual in step 2, we can see that each of the $\mathbf{x}_j$ is a linear combination of the $\mathbf{z}_k$, $k \le j$. Since the $\mathbf{z}_j$ are all orthogonal, they form a basis for the column space of $\mathbf{X}$, and hence the least squares projection onto this subspace is $\hat{\mathbf{y}}$. Since $\mathbf{z}_p$ alone involves $\mathbf{x}_p$ (with coefficient 1), we see that the coefficient (3.28) is indeed the multiple regression coefficient of $\mathbf{y}$ on $\mathbf{x}_p$. This key result exposes the effect of correlated inputs in multiple regression. Note also that by rearranging the $\mathbf{x}_j$, any one of them could be in the last position, and a similar result holds. Hence stated more generally, we have shown that the $j$th multiple regression coefficient is the univariate regression coefficient of $\mathbf{y}$ on $\mathbf{x}_{j \cdot 012\ldots(j-1)(j+1)\ldots p}$, the residual after regressing $\mathbf{x}_j$ on $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{j-1}, \mathbf{x}_{j+1}, \ldots, \mathbf{x}_p$:


The multiple regression coefficient $\hat\beta_j$ represents the additional contribution of $\mathbf{x}_j$ on $\mathbf{y}$, after $\mathbf{x}_j$ has been adjusted for $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{j-1}, \mathbf{x}_{j+1}, \ldots, \mathbf{x}_p$.
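The successive orthogonalization of Algorithm 3.1 can be sketched in a few lines. This is only an illustration under our own assumptions: it uses NumPy, the function name `successive_orthogonalization` and the simulated data are ours, and the classical Gram-Schmidt loop is written for clarity rather than numerical robustness.

```python
import numpy as np

def successive_orthogonalization(X, y):
    """Regression by successive orthogonalization (Algorithm 3.1).

    X is an N x p input matrix WITHOUT the column of ones; the intercept
    column x0 = 1 is added here. Returns the residual vectors z_0..z_p
    (as columns) and the coefficient of y on the last residual z_p.
    """
    N, p = X.shape
    cols = [np.ones(N)] + [X[:, j] for j in range(p)]   # x0, x1, ..., xp
    Z = [cols[0]]                                       # step 1: z0 = x0 = 1
    for j in range(1, p + 1):                           # step 2
        z = cols[j].copy()
        for zl in Z:
            gamma = (zl @ cols[j]) / (zl @ zl)          # <z_l, x_j> / <z_l, z_l>
            z -= gamma * zl                             # adjust x_j for z_l
        Z.append(z)
    zp = Z[-1]
    beta_p = (zp @ y) / (zp @ zp)                       # step 3, equation (3.28)
    return np.column_stack(Z), beta_p

# Simulated data (an assumption for illustration) with two correlated inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] += 0.8 * X[:, 0]
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

Z, beta_p = successive_orthogonalization(X, y)

# Reference: the full multiple regression with an intercept.
X1 = np.column_stack([np.ones(50), X])
beta_full = np.linalg.lstsq(X1, y, rcond=None)[0]
```

The last entry of `beta_full` should agree with `beta_p`, which is the statement that (3.28) is the multiple regression coefficient of the last input.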

If $\mathbf{x}_p$ is highly correlated with some of the other $\mathbf{x}_k$'s, the residual vector $\mathbf{z}_p$ will be close to zero, and from (3.28) the coefficient $\hat\beta_p$ will be very unstable. This will be true for all the variables in the correlated set. In such situations, we might have all the Z-scores (as in Table 3.2) be small (any one of the set can be deleted), yet we cannot delete them all. From (3.28) we also obtain an alternate formula for the variance estimates (3.8),

$$\mathrm{Var}(\hat\beta_p) = \frac{\sigma^2}{\langle \mathbf{z}_p, \mathbf{z}_p \rangle} = \frac{\sigma^2}{\|\mathbf{z}_p\|^2}. \tag{3.29}$$

In other words, the precision with which we can estimate $\hat\beta_p$ depends on the length of the residual vector $\mathbf{z}_p$; this represents how much of $\mathbf{x}_p$ is unexplained by the other $\mathbf{x}_k$'s.

Algorithm 3.1 is known as the Gram-Schmidt procedure for multiple regression, and is also a useful numerical strategy for computing the estimates. We can obtain from it not just $\hat\beta_p$, but also the entire multiple least squares fit, as shown in Exercise 3.4.

We can represent step 2 of Algorithm 3.1 in matrix form:

$$\mathbf{X} = \mathbf{Z}\boldsymbol{\Gamma}, \tag{3.30}$$

where $\mathbf{Z}$ has as columns the $\mathbf{z}_j$ (in order), and $\boldsymbol{\Gamma}$ is the upper triangular matrix with entries $\hat\gamma_{kj}$. Introducing the diagonal matrix $\mathbf{D}$ with $j$th diagonal entry $D_{jj} = \|\mathbf{z}_j\|$, we get

$$\mathbf{X} = \mathbf{Z}\mathbf{D}^{-1}\mathbf{D}\boldsymbol{\Gamma} = \mathbf{Q}\mathbf{R}, \tag{3.31}$$

the so-called QR decomposition of $\mathbf{X}$. Here $\mathbf{Q}$ is an $N \times (p+1)$ orthogonal matrix, $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$, and $\mathbf{R}$ is a $(p+1) \times (p+1)$ upper triangular matrix.

The QR decomposition represents a convenient orthogonal basis for the column space of $\mathbf{X}$. It is easy to see, for example, that the least squares solution is given by

$$\hat\beta = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}, \tag{3.32}$$
$$\hat{\mathbf{y}} = \mathbf{Q}\mathbf{Q}^T\mathbf{y}. \tag{3.33}$$

Equation (3.32) is easy to solve because $\mathbf{R}$ is upper triangular (Exercise 3.4).
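To make (3.32) and (3.33) concrete, here is a small sketch that uses NumPy's reduced QR factorization together with a hand-written back substitution. The helper name `back_substitute` and the simulated data are our own assumptions. Note also that NumPy's $\mathbf{R}$ may have negative diagonal entries, so it can differ from $\mathbf{D}\boldsymbol{\Gamma}$ by column signs; the fit is unaffected.

```python
import numpy as np

def back_substitute(R, b):
    """Solve R x = b for an upper-triangular, nonsingular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-solved components, then divide by the pivot.
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(1)
N, p = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # N x (p+1)
y = rng.normal(size=N)

Q, R = np.linalg.qr(X)                   # reduced QR: Q is N x (p+1)
beta_hat = back_substitute(R, Q.T @ y)   # equation (3.32)
y_hat = Q @ (Q.T @ y)                    # equation (3.33)
```

Because $\mathbf{R}$ is triangular, the solve costs only $O(p^2)$ operations once the factorization is available, which is why (3.32) is described as easy to solve.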


3.2.4 Multiple Outputs

Suppose we have multiple outputs $Y_1, Y_2, \ldots, Y_K$ that we wish to predict from our inputs $X_0, X_1, X_2, \ldots, X_p$. We assume a linear model for each output

$$Y_k = \beta_{0k} + \sum_{j=1}^{p} X_j\beta_{jk} + \varepsilon_k \tag{3.34}$$
$$\phantom{Y_k} = f_k(X) + \varepsilon_k. \tag{3.35}$$

With $N$ training cases we can write the model in matrix notation

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}. \tag{3.36}$$

Here $\mathbf{Y}$ is the $N \times K$ response matrix, with $ik$ entry $y_{ik}$, $\mathbf{X}$ is the $N \times (p+1)$ input matrix, $\mathbf{B}$ is the $(p+1) \times K$ matrix of parameters and $\mathbf{E}$ is the $N \times K$ matrix of errors. A straightforward generalization of the univariate loss function (3.2) is

$$\mathrm{RSS}(\mathbf{B}) = \sum_{k=1}^{K}\sum_{i=1}^{N} (y_{ik} - f_k(x_i))^2 \tag{3.37}$$
$$\phantom{\mathrm{RSS}(\mathbf{B})} = \mathrm{tr}[(\mathbf{Y} - \mathbf{X}\mathbf{B})^T(\mathbf{Y} - \mathbf{X}\mathbf{B})]. \tag{3.38}$$

The least squares estimates have exactly the same form as before

$$\hat{\mathbf{B}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}. \tag{3.39}$$

Hence the coefficients for the $k$th outcome are just the least squares estimates in the regression of $\mathbf{y}_k$ on $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_p$. Multiple outputs do not affect one another's least squares estimates.

If the errors $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_K)$ in (3.34) are correlated, then it might seem appropriate to modify (3.37) in favor of a multivariate version. Specifically, suppose $\mathrm{Cov}(\varepsilon) = \boldsymbol{\Sigma}$, then the multivariate weighted criterion

$$\mathrm{RSS}(\mathbf{B}; \boldsymbol{\Sigma}) = \sum_{i=1}^{N} (y_i - f(x_i))^T \boldsymbol{\Sigma}^{-1} (y_i - f(x_i)) \tag{3.40}$$

arises naturally from multivariate Gaussian theory. Here $f(x)$ is the vector function $(f_1(x), \ldots, f_K(x))^T$, and $y_i$ the vector of $K$ responses for observation $i$. However, it can be shown that again the solution is given by (3.39); $K$ separate regressions that ignore the correlations (Exercise 3.11). If the $\boldsymbol{\Sigma}_i$ vary among observations, then this is no longer the case, and the solution for $\mathbf{B}$ no longer decouples.

In Section 3.7 we pursue the multiple outcome problem, and consider situations where it does pay to combine the regressions.
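The decoupling in (3.39) is easy to check numerically. The sketch below, on simulated data with deliberately correlated errors (all names and numbers are our own assumptions), compares the joint solution with $K$ separate single-output regressions.

```python
import numpy as np

# Simulated multi-output data (an assumption for illustration).
rng = np.random.default_rng(2)
N, p, K = 60, 4, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])   # N x (p+1)
B_true = rng.normal(size=(p + 1, K))
# Mixing matrix that makes the K error components correlated.
L = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.5, 0.5, 0.7]])
Y = X @ B_true + rng.normal(size=(N, K)) @ L.T

# Joint least squares solution, equation (3.39).
B_joint = np.linalg.solve(X.T @ X, X.T @ Y)

# K separate regressions, one per output column.
B_sep = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(K)]
)

def rss(B):
    """Trace form of the multi-output residual sum of squares, (3.38)."""
    R = Y - X @ B
    return np.trace(R.T @ R)
```

The two coefficient matrices should coincide column by column, even though the errors here are correlated across outputs.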

3.3 Subset Selection

There are two reasons why we are often not satisfied with the least squares estimates (3.6).

• The first is prediction accuracy: the least squares estimates often have low bias but large variance. Prediction accuracy can sometimes be improved by shrinking or setting some coefficients to zero. By doing so we sacrifice a little bit of bias to reduce the variance of the predicted values, and hence may improve the overall prediction accuracy.

• The second reason is interpretation. With a large number of predictors, we often would like to determine a smaller subset that exhibit the strongest effects. In order to get the "big picture," we are willing to sacrifice some of the small details.

In this section we describe a number of approaches to variable subset selection with linear regression. In later sections we discuss shrinkage and hybrid approaches for controlling variance, as well as other dimension-reduction strategies. These all fall under the general heading model selection. Model selection is not restricted to linear models; Chapter 7 covers this topic in some detail.

With subset selection we retain only a subset of the variables, and eliminate the rest from the model. Least squares regression is used to estimate the coefficients of the inputs that are retained. There are a number of different strategies for choosing the subset.

3.3.1 Best-Subset Selection

Best subset regression finds for each $k \in \{0, 1, 2, \ldots, p\}$ the subset of size $k$ that gives smallest residual sum of squares (3.2). An efficient algorithm, the leaps and bounds procedure (Furnival and Wilson, 1974), makes this feasible for $p$ as large as 30 or 40. Figure 3.5 shows all the subset models for the prostate cancer example. The lower boundary represents the models that are eligible for selection by the best-subsets approach. Note that the best subset of size 2, for example, need not include the variable that was in the best subset of size 1 (for this example all the subsets are nested). The best-subset curve (red lower boundary in Figure 3.5) is necessarily decreasing, so cannot be used to select the subset size $k$. The question of how to choose $k$ involves the tradeoff between bias and variance, along with the more subjective desire for parsimony. There are a number of criteria that one may use; typically we choose the smallest model that minimizes an estimate of the expected prediction error.

Many of the other approaches that we discuss in this chapter are similar, in that they use the training data to produce a sequence of models varying in complexity and indexed by a single parameter. In the next section we use cross-validation to estimate prediction error and select $k$; the AIC criterion is a popular alternative. We defer more detailed discussion of these and other approaches to Chapter 7.

Figure 3.5. [Horizontal axis: Subset Size k.] All possible subset models for the prostate cancer example. At each subset size is shown the residual sum-of-squares for each model of that size.
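For small $p$, the best-subset search can be illustrated by brute-force enumeration. To be clear, the sketch below is not the leaps and bounds procedure cited above; it simply tries every subset, which is only practical for a handful of inputs. The function names and the simulated data are our own assumptions.

```python
import itertools
import numpy as np

def rss_of_subset(X, y, subset):
    """RSS of the least squares fit on an intercept plus the given columns."""
    N = X.shape[0]
    A = np.column_stack([np.ones(N)] + [X[:, j] for j in subset])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ beta
    return float(r @ r)

def best_subsets(X, y):
    """For each size k = 0..p, return (best subset, its RSS) by exhaustive search."""
    p = X.shape[1]
    best = {}
    for k in range(p + 1):
        candidates = itertools.combinations(range(p), k)
        best[k] = min(
            ((s, rss_of_subset(X, y, s)) for s in candidates),
            key=lambda pair: pair[1],
        )
    return best

# Simulated data (an assumption): only inputs 1 and 4 carry signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(size=80)

best = best_subsets(X, y)
rss_curve = [best[k][1] for k in range(X.shape[1] + 1)]
```

The list `rss_curve` is the lower boundary of a plot like Figure 3.5 for this simulated problem. It can only decrease as $k$ grows, which is why it cannot by itself be used to choose $k$.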

3.3.2 Forward- and Backward-Stepwise Selection

Rather than search through all possible subsets (which becomes infeasible for $p$ much larger than 40), we can seek a good path through them. Forward-stepwise selection starts with the intercept, and then sequentially adds into the model the predictor that most improves the fit. With many candidate predictors, this might seem like a lot of computation; however, clever updating algorithms can exploit the QR decomposition for the current fit to rapidly establish the next candidate (Exercise 3.9). Like best-subset regression, forward stepwise produces a sequence of models indexed by $k$, the subset size, which must be determined.

Forward-stepwise selection is a greedy algorithm, producing a nested sequence of models. In this sense it might seem sub-optimal compared to best-subset selection. However, there are several reasons why it might be preferred:

• Computational; for large $p$ we cannot compute the best subset sequence, but we can always compute the forward stepwise sequence (even when $p \gg N$).

• Statistical; a price is paid in variance for selecting the best subset of each size; forward stepwise is a more constrained search, and will have lower variance, but perhaps more bias.
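The greedy search described above can be sketched as follows. This version refits from scratch at every step for clarity; it does not implement the QR updating mentioned in the text, and the function names and simulated data are our own assumptions.

```python
import numpy as np

def fit_rss(X, y, active):
    """RSS of the least squares fit on an intercept plus the active columns."""
    N = X.shape[0]
    A = np.column_stack([np.ones(N)] + [X[:, j] for j in active])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ beta
    return float(r @ r)

def forward_stepwise(X, y):
    """Greedy forward selection.

    Returns the order in which variables enter and the RSS after each step
    (path[0] is the intercept-only model).
    """
    p = X.shape[1]
    active, remaining = [], list(range(p))
    path = [fit_rss(X, y, active)]
    while remaining:
        # Try each remaining variable and keep the one that most reduces RSS.
        scores = {j: fit_rss(X, y, active + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        active.append(j_best)
        remaining.remove(j_best)
        path.append(scores[j_best])
    return active, path

# Simulated data (an assumption): only inputs 1 and 4 carry signal.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(size=80)

order, path = forward_stepwise(X, y)
```

The models along `order` are nested by construction, in contrast to best-subset selection, where the best subset of size $k+1$ need not contain the best subset of size $k$.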

Figure 3.6. [Horizontal axis: Subset Size k.] Comparison of four subset-selection techniques on a simulated linear regression problem $Y = X^T\beta + \varepsilon$. There are $N = 300$ observations on $p = 31$ standard Gaussian variables, with pairwise correlations all equal to 0.85. For 10 of the variables, the coefficients are drawn at random from a $N(0, 0.4)$ distribution; the rest are zero. The noise $\varepsilon \sim N(0, 6.25)$, resulting in a signal-to-noise ratio of 0.64. Results are averaged over 50 simulations. Shown is the mean-squared error of the estimated coefficient $\hat\beta(k)$ at each step from the true $\beta$.

Backward-stepwise selection starts with the full model, and sequentially deletes the predictor that has the least impact on the fit. The candidate for dropping is the variable with the smallest Z-score (Exercise 3.10). Backward selection can only be used when $N > p$, while forward stepwise can always be used.

Figure 3.6 shows the results of a small simulation study to compare best-subset regression with the simpler alternatives forward and backward selection. Their performance is very similar, as is often the case. Included in the figure is forward stagewise regression (next section), which takes longer to reach minimum error.


On the prostate cancer example, best-subset, forward and backward selection all gave exactly the same sequence of terms.

Some software packages implement hybrid stepwise-selection strategies that consider both forward and backward moves at each step, and select the "best" of the two. For example in the R package the step function uses the AIC criterion for weighing the choices, which takes proper account of the number of parameters fit; at each step an add or drop will be performed that minimizes the AIC score.

Other more traditional packages base the selection on F-statistics, adding "significant" terms, and dropping "non-significant" terms. These are out of fashion, since they do not take proper account of the multiple testing issues. It is also tempting after a model search to print out a summary of the chosen model, such as in Table 3.2; however, the standard errors are not valid, since they do not account for the search process. The bootstrap (Section 8.2) can be useful in such settings.

Finally, we note that often variables come in groups (such as the dummy variables that code a multi-level categorical predictor). Smart stepwise procedures (such as step in R) will add or drop whole groups at a time, taking proper account of their degrees-of-freedom.

3.3.3 Forward-Stagewise Regression

Forward-stagewise regression (FS) is even more constrained than forward-stepwise regression. It starts like forward-stepwise regression, with an intercept equal to $\bar{y}$, and centered predictors with coefficients initially all 0. At each step the algorithm identifies the variable most correlated with the current residual. It then computes the simple linear regression coefficient of the residual on this chosen variable, and then adds it to the current coefficient for that variable. This is continued till none of the variables have correlation with the residuals, i.e. the least-squares fit when $N > p$.

Unlike forward-stepwise regression, none of the other variables are adjusted when a term is added to the model. As a consequence, forward stagewise can take many more than $p$ steps to reach the least squares fit, and historically has been dismissed as being inefficient. It turns out that this "slow fitting" can pay dividends in high-dimensional problems. We see in Section 3.8.1 that both forward stagewise and a variant which is slowed down even further are quite competitive, especially in very high-dimensional problems.

Forward-stagewise regression is included in Figure 3.6. In this example it takes over 1000 steps to get all the correlations below $10^{-4}$. For subset size $k$, we plotted the error for the last step for which there were $k$ nonzero coefficients. Although it catches up with the best fit, it takes longer to do so.
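The forward-stagewise iteration described above is short enough to write out directly. The following is a minimal sketch under our own assumptions: the function name, the stopping tolerance `tol` on the absolute correlation, the step cap `max_steps`, and the simulated data are all ours.

```python
import numpy as np

def forward_stagewise(X, y, tol=1e-10, max_steps=100_000):
    """Forward-stagewise regression on centered predictors.

    Returns the intercept (mean of y), the coefficients for the centered
    predictors, and the number of update steps taken.
    """
    Xc = X - X.mean(axis=0)                  # centered predictors
    col_ss = (Xc ** 2).sum(axis=0)           # <x_j, x_j> for each column
    intercept = y.mean()
    beta = np.zeros(X.shape[1])
    r = y - intercept                        # current residual
    steps = 0
    while steps < max_steps:
        inner = Xc.T @ r
        corr = inner / (np.sqrt(col_ss) * np.linalg.norm(r))
        j = int(np.argmax(np.abs(corr)))     # most correlated variable
        if abs(corr[j]) < tol:
            break                            # residual uncorrelated with all inputs
        delta = inner[j] / col_ss[j]         # simple regression of r on x_j
        beta[j] += delta                     # only coefficient j is updated
        r = r - delta * Xc[:, j]
        steps += 1
    return intercept, beta, steps

# Simulated data (an assumption) with two correlated inputs.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
X[:, 1] += 0.7 * X[:, 0]
y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + rng.normal(size=100)

b0, beta_fs, n_steps = forward_stagewise(X, y)

# Reference least squares fit with an intercept.
beta_ls = np.linalg.lstsq(np.column_stack([np.ones(100), X]), y, rcond=None)[0]
```

Because no other coefficient is adjusted when one is updated, `n_steps` is typically much larger than the number of inputs before the slopes in `beta_fs` match the least squares slopes.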


3.3.4 Prostate Cancer Data Example (Continued)

Table 3.3 shows the coefficients from a number of different selection and shrinkage methods. They are best-subset selection using an all-subsets search, ridge regression, the lasso, principal components regression and partial least squares. Each method has a complexity parameter, and this was chosen to minimize an estimate of prediction error based on tenfold cross-validation; full details are given in Section 7.10. Briefly, cross-validation works by dividing the training data randomly into ten equal parts. The learning method is fit (for a range of values of the complexity parameter) to nine-tenths of the data, and the prediction error is computed on the remaining one-tenth. This is done in turn for each one-tenth of the data, and the ten prediction error estimates are averaged. From this we obtain an estimated prediction error curve as a function of the complexity parameter.

Note that we have already divided these data into a training set of size 67 and a test set of size 30. Cross-validation is applied to the training set, since selecting the shrinkage parameter is part of the training process. The test set is there to judge the performance of the selected model.

The estimated prediction error curves are shown in Figure 3.7. Many of the curves are very flat over large ranges near their minimum. Included are estimated standard error bands for each estimated error rate, based on the ten error estimates computed by cross-validation. We have used the "one-standard-error" rule: we pick the most parsimonious model within one standard error of the minimum (Section 7.10, page 244). Such a rule acknowledges the fact that the tradeoff curve is estimated with error, and hence takes a conservative approach.

Best-subset selection chose to use the two predictors lcavol and lweight. The last two lines of the table give the average prediction error (and its estimated standard error) over the test set.
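The tenfold cross-validation and the one-standard-error rule described above can be sketched as follows. This is an illustration under several assumptions of ours: the data are simulated (not the prostate data), ridge regression from Section 3.4.1 stands in as the learning method with $\lambda$ as the complexity parameter, and all function names are hypothetical. With 67 cases the ten folds are only approximately equal in size.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge fit on centered inputs; returns (intercept, coefficients)."""
    mu, ybar = X.mean(axis=0), y.mean()
    Xc = X - mu
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ybar))
    return ybar - mu @ beta, beta

def cv_curve(X, y, lambdas, n_folds=10, seed=0):
    """Mean CV prediction error and its standard error for each lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errs = np.zeros((n_folds, len(lambdas)))
    for f, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for i, lam in enumerate(lambdas):
            b0, b = ridge_fit(X[train_idx], y[train_idx], lam)
            resid = y[test_idx] - b0 - X[test_idx] @ b
            errs[f, i] = np.mean(resid ** 2)
    # Average over folds; standard error from the fold-to-fold spread.
    return errs.mean(axis=0), errs.std(axis=0, ddof=1) / np.sqrt(n_folds)

def one_standard_error_choice(lambdas, mean_err, se_err):
    """Index of the largest lambda whose CV error is within one SE of the minimum."""
    i_min = int(np.argmin(mean_err))
    threshold = mean_err[i_min] + se_err[i_min]
    eligible = [i for i in range(len(lambdas)) if mean_err[i] <= threshold]
    return max(eligible, key=lambda i: lambdas[i])

# Simulated training set of the same size as in the text (67 cases, 8 inputs).
rng = np.random.default_rng(6)
X = rng.normal(size=(67, 8))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=67)

lambdas = np.array([0.0, 0.1, 1.0, 10.0, 100.0, 1000.0])
mean_err, se_err = cv_curve(X, y, lambdas)
i_pick = one_standard_error_choice(lambdas, mean_err, se_err)
```

For ridge, a larger $\lambda$ means a more heavily shrunken (more parsimonious) fit, so the rule picks the largest eligible $\lambda$. For subset selection the analogous choice would be the smallest eligible subset size.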

3.4 Shrinkage Methods

By retaining a subset of the predictors and discarding the rest, subset selection produces a model that is interpretable and has possibly lower prediction error than the full model. However, because it is a discrete process (variables are either retained or discarded), it often exhibits high variance, and so doesn't reduce the prediction error of the full model. Shrinkage methods are more continuous, and don't suffer as much from high variability.

Figure 3.7. [Five panels: All Subsets (horizontal axis: Subset Size), Ridge Regression (Degrees of Freedom), Lasso (Shrinkage Factor s), Principal Components Regression (Number of Directions), Partial Least Squares (Number of Directions).] Estimated prediction error curves and their standard errors for the various selection and shrinkage methods. Each curve is plotted as a function of the corresponding complexity parameter for that method. The horizontal axis has been chosen so that the model complexity increases as we move from left to right. The estimates of prediction error and their standard errors were obtained by tenfold cross-validation; full details are given in Section 7.10. The least complex model within one standard error of the best is chosen, indicated by the purple vertical broken lines.

Table 3.3. Estimated coefficients and test error results, for different subset and shrinkage methods applied to the prostate data. The blank entries correspond to variables omitted.

3.4.1 Ridge Regression

Ridge regression shrinks the regression coefficients by imposing a penalty on their size. The ridge coefficients minimize a penalized residual sum of squares,

$$\hat\beta^{\text{ridge}} = \operatorname*{argmin}_{\beta} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}. \tag{3.41}$$

Here $\lambda \ge 0$ is a complexity parameter that controls the amount of shrinkage: the larger the value of $\lambda$, the greater the amount of shrinkage. The coefficients are shrunk toward zero (and each other). The idea of penalizing by the sum-of-squares of the parameters is also used in neural networks, where it is known as weight decay (Chapter 11).

An equivalent way to write the ridge problem is

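$$\hat\beta^{\text{ridge}} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \quad \text{subject to } \sum_{j=1}^{p} \beta_j^2 \le t, \tag{3.42}$$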

which makes explicit the size constraint on the parameters. There is a one-to-one correspondence between the parameters $\lambda$ in (3.41) and $t$ in (3.42). When there are many correlated variables in a linear regression model, their coefficients can become poorly determined and exhibit high variance. A wildly large positive coefficient on one variable can be canceled by a similarly large negative coefficient on its correlated cousin. By imposing a size constraint on the coefficients, as in (3.42), this problem is alleviated.

The ridge solutions are not equivariant under scaling of the inputs, and so one normally standardizes the inputs before solving (3.41). In addition, notice that the intercept $\beta_0$ has been left out of the penalty term. Penalization of the intercept would make the procedure depend on the origin chosen for $Y$; that is, adding a constant $c$ to each of the targets $y_i$ would not simply result in a shift of the predictions by the same amount $c$. It can be shown (Exercise 3.5) that the solution to (3.41) can be separated into two parts, after reparametrization using centered inputs: each $x_{ij}$ gets replaced by $x_{ij} - \bar{x}_j$. We estimate $\beta_0$ by $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$. The remaining coefficients get estimated by a ridge regression without intercept, using the centered $x_{ij}$. Henceforth we assume that this centering has been done, so that the input matrix $\mathbf{X}$ has $p$ (rather than $p + 1$) columns.

Writing the criterion in (3.41) in matrix form,

$$\mathrm{RSS}(\lambda) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta) + \lambda\beta^T\beta, \tag{3.43}$$

the ridge regression solutions are easily seen to be

$$\hat\beta^{\text{ridge}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}, \tag{3.44}$$

where $\mathbf{I}$ is the $p \times p$ identity matrix. Notice that with the choice of quadratic penalty $\beta^T\beta$, the ridge regression solution is again a linear function of $\mathbf{y}$. The solution adds a positive constant to the diagonal of $\mathbf{X}^T\mathbf{X}$ before inversion. This makes the problem nonsingular, even if $\mathbf{X}^T\mathbf{X}$ is not of full rank, and was the main motivation for ridge regression when it was first introduced in statistics (Hoerl and Kennard, 1970). Traditional descriptions of ridge regression start with definition (3.44). We choose to motivate it via (3.41) and (3.42), as these provide insight into how it works.

Figure 3.8 shows the ridge coefficient estimates for the prostate cancer example, plotted as functions of $\mathrm{df}(\lambda)$, the effective degrees of freedom implied by the penalty $\lambda$ (defined in (3.50) on page 68). In the case of orthonormal inputs, the ridge estimates are just a scaled version of the least squares estimates, that is, $\hat\beta^{\text{ridge}} = \hat\beta/(1 + \lambda)$.

Ridge regression can also be derived as the mean or mode of a posterior distribution, with a suitably chosen prior distribution. In detail, suppose $y_i \sim N(\beta_0 + x_i^T\beta, \sigma^2)$, and the parameters $\beta_j$ are each distributed as $N(0, \tau^2)$, independently of one another. Then the (negative) log-posterior density of $\beta$, with $\tau^2$ and $\sigma^2$ assumed known, is equal to the expression in curly braces in (3.41), with $\lambda = \sigma^2/\tau^2$ (Exercise 3.6). Thus the ridge estimate is the mode of the posterior distribution; since the distribution is Gaussian, it is also the posterior mean.

The singular value decomposition (SVD) of the centered input matrix $\mathbf{X}$ gives us some additional insight into the nature of ridge regression. This decomposition is extremely useful in the analysis of many statistical methods. The SVD of the $N \times p$ matrix $\mathbf{X}$ has the form

$$\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^T. \tag{3.45}$$

Figure 3.8. [Horizontal axis: df(λ).] Profiles of ridge coefficients for the prostate cancer example, as the tuning parameter $\lambda$ is varied. Coefficients are plotted versus $\mathrm{df}(\lambda)$, the effective degrees of freedom. A vertical line is drawn at $\mathrm{df} = 5.0$, the value chosen by cross-validation.

Here $\mathbf{U}$ and $\mathbf{V}$ are $N \times p$ and $p \times p$ orthogonal matrices, with the columns of $\mathbf{U}$ spanning the column space of $\mathbf{X}$, and the columns of $\mathbf{V}$ spanning the row space. $\mathbf{D}$ is a $p \times p$ diagonal matrix, with diagonal entries $d_1 \ge d_2 \ge \cdots \ge d_p \ge 0$ called the singular values of $\mathbf{X}$. If one or more values $d_j = 0$, $\mathbf{X}$ is singular.

Using the singular value decomposition we can write the least squares fitted vector as

$$\mathbf{X}\hat\beta^{\text{ls}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{U}\mathbf{U}^T\mathbf{y}, \tag{3.46}$$

after some simplification. Note that $\mathbf{U}^T\mathbf{y}$ are the coordinates of $\mathbf{y}$ with respect to the orthonormal basis $\mathbf{U}$. Note also the similarity with (3.33); $\mathbf{Q}$ and $\mathbf{U}$ are generally different orthogonal bases for the column space of $\mathbf{X}$ (Exercise 3.8).

Now the ridge solutions are

$$\mathbf{X}\hat\beta^{\text{ridge}} = \mathbf{X}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{U}\mathbf{D}(\mathbf{D}^2 + \lambda\mathbf{I})^{-1}\mathbf{D}\mathbf{U}^T\mathbf{y} = \sum_{j=1}^{p} \mathbf{u}_j \frac{d_j^2}{d_j^2 + \lambda} \mathbf{u}_j^T\mathbf{y}, \tag{3.47}$$

where the $\mathbf{u}_j$ are the columns of $\mathbf{U}$. Note that since $\lambda \ge 0$, we have $d_j^2/(d_j^2 + \lambda) \le 1$. Like linear regression, ridge regression computes the coordinates of $\mathbf{y}$ with respect to the orthonormal basis $\mathbf{U}$. It then shrinks these coordinates by the factors $d_j^2/(d_j^2 + \lambda)$. This means that a greater amount of shrinkage is applied to the coordinates of basis vectors with smaller $d_j^2$.

What does a small value of $d_j^2$ mean? The SVD of the centered matrix $\mathbf{X}$ is another way of expressing the principal components of the variables in $\mathbf{X}$. The sample covariance matrix is given by $\mathbf{S} = \mathbf{X}^T\mathbf{X}/N$, and from (3.45) we have

$$\mathbf{X}^T\mathbf{X} = \mathbf{V}\mathbf{D}^2\mathbf{V}^T, \tag{3.48}$$

which is the eigen decomposition of $\mathbf{X}^T\mathbf{X}$ (and of $\mathbf{S}$, up to a factor $N$). The eigenvectors $v_j$ (columns of $\mathbf{V}$) are also called the principal components (or Karhunen-Loeve) directions of $\mathbf{X}$. The first principal component direction $v_1$ has the property that $\mathbf{z}_1 = \mathbf{X}v_1$ has the largest sample variance amongst all normalized linear combinations of the columns of $\mathbf{X}$. This sample variance is easily seen to be

$$\mathrm{Var}(\mathbf{z}_1) = \mathrm{Var}(\mathbf{X}v_1) = \frac{d_1^2}{N}, \tag{3.49}$$

and in fact $\mathbf{z}_1 = \mathbf{X}v_1 = \mathbf{u}_1 d_1$. The derived variable $\mathbf{z}_1$ is called the first principal component of $\mathbf{X}$, and hence $\mathbf{u}_1$ is the normalized first principal component. Subsequent principal components $\mathbf{z}_j$ have maximum variance $d_j^2/N$, subject to being orthogonal to the earlier ones. Conversely the last principal component has minimum variance. Hence the small singular values $d_j$ correspond to directions in the column space of $\mathbf{X}$ having small variance, and ridge regression shrinks these directions the most.

Figure 3.9 illustrates the principal components of some data points in two dimensions. If we consider fitting a linear surface over this domain (the $Y$-axis is sticking out of the page), the configuration of the data allow us to determine its gradient more accurately in the long direction than the short. Ridge regression protects against the potentially high variance of gradients estimated in the short directions. The implicit assumption is that the response will tend to vary most in the directions of high variance of the inputs. This is often a reasonable assumption, since predictors are often chosen for study because they vary with the response variable, but need not hold in general.

Figure 3.9. [Horizontal axis: X1.] Principal components of some input data points. The largest principal component is the direction that maximizes the variance of the projected data, and the smallest principal component minimizes that variance. Ridge regression projects $\mathbf{y}$ onto these components, and then shrinks the coefficients of the low-variance components more than the high-variance components.
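The relationship between the closed-form ridge solution (3.44), the SVD form of the fit (3.47), and the principal component variance (3.49) can be checked numerically. The sketch below uses simulated centered data with a nearly collinear pair of inputs; the data and variable names are our own assumptions.

```python
import numpy as np

# Simulated centered data (an assumption) with a nearly collinear pair.
rng = np.random.default_rng(5)
N, p, lam = 50, 4, 3.0
X = rng.normal(size=(N, p))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=N)
X -= X.mean(axis=0)
y = rng.normal(size=N)
y -= y.mean()

# Closed-form ridge solution, equation (3.44).
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
fit_direct = X @ beta_ridge

# Thin SVD X = U D V^T; d holds the singular values in decreasing order.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d ** 2 / (d ** 2 + lam)          # shrinkage factors in (3.47)
fit_svd = U @ (shrink * (U.T @ y))        # sum_j u_j * shrink_j * (u_j^T y)

# First principal component z1 = X v1 and its sample variance, (3.49).
z1 = X @ Vt[0]
var_z1 = z1.var()                         # divides by N, matching d1^2 / N
```

The smallest entries of `shrink` belong to the smallest singular values, so the direction created by the nearly collinear pair is shrunk the most.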


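In Figure 3.7 we have plotted the estimated prediction error versus the quantity

$$\mathrm{df}(\lambda) = \mathrm{tr}[\mathbf{X}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T] = \mathrm{tr}(\mathbf{H}_\lambda) = \sum_{j=1}^{p} \frac{d_j^2}{d_j^2 + \lambda}. \tag{3.50}$$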

This monotone decreasing function of $\lambda$ is the effective degrees of freedom of the ridge regression fit. Usually in a linear-regression fit with $p$ variables, the degrees-of-freedom of the fit is $p$, the number of free parameters. The idea is that although all $p$ coefficients in a ridge fit will be non-zero, they are fit in a restricted fashion controlled by $\lambda$. Note that $\mathrm{df}(\lambda) = p$ when $\lambda = 0$ (no regularization) and $\mathrm{df}(\lambda) \to 0$ as $\lambda \to \infty$. Of course there is always an additional one degree of freedom for the intercept, which was removed a priori. This definition is motivated in more detail in Section 3.4.4 and Sections 7.4-7.6. In Figure 3.7 the minimum occurs at $\mathrm{df}(\lambda) = 5.0$. Table 3.3 shows that ridge regression reduces the test error of the full least squares estimates by a small amount.
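As a small illustration of (3.50), the effective degrees of freedom can be computed from the singular values alone, and the map from $\lambda$ to $\mathrm{df}(\lambda)$ can be inverted numerically because it is monotone. The bisection helper `lambda_for_df`, its search bounds, and the simulated data are our own assumptions and are not part of the text.

```python
import numpy as np

def ridge_df(X, lam):
    """Effective degrees of freedom of ridge, equation (3.50)."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d ** 2 / (d ** 2 + lam)))

def lambda_for_df(X, target, lo=0.0, hi=1e8, iters=200):
    """Bisection for the lambda with ridge_df(X, lambda) equal to target.

    Assumes 0 < target < p and that df(hi) < target; df is decreasing in lambda.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ridge_df(X, mid) > target:
            lo = mid          # too many degrees of freedom: increase lambda
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated centered inputs (an assumption for illustration).
rng = np.random.default_rng(7)
X = rng.normal(size=(50, 8))
X -= X.mean(axis=0)

lam = 2.5
# Direct evaluation of tr[X (X^T X + lam I)^{-1} X^T] for comparison.
hat_trace = np.trace(X @ np.linalg.solve(X.T @ X + lam * np.eye(8), X.T))
lam_df5 = lambda_for_df(X, 5.0)
```

Indexing ridge fits by $\mathrm{df}(\lambda)$ rather than by $\lambda$ puts them on a scale comparable to subset size, which is presumably why the ridge panel of Figure 3.7 uses degrees of freedom on its horizontal axis.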

3.4.2 The Lasso

The lasso is a shrinkage method like ridge, with subtle but important differences. The lasso estimate is defined by

$$\hat\beta^{\text{lasso}} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \quad \text{subject to } \sum_{j=1}^{p} |\beta_j| \le t. \tag{3.51}$$

Just as in ridge regression, we can re-parametrize the constant $\beta_0$ by standardizing the predictors; the solution for $\hat\beta_0$ is $\bar{y}$, and thereafter we fit a model without an intercept (Exercise 3.5). In the signal processing literature, the lasso is also known as basis pursuit (Chen et al., 1998).

We can also write the lasso problem in the equivalent Lagrangian form

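$$\hat\beta^{\text{lasso}} = \operatorname*{argmin}_{\beta} \left\{ \frac{1}{2}\sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}. \tag{3.52}$$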

Notice the similarity to the ridge regression problem (3.42) or (3.41): the $L_2$ ridge penalty $\sum_1^p \beta_j^2$ is replaced by the $L_1$ lasso penalty $\sum_1^p |\beta_j|$. This latter constraint makes the solutions nonlinear in the $y_i$, and there is no closed form expression as in ridge regression. Computing the lasso solution is a quadratic programming problem, although we see in Section 3.4.4 that efficient algorithms are available for computing the entire path of solutions as $\lambda$ is varied, with the same computational cost as for ridge regression. Because of the nature of the constraint, making $t$ sufficiently small will cause some of the coefficients to be exactly zero. Thus the lasso does a kind of continuous subset selection. If $t$ is chosen larger than $t_0 = \sum_1^p |\hat\beta_j|$ (where $\hat\beta_j = \hat\beta_j^{\text{ls}}$, the least squares estimates), then the lasso estimates are the $\hat\beta_j$'s. On the other hand, for $t = t_0/2$ say, then the least squares coefficients are shrunk by about 50% on average. However, the nature of the shrinkage is not obvious, and we investigate it further in Section 3.4.4 below. Like the subset size in variable subset selection, or the penalty parameter in ridge regression, $t$ should be adaptively chosen to minimize an estimate of expected prediction error.

In Figure 3.7, for ease of interpretation, we have plotted the lasso prediction error estimates versus the standardized parameter $s = t/\sum_1^p |\hat\beta_j|$. A value $\hat{s} \approx 0.36$ was chosen by 10-fold cross-validation; this caused four coefficients to be set to zero (fifth column of Table 3.3). The resulting model has the second lowest test error, slightly lower than the full least squares model, but the standard errors of the test error estimates (last line of Table 3.3) are fairly large.

Figure 3.10 shows the lasso coefficients as the standardized tuning parameter $s = t/\sum_1^p |\hat\beta_j|$ is varied. At $s = 1.0$ these are the least squares estimates; they decrease to 0 as $s \to 0$. This decrease is not always strictly monotonic, although it is in this example. A vertical line is drawn at $s = 0.36$, the value chosen by cross-validation.
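To see the exact zeros produced by the $L_1$ penalty, the Lagrangian problem (3.52) can be solved for a fixed $\lambda$ by cyclic coordinate descent. We stress that this is a different technique from the quadratic programming formulation and the path algorithms mentioned above; it is swapped in here only because it is short. The function names, tolerances and simulated data are our own assumptions, and the inputs and response are assumed to be centered so that no intercept is needed.

```python
import numpy as np

def soft_threshold(z, lam):
    """Move z toward zero by lam, truncating at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=1000, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centered.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)        # x_j^T x_j for each column
    r = y.copy()                         # residual y - X beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Inner product of x_j with the partial residual (x_j's term added back).
            rho = X[:, j] @ r + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            r += X[:, j] * (beta[j] - new)   # keep the residual in sync
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

# Simulated centered data (an assumption) with correlated inputs.
rng = np.random.default_rng(8)
X = rng.normal(size=(100, 6))
X[:, 5] += 0.8 * X[:, 0]
X -= X.mean(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=100)
y -= y.mean()

lam = 20.0
beta_lasso = lasso_cd(X, y, lam)

# Any lambda at or above max_j |x_j^T y| yields the all-zero solution.
lam_max = np.max(np.abs(X.T @ y))
beta_zero = lasso_cd(X, y, lam_max)
```

When the columns of $\mathbf{X}$ are orthonormal, a single sweep already gives the solution: each least squares coefficient is soft-thresholded at $\lambda$.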

3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso

In this section we discuss and compare the three approaches discussed so far for restricting the linear regression model: subset selection, ridge regression and the lasso.

In the case of an orthonormal input matrix $\mathbf{X}$ the three procedures have explicit solutions. Each method applies a simple transformation to the least squares estimate $\hat\beta_j$, as detailed in Table 3.4.

Ridge regression does a proportional shrinkage. Lasso translates each coefficient by a constant factor $\lambda$, truncating at zero. This is called "soft thresholding," and is used in the context of wavelet-based smoothing in Section 5.9. Best-subset selection drops all variables with coefficients smaller than the $M$th largest; this is a form of "hard-thresholding."

Table 3.4. Estimators of $\beta_j$ in the case of orthonormal columns of $\mathbf{X}$. $M$ and $\lambda$ are constants chosen by the corresponding techniques; sign denotes the sign of its argument ($\pm 1$), and $x_+$ denotes "positive part" of $x$. Below the table, estimators are shown by broken red lines. The 45° line in gray shows the unrestricted estimate for reference.

Estimator               Formula
Best subset (size M)    $\hat\beta_j \cdot I(|\hat\beta_j| \ge |\hat\beta_{(M)}|)$
Ridge                   $\hat\beta_j / (1 + \lambda)$
Lasso                   $\mathrm{sign}(\hat\beta_j)(|\hat\beta_j| - \lambda)_+$

[Panels below the table: Best Subset, Ridge, Lasso.]

Figure 3.10. [Horizontal axis: Shrinkage Factor s.] Profiles of lasso coefficients, as the tuning parameter $t$ is varied. Coefficients are plotted versus $s = t/\sum_1^p |\hat\beta_j|$. A vertical line is drawn at $s = 0.36$, the value chosen by cross-validation. Compare Figure 3.8 on page 65; the lasso profiles hit zero, while those for ridge do not. The profiles are piece-wise linear, and so are computed only at the points displayed; see Section 3.4.4 for details.

Back to the nonorthogonal case; some pictures help understand their relationship. Figure 3.11 depicts the lasso (left) and ridge regression (right) when there are only two parameters. The residual sum of squares has elliptical contours, centered at the full least squares estimate. The constraint region for ridge regression is the disk $\beta_1^2 + \beta_2^2 \le t$, while that for lasso is the diamond $|\beta_1| + |\beta_2| \le t$. Both methods find the first point where the elliptical contours hit the constraint region. Unlike the disk, the diamond has corners; if the solution occurs at a corner, then it has one parameter $\beta_j$ equal to zero. When $p > 2$, the diamond becomes a rhomboid, and has many corners, flat edges and faces; there are many more opportunities for the estimated parameters to be zero.

Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ and $\beta_1^2 + \beta_2^2 \le t^2$, respectively, while the red ellipses are the contours of the least squares error function.

We can generalize ridge regression and the lasso, and view them as Bayes estimates. Consider the criterion

$$\tilde\beta = \operatorname*{argmin}_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|^q \Big\} \qquad (3.53)$$

Page 98: Output

for $q \ge 0$. The contours of constant value of $\sum_j |\beta_j|^q$ are shown in Figure 3.12, for the case of two inputs.

Thinking of $|\beta_j|^q$ as the log-prior density for $\beta_j$, these are also the equi-contours of the prior distribution of the parameters. The value $q = 0$ corresponds to variable subset selection, as the penalty simply counts the number of nonzero parameters; $q = 1$ corresponds to the lasso, while $q = 2$ to ridge regression. Notice that for $q \le 1$, the prior is not uniform in direction, but concentrates more mass in the coordinate directions. The prior corresponding to the $q = 1$ case is an independent double exponential (or Laplace) distribution for each input, with density $(1/2\tau)\exp(-|\beta|/\tau)$ and $\tau = 1/\lambda$. The case $q = 1$ (lasso) is the smallest $q$ such that the constraint region is convex; non-convex constraint regions make the optimization problem more difficult.

In this view, the lasso, ridge regression and best subset selection are Bayes estimates with different priors. Note, however, that they are derived as posterior modes, that is, maximizers of the posterior. It is more common to use the mean of the posterior as the Bayes estimate. Ridge regression is also the posterior mean, but the lasso and best subset selection are not.

Looking again at the criterion (3.53), we might try using other values of $q$ besides 0, 1, or 2. Although one might consider estimating $q$ from the data, our experience is that it is not worth the effort for the extra variance incurred. Values of $q \in (1, 2)$ suggest a compromise between the lasso and ridge regression. Although this is the case, with $q > 1$, $|\beta_j|^q$ is differentiable at 0, and so does not share the ability of lasso ($q = 1$) for

Figure 3.12. Contours of constant value of $\sum_j |\beta_j|^q$ for given values of $q$. (Panels, left to right: $q = 4$, $q = 2$, $q = 1$, $q = 0.5$, $q = 0.1$.)


Page 99: Output

Figure 3.13. Contours of constant value of $\sum_j |\beta_j|^q$ for $q = 1.2$ (left plot), and the elastic-net penalty $\sum_j (\alpha\beta_j^2 + (1 - \alpha)|\beta_j|)$ for $\alpha = 0.2$ (right plot). Although visually very similar, the elastic-net has sharp (non-differentiable) corners, while the $q = 1.2$ penalty does not. (Panel titles: $L_q$, Elastic Net.)

setting coefficients exactly to zero. Partly for this reason as well as for computational tractability, Zou and Hastie (2005) introduced the elastic-net penalty

$$\lambda \sum_{j=1}^{p} \big( \alpha\beta_j^2 + (1 - \alpha)|\beta_j| \big), \qquad (3.54)$$

a different compromise between ridge and lasso. Figure 3.13 compares the $L_q$ penalty with $q = 1.2$ and the elastic-net penalty with $\alpha = 0.2$; it is hard to detect the difference by eye. The elastic-net selects variables like the lasso, and shrinks together the coefficients of correlated predictors like ridge. It also has considerable computational advantages over the $L_q$ penalties. We discuss the elastic-net further in Section 18.4.
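The difference between the two penalties at the origin can be checked numerically. The sketch below is illustrative only: the function names are our own, the multiplier $\lambda$ is omitted, and the one-sided slope at zero is approximated with a small step `h` chosen for the example.

```python
import numpy as np

def lq_penalty(beta, q):
    """sum_j |beta_j|^q, for q > 0."""
    return np.sum(np.abs(np.asarray(beta, dtype=float)) ** q)

def elastic_net_penalty(beta, alpha):
    """sum_j (alpha * beta_j^2 + (1 - alpha) * |beta_j|), i.e. (3.54) without lambda."""
    beta = np.asarray(beta, dtype=float)
    return np.sum(alpha * beta ** 2 + (1.0 - alpha) * np.abs(beta))

# Both penalties give the same value at a unit coordinate vector.
e1 = np.array([1.0, 0.0])
print(lq_penalty(e1, 1.2), elastic_net_penalty(e1, 0.2))

# One-sided slope at the origin: it vanishes for q = 1.2 (differentiable),
# but stays near 1 - alpha for the elastic net (a sharp corner).
h = 1e-10
slope_lq = lq_penalty([h], 1.2) / h
slope_en = elastic_net_penalty([h], 0.2) / h
print(slope_lq, slope_en)
```

The nonzero slope at the origin is what lets the elastic net, like the lasso, set coefficients exactly to zero.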

3.4.4 Least Angle Regression

Least angle regression (LAR) is a relative newcomer (Efron et al., 2004), and can be viewed as a kind of "democratic" version of forward stepwise regression (Section 3.3.2). As we will see, LAR is intimately connected with the lasso, and in fact provides an extremely efficient algorithm for computing the entire lasso path as in Figure 3.10.

Forward stepwise regression builds a model sequentially, adding one variable at a time. At each step, it identifies the best variable to include in the active set, and then updates the least squares fit to include all the active variables.

Least angle regression uses a similar strategy, but only enters "as much" of a predictor as it deserves. At the first step it identifies the variable most correlated with the response. Rather than fit this variable completely, LAR moves the coefficient of this variable continuously toward its least-squares value (causing its correlation with the evolving residual to decrease in absolute value). As soon as another variable "catches up" in terms of correlation with the residual, the process is paused. The second variable then joins the active set, and their coefficients are moved together in a way

Page 100: Output

that keeps their correlations tied and decreasing. This process is continued


until all the variables are in the model, and ends at the full least-squares fit. Algorithm 3.2 provides the details. The termination condition in step 5 requires some explanation. If $p > N - 1$, the LAR algorithm reaches a zero residual solution after $N - 1$ steps (the $-1$ is because we have centered the data).

Algorithm 3.2 Least Angle Regression.

1. Standardize the predictors to have mean zero and unit norm. Start with the residual $r = y - \bar{y}$, $\beta_1, \beta_2, \ldots, \beta_p = 0$.
2. Find the predictor $x_j$ most correlated with $r$.
3. Move $\beta_j$ from 0 towards its least-squares coefficient $\langle x_j, r \rangle$, until some other competitor $x_k$ has as much correlation with the current residual as does $x_j$.
4. Move $\beta_j$ and $\beta_k$ in the direction defined by their joint least squares coefficient of the current residual on $(x_j, x_k)$, until some other competitor $x_l$ has as much correlation with the current residual.
5. Continue in this way until all $p$ predictors have been entered. After $\min(N - 1, p)$ steps, we arrive at the full least-squares solution.
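The steps of Algorithm 3.2 can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `lar_path` and the simulated data are our own, the joint least squares direction is obtained by solving the normal equations on the active set, and the step length is found by solving directly for the first tie in absolute inner product (a closed form we derive from the piecewise linearity, assuming no exact ties between competitors).

```python
import numpy as np

def lar_path(X, y):
    """Sketch of Algorithm 3.2; returns standardized X and one coefficient vector per step."""
    n, p = X.shape
    # Step 1: predictors with mean zero and unit norm; residual starts at y - mean(y).
    Xs = X - X.mean(axis=0)
    Xs = Xs / np.linalg.norm(Xs, axis=0)
    r = y - y.mean()
    beta = np.zeros(p)
    active = []
    path = [beta.copy()]
    for _ in range(min(n - 1, p)):
        c = Xs.T @ r  # inner products with the current residual
        inactive = [j for j in range(p) if j not in active]
        # The predictor most correlated with the residual enters the active set.
        j_new = max(inactive, key=lambda j: abs(c[j]))
        active.append(j_new)
        C = abs(c[j_new])  # common absolute inner product of the active set
        XA = Xs[:, active]
        delta = np.linalg.solve(XA.T @ XA, XA.T @ r)  # joint least squares direction
        u = XA @ delta  # direction in which the fit moves
        # Along r - alpha * u the active inner products shrink to C * (1 - alpha).
        # Find the smallest alpha in (0, 1] at which an inactive predictor ties.
        alpha = 1.0
        for k in inactive:
            if k == j_new:
                continue
            a_k = Xs[:, k] @ u
            for num, den in ((C - c[k], C - a_k), (C + c[k], C + a_k)):
                if den > 1e-12 and 1e-12 < num / den < alpha:
                    alpha = num / den
        beta[active] += alpha * delta
        r = r - alpha * u
        path.append(beta.copy())
    return Xs, np.array(path)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(size=40)
Xs, path = lar_path(X, y)
print(path.round(3))
```

Each row of `path` has one more nonzero coefficient than the row before it, and the last row is the full least squares fit on the standardized predictors.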

Suppose $\mathcal{A}_k$ is the active set of variables at the beginning of the $k$th step, and let $\beta_{\mathcal{A}_k}$ be the coefficient vector for these variables at this step; there will be $k - 1$ nonzero values, and the one just entered will be zero. If $r_k = y - X_{\mathcal{A}_k}\beta_{\mathcal{A}_k}$ is the current residual, then the direction for this step is

$$\delta_k = (X_{\mathcal{A}_k}^T X_{\mathcal{A}_k})^{-1} X_{\mathcal{A}_k}^T r_k. \qquad (3.55)$$

The coefficient profile then evolves as $\beta_{\mathcal{A}_k}(\alpha) = \beta_{\mathcal{A}_k} + \alpha \cdot \delta_k$. Exercise 3.23 verifies that the directions chosen in this fashion do what is claimed: keep the correlations tied and decreasing. If the fit vector at the beginning of this step is $\hat{f}_k$, then it evolves as $\hat{f}_k(\alpha) = \hat{f}_k + \alpha \cdot u_k$, where $u_k = X_{\mathcal{A}_k}\delta_k$ is the new fit direction. The name "least angle" arises from a geometrical

Page 101: Output

interpretation of this process; $u_k$ makes the smallest (and equal) angle with each of the predictors in $\mathcal{A}_k$ (Exercise 3.24). Figure 3.14 shows the absolute correlations decreasing and joining ranks with each step of the LAR algorithm, using simulated data.

By construction the coefficients in LAR change in a piecewise linear fashion. Figure 3.15 [left panel] shows the LAR coefficient profile evolving as a function of their $L_1$ arc length.² Note that we do not need to take small

² The $L_1$ arc-length of a differentiable curve $\beta(s)$ for $s \in [0, S]$ is given by $\mathrm{TV}(\beta, S) = \int_0^S \|\dot\beta(s)\|_1 \, ds$, where $\dot\beta(s) = \partial\beta(s)/\partial s$. For the piecewise-linear LAR coefficient profile, this amounts to summing the $L_1$ norms of the changes in coefficients from step to step.

Figure 3.14. Progression of the absolute correlations during each step of the LAR procedure, using a simulated data set with six predictors. The labels at the top of the plot indicate which variables enter the active set at each step. The step lengths are measured in units of $L_1$ arc length. (Labels at top, in order of entry: v2, v6, v4, v5, v3, v1; horizontal axis: $L_1$ arc length.)

Figure 3.15. Left panel shows the LAR coefficient profiles on the simulated data, as a function of the $L_1$ arc length. The right panel shows the Lasso profile. They are identical until the dark-blue coefficient crosses zero at an arc length of about 18. (Panel titles: Least Angle Regression, Lasso; horizontal axes: $L_1$ arc length.)

Page 102: Output


steps and recheck the correlations in step 3; using knowledge of the covariance of the predictors and the piecewise linearity of the algorithm, we can work out the exact step length at the beginning of each step (Exercise 3.25).

The right panel of Figure 3.15 shows the lasso coefficient profiles on the same data. They are almost identical to those in the left panel, and differ for the first time when the blue coefficient passes back through zero. For the prostate data, the LAR coefficient profile turns out to be identical to the lasso profile in Figure 3.10, which never crosses zero. These observations lead to a simple modification of the LAR algorithm that gives the entire lasso path, which is also piecewise-linear.

Algorithm 3.2a Least Angle Regression: Lasso Modification.

4a. If a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute the current joint least squares direction.
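Step 4a amounts to one extra check inside each LAR step: since each active coefficient moves linearly as $\beta_j + \alpha\delta_j$, it reaches zero at $\alpha = -\beta_j/\delta_j$. The sketch below is illustrative only; the helper name `step_to_first_zero` and its arguments are hypothetical, and `alpha_max` stands for the step length that plain LAR would otherwise take.

```python
import numpy as np

def step_to_first_zero(beta_active, delta, alpha_max):
    """Return (alpha, index) for the first active coefficient to hit zero
    strictly before alpha_max, or (alpha_max, None) if none does."""
    best_alpha, best_j = alpha_max, None
    for j, (b, d) in enumerate(zip(beta_active, delta)):
        if b != 0.0 and d != 0.0:
            a = -b / d  # beta_j + a * delta_j == 0
            if 0.0 < a < best_alpha:
                best_alpha, best_j = a, j
    return best_alpha, best_j

# Hypothetical active coefficients and direction, for illustration only.
beta_active = np.array([1.0, -0.5, 0.0])
delta = np.array([-2.0, -1.0, 1.0])
print(step_to_first_zero(beta_active, delta, alpha_max=0.8))
print(step_to_first_zero(beta_active, delta, alpha_max=0.3))
```

When an index is returned, the step is cut short at that `alpha`, the variable is dropped from the active set, and the direction is recomputed; otherwise the LAR step proceeds unchanged.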

The LAR(lasso) algorithm is extremely efficient, requiring the same order of computation as that of a single least squares fit using the $p$ predictors. Least angle regression always takes $p$ steps to get to the full least squares estimates. The lasso path can have more than $p$ steps, although the two are often quite similar. Algorithm 3.2 with the lasso modification 3.2a is an efficient way of computing the solution to any lasso problem, especially when $p \gg N$. Osborne et al. (2000a) also discovered a piecewise-linear path for computing the lasso, which they called a homotopy algorithm.

We now give a heuristic argument for why these procedures are so similar. Although the LAR algorithm is stated in terms of correlations, if the input features are standardized, it is equivalent and easier to work with inner-products. Suppose $\mathcal{A}$ is the active set of variables at some stage in the algorithm, tied in their absolute inner-product with the current residuals $y - X\beta$. We can express this as

$$x_j^T (y - X\beta) = \gamma \cdot s_j, \quad \forall j \in \mathcal{A} \qquad (3.56)$$

where $s_j \in \{-1, 1\}$ indicates the sign of the inner-product, and $\gamma$ is the common value. Also $|x_k^T (y - X\beta)| \le \gamma \ \forall k \notin \mathcal{A}$. Now consider the lasso criterion (3.52), which we write in vector form

$$R(\beta) = \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1. \qquad (3.57)$$

Page 103: Output

Let $\mathcal{B}$ be the active set of variables in the solution for a given value of $\lambda$. For these variables $R(\beta)$ is differentiable, and the stationarity conditions give

$$x_j^T (y - X\beta) = \lambda \cdot \mathrm{sign}(\beta_j), \quad \forall j \in \mathcal{B} \qquad (3.58)$$

Comparing (3.58) with (3.56), we see that they are identical only if the sign of $\beta_j$ matches the sign of the inner product. That is why the LAR


algorithm and lasso start to differ when an active coefficient passes through zero; condition (3.58) is violated for that variable, and it is kicked out of the active set $\mathcal{B}$. Exercise 3.23 shows that these equations imply a piecewise-linear coefficient profile as $\lambda$ decreases. The stationarity conditions for the non-active variables require that

$$|x_k^T (y - X\beta)| \le \lambda, \quad \forall k \notin \mathcal{B}, \qquad (3.59)$$

which again agrees with the LAR algorithm.

Figure 3.16 compares LAR and lasso to forward stepwise and stagewise regression. The setup is the same as in Figure 3.6 on page 59, except that here N = 100 rather than 300, so the problem is more difficult. We see that the more aggressive forward stepwise starts to overfit quite early (well before the 10 true variables can enter the model), and ultimately performs worse than the slower forward stagewise regression. The behavior of LAR and lasso is similar to that of forward stagewise regression. Incremental forward stagewise is similar to LAR and lasso, and is described in Section 3.8.1.
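The stationarity conditions (3.58) and (3.59) can be checked numerically on any lasso solution. The sketch below swaps in a plain cyclic coordinate descent solver for the criterion (3.57) in place of the LAR path; the solver, its name `lasso_cd`, the fixed number of sweeps, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=500):
    """Minimize 0.5 * ||y - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    r = y.copy()  # current residual y - X beta
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]  # remove predictor j's contribution
            beta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * beta[j]  # add the updated contribution back
    return beta

# Simulated, standardized data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = X @ np.array([4.0, -3.0, 0.0, 0.0, 0.0, 0.0]) + 0.5 * rng.normal(size=60)
y -= y.mean()

lam = 1.0
beta = lasso_cd(X, y, lam)
g = X.T @ (y - X @ beta)  # inner products with the residual
active = beta != 0
print(g[active], lam * np.sign(beta[active]))  # (3.58): equal on the active set
print(np.abs(g[~active]).max(initial=0.0), lam)  # (3.59): bounded by lambda elsewhere
```

On the active set the inner products sit exactly at $\pm\lambda$ with the sign of the coefficient, which is the tie in absolute inner product that LAR maintains.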

Degrees-of-Freedom Formula for LAR and Lasso

Suppose that we fit a linear model via the least angle regression procedure, stopping at some number of steps $k < p$, or equivalently using a lasso bound $t$ that produces a constrained version of the full least squares fit. How many parameters, or "degrees of freedom" have we used?

Consider first a linear regression using a subset of $k$ features. If this subset is prespecified in advance without reference to the training data, then the degrees of freedom used in the fitted model is defined to be $k$. Indeed, in classical statistics, the number of linearly independent parameters is what is meant by "degrees of freedom." Alternatively, suppose that we carry out a best subset selection to determine the "optimal" set of $k$ predictors. Then the resulting model has $k$ parameters, but in some sense we have used up more than $k$ degrees of freedom.

We need a more general definition for the effective degrees of freedom of an adaptively fitted model. We define the degrees of freedom of the fitted vector $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N)$ as

$$\mathrm{df}(\hat{y}) = \frac{1}{\sigma^2} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i). \qquad (3.60)$$

Page 104: Output

Here $\mathrm{Cov}(\hat{y}_i, y_i)$ refers to the sampling covariance between the predicted value $\hat{y}_i$ and its corresponding outcome value $y_i$. This makes intuitive sense: the harder that we fit to the data, the larger this covariance and hence $\mathrm{df}(\hat{y})$. Expression (3.60) is a useful notion of degrees of freedom, one that can be applied to any model prediction $\hat{y}$. This includes models that are


Figure 3.16. Comparison of LAR and lasso with forward stepwise, forward stagewise (FS) and incremental forward stagewise (FS0) regression. The setup is the same as in Figure 3.6, except N = 100 here rather than 300. Here the slower FS regression ultimately outperforms forward stepwise. LAR and lasso show similar behavior to FS and FS0. Since the procedures take different numbers of steps (across simulation replicates and methods), we plot the MSE as a function of the fraction of total $L_1$ arc-length toward the least-squares fit. (Horizontal axis: fraction of $L_1$ arc-length, from 0.0 to 1.0.)

adaptively fitted to the training data. This definition is motivated and discussed further in Sections 7.4-7.6.

Now for a linear regression with $k$ fixed predictors, it is easy to show that $\mathrm{df}(\hat{y}) = k$. Likewise for ridge regression, this definition leads to the closed-form expression (3.50) on page 68: $\mathrm{df}(\hat{y}) = \mathrm{tr}(S_\lambda)$. In both these cases, (3.60) is simple to evaluate because the fit $\hat{y} = H_\lambda y$ is linear in $y$. If we think about definition (3.60) in the context of a best subset selection of size $k$, it seems clear that $\mathrm{df}(\hat{y})$ will be larger than $k$, and this can be verified by estimating $\mathrm{Cov}(\hat{y}_i, y_i)/\sigma^2$ directly by simulation. However there is no closed form method for estimating $\mathrm{df}(\hat{y})$ for best subset selection.

For LAR and lasso, something magical happens. These techniques are adaptive in a smoother way than best subset selection, and hence estimation of degrees of freedom is more tractable. Specifically it can be shown that after the $k$th step of the LAR procedure, the effective degrees of freedom of the fit vector is exactly $k$. Now for the lasso, the (modified) LAR procedure

Page 105: Output

3.5 Methods Using Derived Input Directions 79

often takes more than $p$ steps, since predictors can drop out. Hence the definition is a little different; for the lasso, at any stage $\mathrm{df}(\hat{y})$ approximately equals the number of predictors in the model. While this approximation works reasonably well anywhere in the lasso path, for each $k$ it works best at the last model in the sequence that contains $k$ predictors. A detailed study of the degrees of freedom for the lasso may be found in Zou et al. (2007).
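Definition (3.60) can be estimated directly by simulation, as suggested above for best subset selection. The sketch below is a hedged illustration with our own choices throughout: the helper `df_by_simulation`, a null model with true mean zero, unit-norm simulated predictors, and best subset selection of size 1 (picking the single predictor with the largest absolute inner product with $y$) as the adaptive procedure.

```python
import numpy as np

def df_by_simulation(fit, mu, sigma, n_rep, rng):
    """Monte Carlo estimate of (1 / sigma^2) * sum_i Cov(yhat_i, y_i), as in (3.60)."""
    Y = mu + sigma * rng.normal(size=(n_rep, len(mu)))
    Yhat = np.array([fit(y) for y in Y])
    # Sample covariance between yhat_i and y_i, summed over i.
    cov_sum = np.sum((Yhat - Yhat.mean(axis=0)) * (Y - Y.mean(axis=0))) / (n_rep - 1)
    return cov_sum / sigma ** 2

rng = np.random.default_rng(0)
n, p = 40, 10
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
mu = np.zeros(n)  # null model: no predictor has a true effect

k = 3
Xk = X[:, :k]
H = Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)  # hat matrix for k fixed predictors

def fixed_fit(y):
    return H @ y

def best_single_fit(y):
    # Adaptive: choose the one predictor with the largest |<x_j, y>|, then fit it.
    j = np.argmax(np.abs(X.T @ y))
    return X[:, j] * (X[:, j] @ y)

df_fixed = df_by_simulation(fixed_fit, mu, 1.0, 4000, rng)
df_best1 = df_by_simulation(best_single_fit, mu, 1.0, 4000, rng)
print(df_fixed, df_best1)
```

The fixed-subset estimate should land near $k = 3$, while the adaptively chosen single predictor uses noticeably more than one degree of freedom.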

3.5 Methods Using Derived Input Directions

In many situations we have a large number of inputs, often very correlated. The methods in this section produce a small number of linear combinations $Z_m$, $m = 1, \ldots, M$ of the original inputs $X_j$, and the $Z_m$ are then used in place of the $X_j$ as inputs in the regression. The methods differ in how the linear combinations are constructed.

3.5.1 Principal Components Regression

In this approach the linear combinations $Z_m$ used are the principal components as defined in Section 3.4.1 above.

Principal component regression forms the derived input columns $z_m = X v_m$, and then regresses $y$ on $z_1, z_2, \ldots, z_M$ for some $M \le p$. Since the $z_m$ are orthogonal, this regression is just a sum of univariate regressions:

$$\hat{y}^{\mathrm{pcr}}_{(M)} = \bar{y}\mathbf{1} + \sum_{m=1}^{M} \hat\theta_m z_m, \qquad (3.61)$$

where $\hat\theta_m = \langle z_m, y \rangle / \langle z_m, z_m \rangle$. Since the $z_m$ are each linear combinations of the original $x_j$, we can express the solution (3.61) in terms of coefficients of the $x_j$ (Exercise 3.13):

$$\hat\beta^{\mathrm{pcr}}(M) = \sum_{m=1}^{M} \hat\theta_m v_m. \qquad (3.62)$$

As with ridge regression, principal components depend on the scaling of the inputs, so typically we first standardize them. Note that if $M$

Page 106: Output

$= p$, we would just get back the usual least squares estimates, since the columns of $Z = UD$ span the column space of $X$. For $M < p$ we get a reduced regression. We see that principal components regression is very similar to ridge regression: both operate via the principal components of the input matrix. Ridge regression shrinks the coefficients of the principal components (Figure 3.17), shrinking more depending on the size of the corresponding eigenvalue; principal components regression discards the $p - M$ smallest eigenvalue components. Figure 3.17 illustrates this.
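Equations (3.61) and (3.62) translate directly into a few lines of NumPy via the singular value decomposition. The sketch below is illustrative: the function names and simulated data are our own, the inputs are assumed standardized and the response centered, and the ridge shrinkage factors $d_j^2/(d_j^2 + \lambda)$ are included only for comparison with the hard truncation used by principal components regression.

```python
import numpy as np

def pcr_coefficients(X, y, M):
    """beta_pcr(M) = sum over m <= M of theta_m * v_m, for standardized X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt.T[:, :M]                              # first M principal component directions
    Z = X @ V                                    # derived inputs z_m = X v_m
    theta = (Z.T @ y) / np.sum(Z ** 2, axis=0)   # univariate regressions, as in (3.61)
    return V @ theta                             # coefficients on the x_j, as in (3.62)

def ridge_shrinkage_factors(X, lam):
    """d_j^2 / (d_j^2 + lambda) for each principal component, largest d_j first."""
    d = np.linalg.svd(X, compute_uv=False)
    return d ** 2 / (d ** 2 + lam)

# Simulated, correlated inputs, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
y = Xs @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(size=50)
yc = y - y.mean()

beta_full = pcr_coefficients(Xs, yc, M=4)   # M = p: the least squares estimates
beta_two = pcr_coefficients(Xs, yc, M=2)    # M < p: a reduced regression
factors = ridge_shrinkage_factors(Xs, lam=5.0)
print(beta_full, beta_two, factors)
```

Ridge applies a smoothly decreasing factor to every component, whereas principal components regression applies a factor of one to the first $M$ components and zero to the rest.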


Figure 3.17. Ridge regression shrinks the regression coefficients of the principal components, using shrinkage factors $d_j^2/(d_j^2 + \lambda)$ as in (3.47). Principal component regression truncates them. Shown are the shrinkage and truncation patterns corresponding to Figure 3.7, as a function of the principal component index. (Horizontal axis: Index.)

In Figure 3.7 we see that cross-validation suggests seven terms; the resulting model has the lowest test error in Table 3.3.

3.5.2 Partial Least Squares

This technique also constructs a set of linear combinations of the inputs for regression, but unlike principal components regression it uses $y$ (in addition to $X$) for this construction. Like principal component regression, partial least squares (PLS) is not scale invariant, so we assume that each $x_j$ is standardized to have mean 0 and variance 1. PLS begins by computing $\hat\varphi_{1j} = \langle x_j, y \rangle$ for each $j$. From this we construct the derived input $z_1 = \sum_j \hat\varphi_{1j} x_j$, which is the first partial least squares direction. Hence in the construction of each $z_m$, the inputs are weighted by the strength of their univariate effect on $y$.³ The outcome $y$ is regressed on $z_1$ giving coefficient $\hat\theta_1$, and then we orthogonalize $x_1, \ldots, x_p$ with respect to $z_1$. We continue this process, until $M \le p$ directions have been obtained. In this manner, partial least squares produces a sequence of derived, orthogonal inputs or directions $z_1, z_2, \ldots, z_M$. As with principal-component regression, if we were to construct all $M = p$

Page 107: Output

directions, we would get back a solution equivalent to the usual least squares estimates; using $M < p$ directions produces a reduced regression. The procedure is described fully in Algorithm 3.3.

³ Since the $x_j$ are standardized, the first directions $\hat\varphi_{1j}$ are the univariate regression coefficients (up to an irrelevant constant); this is not the case for subsequent directions.


Algorithm 3.3 Partial Least Squares.

1. Standardize each $x_j$ to have mean zero and variance one. Set $\hat{y}^{(0)} = \bar{y}\mathbf{1}$, and $x_j^{(0)} = x_j$, $j = 1, \ldots, p$.
2. For $m = 1, 2, \ldots, p$
   (a) $z_m = \sum_{j=1}^{p} \hat\varphi_{mj} x_j^{(m-1)}$, where $\hat\varphi_{mj} = \langle x_j^{(m-1)}, y \rangle$.
   (b) $\hat\theta_m = \langle z_m, y \rangle / \langle z_m, z_m \rangle$.
   (c) $\hat{y}^{(m)} = \hat{y}^{(m-1)} + \hat\theta_m z_m$.
   (d) Orthogonalize each $x_j^{(m-1)}$ with respect to $z_m$: $x_j^{(m)} = x_j^{(m-1)} - [\langle z_m, x_j^{(m-1)} \rangle / \langle z_m, z_m \rangle] z_m$, $j = 1, 2, \ldots, p$.
3. Output the sequence of fitted vectors $\{\hat{y}^{(m)}\}_1^p$. Since the $\{z_\ell\}_1^m$ are linear in the original $x_j$, so is $\hat{y}^{(m)} = X\hat\beta^{\mathrm{pls}}(m)$. These linear coefficients can be recovered from the sequence of PLS transformations.
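The steps of Algorithm 3.3 can be sketched in NumPy as follows. This is a minimal illustration under our own choices: the function name `pls_fits`, the simulated data, and the decision to return the fitted vectors together with the derived directions rather than the coefficients $\hat\beta^{\mathrm{pls}}(m)$.

```python
import numpy as np

def pls_fits(X, y, M):
    """Sketch of Algorithm 3.3: fitted vectors yhat^(1..M) and directions z_1..z_M."""
    # Step 1: standardize each x_j; start from the constant fit.
    Xm = (X - X.mean(axis=0)) / X.std(axis=0)
    yhat = np.full(len(y), y.mean())
    fits, Z = [], []
    for _ in range(M):
        phi = Xm.T @ y                      # (a) weights <x_j^(m-1), y>
        z = Xm @ phi                        # (a) derived direction z_m
        theta = (z @ y) / (z @ z)           # (b) regress y on z_m
        yhat = yhat + theta * z             # (c) update the fit
        # (d) orthogonalize every column with respect to z_m
        Xm = Xm - np.outer(z, (z @ Xm) / (z @ z))
        fits.append(yhat.copy())
        Z.append(z)
    return fits, np.column_stack(Z)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
X[:, 1] += 0.8 * X[:, 0]
y = 1.0 + X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(size=n)

fits, Z = pls_fits(X, y, M=p)
rss = [np.sum((y - f) ** 2) for f in fits]
print(rss)
```

With all $M = p$ directions the final fitted vector matches the usual least squares fit with an intercept, and the residual sum of squares is non-increasing in $m$.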

In the prostate cancer example, cross-validation chose $M = 2$ PLS directions in Figure 3.7. This produced the model given in the rightmost column of Table 3.3.

What optimization problem is partial least squares solving? Since it uses the response $y$ to construct its directions, its solution path is a nonlinear function of $y$. It can be shown (Exercise 3.15) that partial least squares seeks directions that have high variance and have high correlation with the response, in contrast to principal components regression which keys only on high variance (Stone and Brooks, 1990; Frank and Friedman, 1993). In particular, the $m$th principal component direction $v_m$ solves:

$$\max_{\alpha} \mathrm{Var}(X\alpha) \qquad (3.63)$$
$$\text{subject to } \|\alpha\| = 1, \ \alpha^T S v_\ell = 0, \ \ell = 1, \ldots, m - 1,$$

where $S$ is the sample covariance matrix of the $x_j$.

Page 108: Output

The conditions $\alpha^T S v_\ell = 0$ ensure that $z_m = X\alpha$ is uncorrelated with all the previous linear combinations $z_\ell = X v_\ell$. The $m$th PLS direction $\hat\varphi_m$ solves:

$$\max_{\alpha} \mathrm{Corr}^2(y, X\alpha)\,\mathrm{Var}(X\alpha) \qquad (3.64)$$
$$\text{subject to } \|\alpha\| = 1, \ \alpha^T S \hat\varphi_\ell = 0, \ \ell = 1, \ldots, m - 1.$$

Further analysis reveals that the variance aspect tends to dominate, and so partial least squares behaves much like ridge regression and principal components regression. We discuss this further in the next section.

If the input matrix $X$ is orthogonal, then partial least squares finds the least squares estimates after $m = 1$ steps. Subsequent steps have no effect