Introduction to Orange - d37djvu3ytnwxt.cloudfront.net · Introduction to Orange ... delimited text...

Introduction to Orange

• Orange is a data mining toolkit, so you don’t need to be an expert in any of those subjects
• We will use Orange to:
  • load, manipulate, and save large datasets
  • visualize the relationships between variables
  • discover and quantify patterns in data
  • create rules to predict outcomes based on observed data

Orange: Graphical Programming

Using the Orange interface

• To add a widget, drag it onto the canvas from the widget panel, or just click on it in the widget panel
• To add a signal, click on the signal attachment point on a widget and drag from it to the signal attachment point on another widget
• Input signals come in from the left, output signals go out to the right

Using the Orange interface

• Some widgets have multiple possible input and output ports
• Orange tries to guess which one you mean
• If it guesses wrong, double-click on the signal to select which inputs and outputs you are using
• You can also temporarily disconnect or delete signals by right-clicking on them

File Widget

• Loads data from a file
• Many different file types are supported
• Recommended: tab-delimited text
• iris.tab is an example dataset that comes with Orange, and contains 150 iris flowers from three species
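For readers who want to see what "tab-delimited text" means in practice, here is a minimal sketch in plain Python. It is not Orange's loader: it uses only the standard `csv` module, the three sample rows and the column names are assumptions made for illustration, and real Orange .tab files may carry extra header rows (for example, column types) that this sketch ignores.

```python
import csv
import io

# A few rows in tab-delimited form; a hypothetical stand-in for a file on disk.
SAMPLE = (
    "sepal length\tsepal width\tpetal length\tpetal width\tiris\n"
    "5.1\t3.5\t1.4\t0.2\tIris-setosa\n"
    "7.0\t3.2\t4.7\t1.4\tIris-versicolor\n"
    "6.3\t3.3\t6.0\t2.5\tIris-virginica\n"
)


def load_tab(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = load_tab(SAMPLE)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Each row comes back as a dictionary of strings, so numeric columns would still need converting with `float()` before any calculation.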

Data Table Widget

• Lists rows in a dataset; sort by clicking on the column heading
• Each value has a bar showing how big it is
• First column is assumed to be a category (in this case, species)

CC BY-SA 3.0

For each of the 150 flowers in the dataset, there is a value for:

• Petal Length
• Petal Width
• Sepal Length
• Sepal Width

Select Rows Widget

• Filters data according to simple rules
• For example: exclude all irises with short petals
• Select an attribute and a condition and press “Add” to add it to the filter

Data Selection Results

• The “petal length” column now only contains values longer than 3 cm
• The blue category, iris-setosa, is now completely absent
• Apparently all iris-setosa flowers have petals shorter than 3 cm
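The same row filter can be sketched in a few lines of Python. This only illustrates the rule "keep rows where petal length is greater than 3 cm"; it is not how Orange implements the widget. The six flowers are a small illustrative sample rather than the full dataset, and `select_rows` is a hypothetical helper name.

```python
# (species, petal length in cm) pairs; a small illustrative sample.
flowers = [
    ("Iris-setosa", 1.4),
    ("Iris-setosa", 1.9),
    ("Iris-versicolor", 4.7),
    ("Iris-versicolor", 3.3),
    ("Iris-virginica", 6.0),
    ("Iris-virginica", 5.1),
]


def select_rows(rows, min_petal_length):
    """Keep only the rows whose petal length exceeds the threshold."""
    return [row for row in rows if row[1] > min_petal_length]


selected = select_rows(flowers, 3.0)

# Collect the species that survive the filter.
remaining_species = sorted({species for species, _ in selected})
print(remaining_species)
```

In this sample every iris-setosa row falls below the threshold, so the species disappears from the output just as it does in the widget.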

Select Columns Widget (1)

• Choose which columns go in the dataset
• “Attributes” are data values to be included in the output
• “Class” is the category of the row
• “Meta Attributes” are descriptive attributes that are excluded from the analysis (such as a row ID)
• “Available Attributes” are attributes that are available to be loaded but are currently ignored

Select Columns Widget (2)

• Drag or move variables between categories with the “>” and “<” buttons
• Each variable is marked “C” for continuous (numerical values) or “D” for discrete (categorical values)
• You may need to click “Apply” before any changes you make take effect

Select Columns in action

• Suppose we were only interested in sepals, not petals.
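As a rough Python analogue of keeping only the sepal columns, the sketch below copies the chosen attributes plus the class column and leaves everything else behind. The helper name `select_columns`, the column names, and the two sample rows are assumptions for illustration; the widget itself is operated by moving variables between lists.

```python
rows = [
    {"sepal length": 5.1, "sepal width": 3.5,
     "petal length": 1.4, "petal width": 0.2, "iris": "Iris-setosa"},
    {"sepal length": 7.0, "sepal width": 3.2,
     "petal length": 4.7, "petal width": 1.4, "iris": "Iris-versicolor"},
]


def select_columns(rows, attributes, class_name):
    """Return new rows holding only the chosen attributes and the class."""
    keep = list(attributes) + [class_name]
    return [{name: row[name] for name in keep} for row in rows]


sepals_only = select_columns(rows, ["sepal length", "sepal width"], "iris")
print(sepals_only[0])
```

The original rows are left untouched, which mirrors the way a widget passes a new dataset downstream instead of editing its input.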

Feature Constructor Widget

• Defines new attributes (i.e. columns) based on the values of existing attributes
• Type a formula and click “Add” to add a new feature
• Select fields using “(all attributes)” and “(all functions)”
• Widget outputs the same dataset with new attributes added
• This particular calculation is assuming petals are triangular
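To make the idea of a constructed feature concrete, here is a minimal Python sketch that adds a "petal area" column. The exact formula on the slide is not reproduced in this transcript, so the triangle area (half of length times width) is an assumption, as are the helper name `add_feature` and the sample rows. Column ordering is not modeled; the new value is simply appended.

```python
def add_feature(rows, name, formula):
    """Return copies of the rows with one extra column computed by formula(row)."""
    return [{**row, name: formula(row)} for row in rows]


rows = [
    {"petal length": 1.4, "petal width": 0.2, "iris": "Iris-setosa"},
    {"petal length": 4.7, "petal width": 1.4, "iris": "Iris-versicolor"},
]

# Assumed formula: area of a triangle with base = width and height = length.
with_area = add_feature(
    rows,
    "petal area",
    lambda row: 0.5 * row["petal length"] * row["petal width"],
)

for row in with_area:
    print(row["iris"], round(row["petal area"], 2))
```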

Feature Construction Results

• New attribute is added after existing attributes but before class

Save Widget

• Save a modified file
• Saves whatever is going to its input
• If you made changes elsewhere in the scheme, they will not be saved
• Be careful not to accidentally overwrite your input file
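The overwrite warning can be turned into an explicit check when saving from a script. The sketch below writes rows as tab-delimited text with the standard `csv` module and refuses to write to the same path as the input. The function name `save_tab`, the file names, and the one-row dataset are hypothetical, and the check is a simple path comparison, not anything Orange does for you.

```python
import csv
import os
import tempfile


def save_tab(rows, out_path, input_path):
    """Write rows as tab-delimited text, refusing to overwrite the input file."""
    if os.path.abspath(out_path) == os.path.abspath(input_path):
        raise ValueError("refusing to overwrite the input file")
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(rows[0].keys()), delimiter="\t"
        )
        writer.writeheader()
        writer.writerows(rows)


rows = [{"sepal length": 5.1, "iris": "Iris-setosa"}]

with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "input.tab")    # hypothetical input file
    target = os.path.join(folder, "output.tab")   # a different file name
    save_tab(rows, target, source)
    with open(target, newline="") as handle:
        saved_text = handle.read()

    # Saving back onto the input path is rejected.
    try:
        save_tab(rows, source, source)
        refused = False
    except ValueError:
        refused = True

print(saved_text)
```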

Exercise: First Scheme

• Load and inspect the imports-85.tab data file (on course website), which contains information about various imported cars
• Add a “volume” attribute (i.e. length × width × height)
• Remove the original length, width, and height attributes
• Save the dataset using a different filename

Solution

Solution, continued

Remember to click “Apply” after you make changes!
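For comparison, the two data-transformation steps of the exercise (add a volume, then remove the three original dimensions) can be sketched in plain Python. This is not the Orange scheme from the solution slides. The two car rows, the `make` labels, and the helper name are hypothetical values chosen for illustration, not data read from imports-85.tab.

```python
DIMENSIONS = ("length", "width", "height")

# Hypothetical rows standing in for the cars data file.
cars = [
    {"make": "car-a", "length": 168.8, "width": 64.1, "height": 48.8},
    {"make": "car-b", "length": 176.6, "width": 66.2, "height": 54.3},
]


def add_volume_and_drop_dimensions(rows):
    """Add volume = length * width * height, then drop the three originals."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in DIMENSIONS}
        new_row["volume"] = row["length"] * row["width"] * row["height"]
        result.append(new_row)
    return result


converted = add_volume_and_drop_dimensions(cars)
for row in converted:
    print(row["make"], round(row["volume"], 1))
```

Writing `converted` out under a new file name would complete the exercise; the key point is that the volume is computed before the original columns are removed.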