Moving Ahead: Creative Feature Extraction and Error Analysis Techniques
Carolyn Penstein Rosé
Carnegie Mellon University
Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division
Outline
* New Feature Creation
* Error Analysis
New Feature Creation
Why create new features?
* You may want to generalize across sets of related words
  * Color = {red,yellow,orange,green,blue}
  * Food = {cake,pizza,hamburger,steak,bread}
* You may want to detect contingencies
  * The text must mention both cake and presents in order to count as a birthday party
* You may want to combine these
  * The text must include a color and a food
Why create new features by hand?
* More likely to capture meaningful generalizations
* Build in knowledge so you can get by with less training data
Rule Language
* ANY() is used to create lists
  * COLOR = ANY(red,yellow,green,blue,purple)
  * FOOD = ANY(cake,pizza,hamburger,steak,bread)
* ALL() is used to capture contingencies
  * ALL(cake,presents)
* More complex rules
  * ALL(COLOR,FOOD)
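To make the semantics of ANY() and ALL() concrete, here is a minimal Python sketch that evaluates such rules against the words of a text. It is not TagHelper's implementation: the helper names (`tokens`, `word`, `any_of`, `all_of`, `matches`) are hypothetical, and case-insensitive whole-word matching is an assumption.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of word tokens (assumed matching unit)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def word(w):
    """Rule that matches when the word w occurs in the text."""
    return lambda toks: w in toks

def any_of(*rules):
    """ANY(): matches when at least one sub-rule matches."""
    return lambda toks: any(rule(toks) for rule in rules)

def all_of(*rules):
    """ALL(): matches when every sub-rule matches."""
    return lambda toks: all(rule(toks) for rule in rules)

# The rules from the slide, written with the hypothetical helpers above.
COLOR = any_of(*map(word, ["red", "yellow", "green", "blue", "purple"]))
FOOD = any_of(*map(word, ["cake", "pizza", "hamburger", "steak", "bread"]))
BIRTHDAY = all_of(word("cake"), word("presents"))
COLOR_AND_FOOD = all_of(COLOR, FOOD)

def matches(rule, text):
    """Apply a rule to raw text."""
    return rule(tokens(text))
```

Because rules are ordinary functions over a token set, a named list such as COLOR can be nested inside ALL() exactly as in ALL(COLOR,FOOD).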
Group Project: Make a rule that will match against questions but not statements
Question Tell me what your favorite color is.
Statement I tell you my favorite color is blue.
Question Where do you live?
Statement I live where my family lives.
Question Which kinds of baked goods do you prefer?
Statement I prefer to eat wheat bread.
Question Which courses should I take?
Statement You should take my applied machine learning course.
Question Tell me when you get up in the morning.
Statement I get up early.
Possible Rule
ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
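The following Python sketch checks this rule against the ten examples from the group project. It assumes that BOL_WDT and BOL_WRB are POS bigram features meaning "a wh-determiner (WDT) or wh-adverb (WRB) begins the line". The tiny `WH_TAGS` lexicon is a hypothetical stand-in for a real part-of-speech tagger, and `features` and `is_question` are illustrative names, not TagHelper functions.

```python
import re

# Toy stand-in for a POS tagger: only the wh-words needed for this exercise.
# WDT = wh-determiner, WRB = wh-adverb (Penn Treebank tags).
WH_TAGS = {"which": "WDT", "where": "WRB", "when": "WRB", "why": "WRB", "how": "WRB"}

def features(text):
    """Return the words of the text plus a BOL_<tag> feature for a line-initial wh-word."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = set(words)
    if words and words[0] in WH_TAGS:
        feats.add("BOL_" + WH_TAGS[words[0]])
    return feats

def is_question(text):
    """ANY(ALL(tell,me),BOL_WDT,BOL_WRB)"""
    f = features(text)
    return ({"tell", "me"} <= f) or ("BOL_WDT" in f) or ("BOL_WRB" in f)

EXAMPLES = [
    ("Question", "Tell me what your favorite color is."),
    ("Statement", "I tell you my favorite color is blue."),
    ("Question", "Where do you live?"),
    ("Statement", "I live where my family lives."),
    ("Question", "Which kinds of baked goods do you prefer"),
    ("Statement", "I prefer to eat wheat bread."),
    ("Question", "Which courses should I take?"),
    ("Statement", "You should take my applied machine learning course."),
    ("Question", "Tell me when you get up in the morning."),
    ("Statement", "I get up early."),
]

for label, text in EXAMPLES:
    predicted = "Question" if is_question(text) else "Statement"
    print(f"{label:9} -> {predicted:9} {text}")
```

Note why the statements do not match: "I tell you ..." contains tell but not me, and "I live where ..." contains a wh-adverb that is not at the beginning of the line.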
Advanced Feature Editing
* Click here
Types of Basic Features
* Primitive features include unigrams, bigrams, and POS bigrams
Types of Basic Features
* The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists
  * You can choose to remove stopwords or not
  * You can choose whether or not to strip endings off words with stemming
  * You can choose how frequently a feature must appear in your data in order for it to show up in your lists
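As a rough picture of what these options do, the Python sketch below extracts unigrams and bigrams with the three switches described above. Everything here is an assumption for illustration: the stopword list is a tiny sample, `crude_stem` is a naive suffix stripper rather than a real stemmer, frequency is counted as the number of texts containing a feature, and bigrams are formed after stopwords are removed. TagHelper's own behavior may differ on each point.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "my", "i", "you"}

def crude_stem(w):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

def extract(texts, remove_stopwords=False, stem=False, min_count=1):
    """Return (unigrams, bigrams) that occur in at least min_count texts."""
    uni, bi = Counter(), Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        if remove_stopwords:
            words = [w for w in words if w not in STOPWORDS]
        if stem:
            words = [crude_stem(w) for w in words]
        # Count each feature once per text (document frequency).
        uni.update(set(words))
        bi.update(set(zip(words, words[1:])))

    def keep(counter):
        return sorted(k for k, n in counter.items() if n >= min_count)

    return keep(uni), keep(bi)

texts = ["I live where my family lives.", "Where do you live?"]
print(extract(texts, min_count=2))
print(extract(texts, remove_stopwords=True, stem=True))
```

Raising `min_count` shrinks the feature lists quickly, which is the practical effect of the frequency threshold: rare features never reach the lists the model is trained on.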
Types of Basic Features
* Now let’s look at how to create new features.
Creating New Features
* The feature editor allows you to create new feature definitions
* Click on + to add your new feature
Examining a New Feature
• Right click on a feature to examine where it matches in your data
Error Analysis
Create an Error Analysis File
Use TagHelper to Code Uncoded File
• The output file contains the codes TagHelper assigned.
• What you want to do now is remove the prediction column and insert the correct answers next to the TagHelper-assigned answers.
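The same edit can be scripted instead of done by hand in a spreadsheet. The Python sketch below is only an illustration under stated assumptions: the column names `Code`, `prediction`, `Answer`, and `text` are hypothetical (check the headers in your own output file), and the rows are assumed to be in the same order as the list of correct answers.

```python
import csv
import io

def build_error_analysis(output_rows, answers, code_col="Code", pred_col="prediction"):
    """Drop the prediction column and place the correct answer right after the assigned code.

    Column names are hypothetical defaults, not TagHelper's documented headers.
    """
    if len(output_rows) != len(answers):
        raise ValueError("need exactly one correct answer per row")
    result = []
    for row, answer in zip(output_rows, answers):
        new_row = {}
        for key, value in row.items():
            if key == pred_col:
                continue  # remove the prediction column
            new_row[key] = value
            if key == code_col:
                new_row["Answer"] = answer  # correct answer next to the assigned code
        result.append(new_row)
    return result

def to_csv(rows):
    """Serialize the rows as CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"Code": "Question", "prediction": "0.9", "text": "Where do you live?"},
    {"Code": "Question", "prediction": "0.6", "text": "I get up early."},
]
print(to_csv(build_error_analysis(rows, ["Question", "Statement"])))
```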
Load Error Analysis File
Error Analysis Strategies
* Look for large error cells in the confusion matrix
* Locate the examples that correspond to that cell
* What features do those examples share?
* How are they different from the examples that were classified correctly?
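These four steps can be sketched in a few lines of Python. This is an illustration, not part of TagHelper: the row fields `text`, `answer`, and `predicted` are hypothetical names, the sample rows are made up for the demonstration, and "features" are simplified to whitespace-separated words.

```python
from collections import Counter

def confusion(rows):
    """Count (correct answer, predicted code) pairs."""
    return Counter((r["answer"], r["predicted"]) for r in rows)

def largest_error_cell(matrix):
    """Return the off-diagonal cell with the highest count, or None if there are no errors."""
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def examples_in_cell(rows, cell):
    """Locate the examples that fall into one (answer, predicted) cell."""
    return [r for r in rows if (r["answer"], r["predicted"]) == cell]

def shared_words(examples):
    """Words that occur in every one of the given examples."""
    sets = [set(e["text"].lower().split()) for e in examples]
    return set.intersection(*sets) if sets else set()

def distinctive_words(rows, cell):
    """Words shared by the errors in a cell that never occur in correctly
    classified examples of the same true class."""
    correct = examples_in_cell(rows, (cell[0], cell[0]))
    seen = set().union(*(set(e["text"].lower().split()) for e in correct))
    return shared_words(examples_in_cell(rows, cell)) - seen

# Made-up rows for demonstration only.
rows = [
    {"text": "tell me a story", "answer": "Question", "predicted": "Question"},
    {"text": "i tell you no", "answer": "Statement", "predicted": "Question"},
    {"text": "they tell them", "answer": "Statement", "predicted": "Question"},
    {"text": "where is it", "answer": "Question", "predicted": "Statement"},
    {"text": "i get up early", "answer": "Statement", "predicted": "Statement"},
]

cell = largest_error_cell(confusion(rows))
print(cell, distinctive_words(rows, cell))
```

In this toy data the largest error cell is statements coded as questions, and the word those errors share is "tell", which suggests a feature such as ALL(tell,me) rather than the unigram alone.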
Group Project
* Load in the NewsGroupTrain.xls data set
* What is the best performance you can get by playing with the standard TagHelper tools feature options?
* Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls
* Copy in Answer column from NewsGroupAnswers.xls
* Now do an error analysis to determine why frequent mistakes are being made
* How could you do better?