The How and Why of Feature Engineering
![Page 1: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/1.jpg)
1
The How and Why of Feature Engineering
Alice Zheng, Dato
March 29, 2016
Strata + Hadoop World, San Jose
![Page 2: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/2.jpg)
2
My journey so far
Shortage of expertise and good tools in the market.
Applied machine learning/data science
Build ML tools
Write a book
![Page 3: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/3.jpg)
3
Machine learning is great!
• Model data.
• Make predictions.
• Build intelligent applications.
• Play chess and Go!
![Page 4: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/4.jpg)
4
The machine learning pipeline
I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …
Raw data
Features
Models
Predictions
Deploy in production
![Page 5: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/5.jpg)
5
If machine learning were hairstyles
Images courtesy of “A visual history of ancient hairdos” and “An animated history of 20th century hairstyles.”
Models: magnificent, ornate, high-maintenance
Feature engineering: street smart, ad hoc, hacky
![Page 6: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/6.jpg)
6
Making sense of feature engineering
• Feature generation
• Feature cleaning and transformation
• How well do they work?
• Why?
![Page 7: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/7.jpg)
Feature Generation
Feature: An individual measurable property of a phenomenon being observed.
⎯ Christopher Bishop, “Pattern Recognition and Machine Learning”
![Page 8: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/8.jpg)
8
Representing natural text
Classify: puppy or not?
Raw Text: It is a puppy and it is extremely cute.
What’s important? Phrases? Specific words? Ordering? Subject, object, verb?
Bag of Words: {"it": 2, "is": 2, "a": 1, "puppy": 1, "and": 1, "extremely": 1, "cute": 1}
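The bag-of-words mapping on this slide can be reproduced in a few lines. This is a minimal sketch, assuming lowercase folding and a simple regular-expression tokenizer; the function name `bag_of_words` is illustrative and not from the talk.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    # Assumption: a token is a run of letters or apostrophes; punctuation is dropped.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

bow = bag_of_words("It is a puppy and it is extremely cute.")
print(dict(bow))
```

Word order and sentence structure are discarded; only the counts survive, which is exactly the trade-off the slide's questions point at.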
![Page 9: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/9.jpg)
9
Representing natural text
Classify: puppy or not?
Raw Text: It is a puppy and it is extremely cute.
Bag of Words:
it 2
they 0
I 0
am 0
how 0
puppy 1
and 1
cat 0
aardvark 0
cute 1
extremely 1
… …
Sparse vector representation
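One way to picture the sparse vector is to fix a vocabulary, give each word an index, and store only the non-zero counts. The sketch below assumes a tiny hypothetical vocabulary copied from the words listed on the slide; a real vocabulary would cover the whole corpus.

```python
from collections import Counter

# Hypothetical fixed vocabulary (the words shown on the slide, in order).
vocabulary = ["it", "they", "i", "am", "how", "puppy", "and", "cat",
              "aardvark", "cute", "extremely"]
index = {word: i for i, word in enumerate(vocabulary)}

tokens = "it is a puppy and it is extremely cute".split()
# Words outside the vocabulary ("is", "a" here) are ignored.
counts = Counter(t for t in tokens if t in index)

# Dense form: one slot per vocabulary word, mostly zeros.
dense = [counts[word] for word in vocabulary]
# Sparse form: only the non-zero entries, keyed by vocabulary index.
sparse = {index[word]: n for word, n in counts.items()}

print(dense)
print(sparse)
```

With a vocabulary of hundreds of thousands of words, almost every slot in the dense form is zero, which is why the sparse form is preferred in practice.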
![Page 10: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/10.jpg)
10
Representing images
Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Antonio Torralba, ICCV 2005–2009.
Raw image: millions of RGB triplets, one for each pixel
Classify: person or animal?
Raw Image → Bag of Visual Words
![Page 11: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/11.jpg)
11
Representing images
Classify: person or animal?
Raw Image → Deep learning features
3.29, -15, -5.24, 48.3, 1.36, 47.1, -1.92, 36.5, 2.83, 95.4, -19, -89, 5.09, 37.8
Dense vector representation
![Page 12: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/12.jpg)
12
Representing audio
Classify: Music or voice? Type of instrument?
Raw Audio → Spectrogram features
t=0 t=1 t=2
6.1917 -0.3411 1.2418
0.2205 0.0214 0.4503
1.0423 0.2214 -1.0017
-0.2340 -0.0392 -0.2617
0.2750 0.0226 0.1229
0.0653 0.0428 -0.4721
0.3169 0.0541 -0.1033
-0.2970 -0.0627 0.1960
Time series of dense vectors
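A spectrogram of this kind can be sketched with a short-time Fourier transform: cut the signal into overlapping frames, window each frame, and keep the magnitude of its spectrum. The example below is an illustration only, assuming NumPy, a synthetic 440 Hz tone, and hypothetical frame and hop sizes; the talk does not say how its features were computed.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return one magnitude spectrum per frame (rows = time, columns = frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 8000                       # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate # one second of samples
tone = np.sin(2 * np.pi * 440 * t)       # synthetic stand-in for raw audio

spec = spectrogram(tone)
print(spec.shape)  # (time steps, frequency bins)
```

Each row of `spec` is a dense vector for one time step, so the whole array is the "time series of dense vectors" the slide describes.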
![Page 13: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/13.jpg)
13
Feature generation for audio, image, text
I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …
“Human native” → Conceptually abstract
Semantic content in data: Low → High
Difficulty of feature generation: Higher → Lower
![Page 14: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/14.jpg)
Feature Cleaning and Transformation
![Page 15: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/15.jpg)
15
Auto-generated features are noisy
Rank Word Doc Count Rank Word Doc Count
1 the 1,416,058 11 was 929,703
2 and 1,381,324 12 this 844,824
3 a 1,263,126 13 but 822,313
4 i 1,230,214 14 my 786,595
5 to 1,196,238 15 that 777,045
6 it 1,027,835 16 with 775,044
7 of 1,025,638 17 on 735,419
8 for 993,430 18 they 720,994
9 is 988,547 19 you 701,015
10 in 961,518 20 have 692,749
Most popular words in Yelp reviews dataset (~ 6M reviews).
![Page 16: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/16.jpg)
16
Auto-generated features are noisy
Rank Word Doc Count Rank Word Doc Count
357,480 cmtk8xyqg 1 357,470 attractif 1
357,479 tangified 1 357,469 chappagetti 1
357,478 laaaaaaasts 1 357,468 herdy 1
357,477 bailouts 1 357,467 csmpus 1
357,476 feautred 1 357,466 costoso 1
357,475 résine 1 357,465 freebased 1
357,474 chilyl 1 357,464 tikme 1
357,473 cariottis 1 357,463 traditionresort 1
357,472 enfeebled 1 357,462 jallisco 1
357,471 sparklely 1 357,461 zoawan 1
Least popular words in Yelp reviews dataset (~ 6M reviews).
![Page 17: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/17.jpg)
17
Feature cleaning
• Popular words and rare words are not helpful
• Manually defined blacklist – stopwords
a b c d e f g h i
able be came definitely each far get had ie
about became can described edu few gets happens if
above because cannot despite eg fifth getting hardly ignored
according become cant did eight first given has immediately
accordingly becomes cause different either five gives have in
across becoming causes do else followed go having inasmuch
… … … … … … … … …
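Stopword filtering amounts to a set lookup per token. The sketch below assumes a tiny blacklist made of a few words from the sample above; `STOPWORDS` and `remove_stopwords` are illustrative names, and a real list would hold hundreds of entries.

```python
# A small illustrative blacklist drawn from the sample list; real lists are much longer.
STOPWORDS = {"a", "able", "about", "above", "be", "became", "because", "can",
             "did", "do", "each", "get", "had", "has", "have", "if", "in"}

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token whose lowercase form is on the blacklist."""
    return [t for t in tokens if t.lower() not in stopwords]

tokens = "I have a puppy because puppies can be cute".split()
kept = remove_stopwords(tokens)
print(kept)
```

No training data is needed, but the list is fixed: a word that is uninformative only in this corpus will pass straight through.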
![Page 18: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/18.jpg)
18
Feature cleaning
• Frequency-based pruning
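Frequency-based pruning can be sketched as a filter on document counts: drop words that appear in too few documents and words that appear in too many. The thresholds `min_df` and `max_df_ratio` below are hypothetical tuning knobs, not values from the talk.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each word, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a word counts once per document
    return df

def prune_by_frequency(docs, min_df=2, max_df_ratio=0.9):
    """Keep only words seen in at least min_df docs and at most max_df_ratio of all docs."""
    df = document_frequency(docs)
    n_docs = len(docs)
    keep = {w for w, c in df.items()
            if c >= min_df and c / n_docs <= max_df_ratio}
    return [[t for t in tokens if t in keep] for tokens in docs]

docs = [
    "the puppy is cute".split(),
    "the cat is cute".split(),
    "the puppy is playful".split(),
    "the aardvark is herdy".split(),
]
pruned = prune_by_frequency(docs)
print(pruned)
```

Here "the" and "is" are removed for being in every document, and one-off words such as "herdy" are removed for being too rare. Both thresholds depend on the corpus and usually need tuning.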
![Page 19: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/19.jpg)
19
Stopwords vs. frequency filters
Stopwords
• No training required
• Can be exhaustive
• Inflexible
Frequency filters
• Adapts to data
• Also deals with rare words
• Needs tuning, hard to control
Both require manual attention
![Page 20: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/20.jpg)
20
Tf-Idf: Automatic “soft” filter
• Tf-idf = term frequency × inverse document frequency
• Tf = number of times a term appears in a document
• Idf = log(# total docs / # docs containing word w)
• Large for uncommon words, small for popular words
• Discounts popular words, highlights rare words
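The definitions above translate almost directly into code. This is a minimal sketch, assuming the natural logarithm (the slide does not give a base), no smoothing, and pre-tokenized documents; the function name `tfidf` is illustrative.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: tf * idf} dict per document, using idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: one count per document
    idf = {w: math.log(n_docs / c) for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(tokens).items()}
            for tokens in docs]

docs = [
    "the puppy is cute".split(),
    "the cat is cute".split(),
    "the puppy chased the cat".split(),
]
weights = tfidf(docs)
print(weights[2])
```

A word that appears in every document ("the" here) gets idf = log 1 = 0, so it is silenced no matter how often it occurs, while a word unique to one document ("chased") gets the largest weight.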
![Page 21: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/21.jpg)
21
Visualizing bag-of-words
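[Figure: 3-D plot with axes “puppy”, “cat”, and “have”, showing four sentences as bag-of-words points: “I have a puppy”, “I have a cat”, “I have a kitten”, and “I have a dog and I have a pen”.]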
![Page 22: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/22.jpg)
22
Visualizing tf-idf
[Figure: 3-D plot of the four sentences on the “puppy”, “cat”, and “have” axes.]
idf(puppy) = log 4
idf(cat) = log 4
idf(have) = log 1 = 0
![Page 23: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/23.jpg)
23
Visualizing tf-idf
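tfidf(puppy) = log 4
tfidf(cat) = log 4
tfidf(have) = 0
[Figure: after tf-idf, “I have a puppy” sits at log 4 on the “puppy” axis and “I have a cat” at log 4 on the “cat” axis; “I have a dog and I have a pen” and “I have a kitten” collapse onto the same point, since the “have” axis is scaled to 0.]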
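The picture can be checked by hand. The worked arithmetic below assumes the four example sentences are the whole corpus (N = 4) and writes each sentence as (puppy, cat, have) counts before and after scaling by idf.

```latex
\begin{align*}
N &= 4 \\
\mathrm{idf}(\text{puppy}) &= \log(4/1) = \log 4 \\
\mathrm{idf}(\text{cat}) &= \log(4/1) = \log 4 \\
\mathrm{idf}(\text{have}) &= \log(4/4) = 0 \\
\text{``I have a puppy''}: \quad (1, 0, 1) &\mapsto (\log 4,\ 0,\ 0) \\
\text{``I have a cat''}: \quad (0, 1, 1) &\mapsto (0,\ \log 4,\ 0) \\
\text{``I have a kitten''}: \quad (0, 0, 1) &\mapsto (0,\ 0,\ 0) \\
\text{``I have a dog and I have a pen''}: \quad (0, 0, 2) &\mapsto (0,\ 0,\ 0)
\end{align*}
```

Because "have" occurs in every sentence, its axis is multiplied by zero, and the two sentences that differ only along that axis land on the same point.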
![Page 24: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/24.jpg)
24
Algebraically, tf-idf = column scaling
[Figure: document-term matrix with rows d1 … dN and columns w1 … wM]
idf = log(N / L0 norm of word column)
![Page 25: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/25.jpg)
25
Algebraically, tf-idf = column scaling
[Figure: document-term matrix with rows d1 … dN and columns w1 … wM]
Multiply word column with scalar (idf of word)
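In matrix terms, the L0 "norm" of a word column is the number of documents containing that word, and tf-idf multiplies each column by its own scalar. The sketch below assumes NumPy and a small hand-built count matrix for the words (puppy, cat, have); the variable names are illustrative.

```python
import numpy as np

# Rows are documents, columns are the words (puppy, cat, have).
X = np.array([[1, 0, 1],    # "I have a puppy"
              [0, 1, 1],    # "I have a cat"
              [0, 0, 1],    # "I have a kitten"
              [0, 0, 2]],   # "I have a dog and I have a pen"
             dtype=float)

n_docs = X.shape[0]
doc_counts = np.count_nonzero(X, axis=0)  # L0 "norm" of each word column
idf = np.log(n_docs / doc_counts)

# Broadcasting multiplies column j by idf[j]; same as X @ np.diag(idf).
X_tfidf = X * idf
print(X_tfidf)
```

Writing the operation as right-multiplication by a diagonal matrix is what makes the later singular-value analysis possible.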
![Page 28: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/28.jpg)
28
Other types of column scaling
• L2 scaling = divide column by L2 norm
![Page 29: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/29.jpg)
How well do they work?
![Page 30: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/30.jpg)
30
Classify reviews using logistic regression
• Classify business category of Yelp reviews
• Bag-of-words vs. L2 normalization vs. tf-idf
• Model: logistic regression
![Page 31: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/31.jpg)
31
Observations
• L2 regularization made no difference (with proper tuning)
• L2 normalization made no difference on accuracy
• Tf-idf did better, but barely
• But they are both column scaling methods! Why the difference?
![Page 32: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/32.jpg)
A Peek Under the Hood
![Page 33: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/33.jpg)
33
Linear classification
Find the best line to separate two classes
[Figure: two classes plotted against Feature 1 and Feature 2, with a separating line]
![Page 34: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/34.jpg)
Algebraically: solve linear systems
Data matrix × Weight vector = Labels
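As a rough illustration of "data matrix times weight vector equals labels", the sketch below solves a small system by least squares with NumPy. Least squares is used here as a stand-in; logistic regression solves a related but different optimization problem, and the numbers are made up for the example.

```python
import numpy as np

# Data matrix: 4 examples (rows) with 2 features (columns).
X = np.array([[1., 2.],
              [2., 1.],
              [3., 3.],
              [4., 1.]])
w_true = np.array([0.5, -1.0])  # hypothetical weights used to generate the labels
y = X @ w_true                  # labels

# Find the weight vector w that best satisfies X @ w = y.
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(w)
```

The solver's behaviour depends on the singular values of `X`, which is why the following slides look at how a matrix acts on its input.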
![Page 35: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/35.jpg)
How a matrix works
Any matrix = Left singular vectors × Singular values × Right singular vectors
![Page 36: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/36.jpg)
How a matrix works
Any matrix
Project → Scale → Project
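The project, scale, project reading of a matrix can be checked numerically with the singular value decomposition. This is a sketch assuming NumPy and an arbitrary small matrix; the three intermediate names are illustrative.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.],
              [0., 2.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

x = np.array([1., -2.])
projected = Vt @ x       # 1. project the input onto the right singular vectors
scaled = s * projected   # 2. scale each coordinate by its singular value
output = U @ scaled      # 3. recombine along the left singular vectors

print(output)
print(A @ x)
```

The two printed vectors agree up to floating-point error, since `A` equals `U @ np.diag(s) @ Vt`.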
![Page 37: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/37.jpg)
How a matrix works
![Page 38: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/38.jpg)
Null space
Singular value = 0
Null space = part of the input space that is squashed by the matrix
![Page 39: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/39.jpg)
Column space
Singular value ≠ 0
Column space = the non-zero part of the output space
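Both spaces can be read off the singular value decomposition. The sketch below assumes NumPy, a small rank-deficient matrix built for the purpose, and a hypothetical tolerance `tol` for deciding which singular values count as zero.

```python
import numpy as np

# Third column = first column + second column, so the matrix has rank 2.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
U, s, Vt = np.linalg.svd(A)

tol = 1e-10                    # assumed threshold for "singular value = 0"
rank = int(np.sum(s > tol))

null_basis = Vt[rank:].T       # right singular vectors with singular value 0
column_basis = U[:, :rank]     # left singular vectors with singular value != 0

print(rank)
print(A @ null_basis)          # inputs in the null space are squashed to (near) zero
```

Any output `A @ x` can be rebuilt from `column_basis` alone, which is what "the non-zero part of the output space" means here.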
![Page 40: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/40.jpg)
Effect of column scaling
Scaled columns
![Page 41: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/41.jpg)
Effect of column scaling
Scaled columns
Singular values change (but zeros stay zero)
Singular vectors may also change
![Page 42: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/42.jpg)
42
Effect of column scaling
• Changes the singular values and vectors, but not the rank of the null space or column space
• … unless the scaling factor is zero
- Could only happen with tf-idf
• L2 scaling improves the condition number (therefore the solver converges faster)
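These claims are easy to probe on a toy matrix. The sketch below assumes NumPy and a made-up matrix whose columns differ in scale by a factor of 1000; it illustrates the slide's points and is not the talk's experiment.

```python
import numpy as np

# Two columns on very different scales.
A = np.array([[1.,    0.],
              [1., 1000.],
              [0., 1000.]])

# Non-zero column scaling changes singular values but not the rank.
rank_original = np.linalg.matrix_rank(A)
rank_scaled = np.linalg.matrix_rank(A * np.array([3.0, 0.5]))

# A zero scaling factor (possible with tf-idf) wipes out a column and lowers the rank.
rank_zeroed = np.linalg.matrix_rank(A * np.array([3.0, 0.0]))

# L2 column normalization evens out the column sizes and lowers the condition number.
cond_before = np.linalg.cond(A)
A_l2 = A / np.linalg.norm(A, axis=0)
cond_after = np.linalg.cond(A_l2)

print(rank_original, rank_scaled, rank_zeroed)
print(cond_before, cond_after)
```

The condition number is the ratio of the largest to the smallest singular value, so bringing the columns to the same length pulls that ratio toward 1 in this example.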
![Page 43: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/43.jpg)
43
Mystery resolved
• Tf-idf can emphasize some columns while zeroing out others (the uninformative features)
• L2 normalization makes all features equal in “size”
- Improves the condition number of the matrix
- Solver converges faster
![Page 44: The How and Why of Feature Engineering](https://reader034.fdocuments.us/reader034/viewer/2022051404/587f5bb81a28ab0d378b75cd/html5/thumbnails/44.jpg)
44
Take-away points
• Many tricks for feature generation and transformation
• Features interact with models, making their effects difficult to predict
• But so much fun to play with!
• New book coming out: Mastering Feature Engineering
- More tricks, intuition, analysis
@RainyData