Recitation for BigData Jay Gu Jan 10 HW1 preview and Java Review.
Outline
• HW1 preview
• Review of Java basics
• An example of gradient descent for linear regression in Java
HW1 Preview
Dataset size: ~1 million instances.
• Warm up exercise
• Stochastic Gradient Descent for Logistic Regression
• SGD with Hashing Kernel
• Extra credit: Personalized Logistic Regression
Starter Code
– Class for parsing the input file and iterating over the dataset.
    Dataset dataset = new Dataset(your_path, is_training, size);
    while (dataset.hasNext()) {
        DataInstance d = dataset.next();
        … some action on d …
    }
Starter Code
public class DataInstance {
    int clicks;       // number of clicks, -1 if it is testing data.
    int impressions;  // number of impressions, -1 if it is testing data.

    // Feature of the session
    int depth;        // depth of the session.
    int[] query;      // List of token ids in the query field

    // Feature of the ad
    ….

    // Feature of the user
    ….
}
Starter Code
import java.util.Map;

public class Weights {
    double w0;
    /*
     * query.get(123) will return the weight for the feature:
     * "token 123 in the query field".
     */
    Map<Integer, Double> query;
    Map<Integer, Double> title;
    Map<Integer, Double> keyword;
    Map<Integer, Double> description;
    double wPosition;
    double wDepth;
    double wAge;
    double wGender;
}
BigData is often sparse
Be as lazy as you can …
Update only when necessary…
Avoid O(d): Sparse and lazy update
• Although the feature space d is huge, each data point only has a few tokens.
– Only update what is changed.
• But even so, regularization should be applied to all d weights at each step.
– Delay and batch the regularization.
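One way to delay and batch the regularization is sketched below. It is a minimal sketch, not the starter code or the HW1 solution. It assumes plain L2-regularized SGD, where every step multiplies each weight by (1 - eta * lambda), binary token features (so all tokens in an example share one gradient value), and distinct tokens within an example. The class name LazyL2Weights and its methods are hypothetical. Each weight remembers the last step at which it was brought up to date, and the skipped decays are applied in one batch the next time that weight is touched.

```java
import java.util.HashMap;
import java.util.Map;

class LazyL2Weights {
    private final Map<Integer, Double> weight = new HashMap<>();
    // lastStep.get(j) = number of SGD steps whose decay is already applied to weight j
    private final Map<Integer, Integer> lastStep = new HashMap<>();
    private final double eta;     // learning rate (assumed constant)
    private final double lambda;  // L2 regularization strength
    private int step = 0;         // number of SGD steps completed so far

    LazyL2Weights(double eta, double lambda) {
        this.eta = eta;
        this.lambda = lambda;
    }

    /** Applies all decays that weight j skipped, then returns its current value. */
    double get(int j) {
        if (!weight.containsKey(j)) {
            return 0.0; // never touched: decaying zero leaves zero
        }
        double w = weight.get(j);
        int skipped = step - lastStep.get(j);
        if (skipped > 0) {
            w *= Math.pow(1.0 - eta * lambda, skipped); // batch the delayed decays
            weight.put(j, w);
            lastStep.put(j, step);
        }
        return w;
    }

    /**
     * One SGD step that touches only the tokens present in the example.
     * Assumes the tokens are distinct and share the same gradient value.
     */
    void update(int[] tokens, double gradient) {
        for (int j : tokens) {
            double w = get(j); // catch up on decay from earlier steps
            weight.put(j, w * (1.0 - eta * lambda) - eta * gradient);
            lastStep.put(j, step + 1);
        }
        step++;
    }

    public static void main(String[] args) {
        LazyL2Weights w = new LazyL2Weights(0.1, 0.5);
        w.update(new int[] {1}, 1.0);
        w.update(new int[] {2}, -2.0);
        w.update(new int[] {1}, 0.0);
        System.out.println("w1 = " + w.get(1) + ", w2 = " + w.get(2));
    }
}
```

Each step costs time proportional to the number of tokens in the example rather than d, while get returns the same value a dense update over all d weights would have produced.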
Java Review
Not required but good to know: Interface, Inheritance, Access Modifier, I/O, …
• Language: Class, Object, variable, method
• Data Structure: Java Collections
– Array
– List: ArrayList
– Map: HashMap
Class
public class DataInstance {
    // Members or fields
    // Feature of the session
    int[] query ….
    // Feature of the ad
    int[] title …

    // Constructor
    DataInstance(String line, … ) {
        // parse the line, and set the field
    }

    // Method
    public void print() {
        System.out.println("title: ");
        for (int token : title)
            System.out.print(token + "\t");
    }
}
Object
• DataInstance data = new DataInstance();
• int clicks = data.clicks;
• data.print();
Collections
• Array: fixed length, most compact
– int[] tokens
– double[] weights
• ArrayList: grows dynamically (the backing array is enlarged when full; OpenJDK grows it by about 50% each time)
– ArrayList<DataInstance>
• HashMap: expected constant-time key-value lookup; grows dynamically, uses more memory
– HashMap<K, V>
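The three collections can be seen side by side in the small sketch below. It is illustrative only: the class CollectionsDemo, the method score, and the token ids and weights are made up for this example and are not part of the starter code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectionsDemo {
    /** Sums the weights of the given tokens; a token with no stored weight counts as 0. */
    static double score(int[] tokens, Map<Integer, Double> weights) {
        double sum = 0.0;
        for (int token : tokens) {
            sum += weights.getOrDefault(token, 0.0);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Array: the length is fixed when the array is created.
        int[] tokens = {3, 7, 42};

        // ArrayList: grows as elements are added.
        List<int[]> queries = new ArrayList<>();
        queries.add(tokens);
        queries.add(new int[] {7});

        // HashMap: sparse storage, only tokens that have a weight take up space.
        Map<Integer, Double> weights = new HashMap<>();
        weights.put(3, 0.5);
        weights.put(7, -0.25);

        for (int[] q : queries) {
            System.out.println(score(q, weights));
        }
    }
}
```

A HashMap keyed by token id fits sparse weights well, because a token that never appears costs no memory, at the price of boxing each Integer and Double.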
Variables
• “Everything” in Java is an Object
– Except for primitive types: int, double
• All object variables are references (pointers) to the Object
• Methods pass arguments by value; for an object argument, the value that gets copied is the reference
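A short sketch of what pass by value means in practice. The class and method names (PassByValueDemo, increment, setFirst, reassign) are made up for this illustration. A method cannot change the caller's variable, but it can change the object that a copied reference points to.

```java
class PassByValueDemo {
    /** Changes only the local copy of the int; the caller's variable is unaffected. */
    static void increment(int x) {
        x = x + 1;
    }

    /** The reference is copied, but both copies point to the same array object. */
    static void setFirst(double[] w) {
        w[0] = 1.0;
    }

    /** Rebinding the local reference does not affect the caller's reference. */
    static void reassign(double[] w) {
        w = new double[] {9.0};
    }

    public static void main(String[] args) {
        int n = 5;
        increment(n);
        System.out.println(n);    // still 5

        double[] w = {0.0};
        setFirst(w);
        System.out.println(w[0]); // now 1.0: the shared array was modified

        reassign(w);
        System.out.println(w[0]); // still 1.0: the caller's reference is unchanged
    }
}
```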
Example: SGD for linear regression
• Demo
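The demo itself is not part of this transcript. As a stand-in, here is a minimal sketch of SGD for one-dimensional linear regression with squared error. It is not the code shown in the recitation; the class name SgdLinearRegression, the method fit, the constant learning rate eta, and the toy data are all assumptions made for illustration.

```java
import java.util.Random;

class SgdLinearRegression {
    /**
     * Fits y ~ w0 + w1 * x by stochastic gradient descent on the loss
     * 0.5 * (prediction - y)^2. Returns {w0, w1}.
     */
    static double[] fit(double[] x, double[] y, double eta, int epochs, long seed) {
        double w0 = 0.0; // intercept
        double w1 = 0.0; // slope
        Random rng = new Random(seed);
        for (int e = 0; e < epochs; e++) {
            for (int k = 0; k < x.length; k++) {
                int i = rng.nextInt(x.length);          // pick one example at random
                double error = (w0 + w1 * x[i]) - y[i]; // prediction minus target
                w0 -= eta * error;                      // d(loss)/d(w0) = error
                w1 -= eta * error * x[i];               // d(loss)/d(w1) = error * x
            }
        }
        return new double[] {w0, w1};
    }

    public static void main(String[] args) {
        // Toy data generated from y = 1 + 2x with no noise.
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {1, 3, 5, 7, 9};
        double[] w = fit(x, y, 0.05, 500, 42L);
        System.out.println("intercept = " + w[0] + ", slope = " + w[1]);
    }
}
```

Each update looks at a single example, so the cost per step does not depend on the dataset size. The same loop structure carries over to logistic regression by changing how the prediction is computed.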