Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown...
Transcript of Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown...
![Page 1: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/1.jpg)
Data-driven Clustering via Parameterized Lloyds Families
Travis DickJoint work with Maria-Florina Balcan and Colin White
Carnegie Mellon UniversityNeurIPS 2018
![Page 2: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/2.jpg)
Data-driven Clustering
![Page 3: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/3.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.
![Page 4: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/4.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.• Goal: find some unknown natural clustering.
![Page 5: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/5.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.• Goal: find some unknown natural clustering.
![Page 6: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/6.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.• Goal: find some unknown natural clustering.
![Page 7: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/7.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.
• However, most clustering algorithms minimize a clustering cost function.
• Goal: find some unknown natural clustering.
![Page 8: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/8.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.
• However, most clustering algorithms minimize a clustering cost function.
• Goal: find some unknown natural clustering.
• Hope that low-cost clusterings recover the natural clusters.
![Page 9: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/9.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.
• However, most clustering algorithms minimize a clustering cost function.
• Goal: find some unknown natural clustering.
• Hope that low-cost clusterings recover the natural clusters.• There are many algorithms and many objectives.
![Page 10: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/10.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.
• However, most clustering algorithms minimize a clustering cost function.
How do we choose the best algorithm for a specific application?
• Goal: find some unknown natural clustering.
• Hope that low-cost clusterings recover the natural clusters.• There are many algorithms and many objectives.
![Page 11: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/11.jpg)
Data-driven Clustering• Clustering aims to divide a dataset into self-similar clusters.
• However, most clustering algorithms minimize a clustering cost function.
How do we choose the best algorithm for a specific application?
Can we automate this process?
• Goal: find some unknown natural clustering.
• Hope that low-cost clusterings recover the natural clusters.• There are many algorithms and many objectives.
![Page 12: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/12.jpg)
Learning Model
![Page 13: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/13.jpg)
Learning Model• An unknown distribution ! over clustering instances.
![Page 14: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/14.jpg)
Learning Model• An unknown distribution ! over clustering instances.• Given a sample "#, … , "& ∼ ! annotated by their target clusterings.
![Page 15: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/15.jpg)
Learning Model• An unknown distribution ! over clustering instances.
• Find an algorithm " that produces clusterings similar to the target clusterings.
• Given a sample #$, … , #' ∼ ! annotated by their target clusterings.
![Page 16: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/16.jpg)
Learning Model• An unknown distribution ! over clustering instances.
• Find an algorithm " that produces clusterings similar to the target clusterings.
• Given a sample #$, … , #' ∼ ! annotated by their target clusterings.
• Want " to also work well for new instances from !!
![Page 17: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/17.jpg)
Learning Model• An unknown distribution ! over clustering instances.
• Find an algorithm " that produces clusterings similar to the target clusterings.
• Given a sample #$, … , #' ∼ ! annotated by their target clusterings.
• Want " to also work well for new instances from !!
• In this work:1. Introduce large parametric family of clustering algorithms, (*, +)-Lloyds.
![Page 18: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/18.jpg)
Learning Model• An unknown distribution ! over clustering instances.
• Find an algorithm " that produces clusterings similar to the target clusterings.
• Given a sample #$, … , #' ∼ ! annotated by their target clusterings.
• Want " to also work well for new instances from !!
• In this work:1. Introduce large parametric family of clustering algorithms, (*, +)-Lloyds.2. Efficient procedures for finding best parameters on a sample.
![Page 19: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/19.jpg)
Learning Model• An unknown distribution ! over clustering instances.
• Find an algorithm " that produces clusterings similar to the target clusterings.
• Given a sample #$, … , #' ∼ ! annotated by their target clusterings.
• Want " to also work well for new instances from !!
• In this work:1. Introduce large parametric family of clustering algorithms, (*, +)-Lloyds.2. Efficient procedures for finding best parameters on a sample.3. Generalization: optimal parameters on sample are nearly optimal on !.
![Page 20: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/20.jpg)
Lloyds Method
![Page 21: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/21.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.
![Page 22: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/22.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
![Page 23: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/23.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.
![Page 24: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/24.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.
![Page 25: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/25.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.3. Repeat until convergence.
![Page 26: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/26.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.3. Repeat until convergence.
![Page 27: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/27.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.3. Repeat until convergence.
![Page 28: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/28.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.3. Repeat until convergence.
![Page 29: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/29.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.3. Repeat until convergence.
![Page 30: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/30.jpg)
Lloyds Method• Maintains ! centers "#, … , "& that define clusters.• Perform local search to improve the !-means cost of the centers.
1. Assign each point to nearest center.2. Update each center to be the mean of assigned points.3. Repeat until convergence.
![Page 31: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/31.jpg)
Initial Centers are Important!
![Page 32: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/32.jpg)
Initial Centers are Important!• Lloyd’s method can get stuck if initial centers are chosen poorly
![Page 33: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/33.jpg)
Initial Centers are Important!• Lloyd’s method can get stuck if initial centers are chosen poorly• Initialization is a well-studied problem with many proposed procedures (e.g., !-means++)
![Page 34: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/34.jpg)
Initial Centers are Important!• Lloyd’s method can get stuck if initial centers are chosen poorly• Initialization is a well-studied problem with many proposed procedures (e.g., !-means++)• Best method will depend on properties of the clustering instances.
![Page 35: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/35.jpg)
The (", $)-Lloyds Family
![Page 36: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/36.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "
![Page 37: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/37.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)
![Page 38: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/38.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)• Choose initial centers from dataset * randomly.
![Page 39: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/39.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)• Choose initial centers from dataset * randomly.• Probability that point + ∈ * is center -. is proportional to & +, -/, … , -.1/ '.
![Page 40: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/40.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)
" = 0: random initialization
• Choose initial centers from dataset , randomly.• Probability that point - ∈ , is center /0 is proportional to & -, /1, … , /031 '.
![Page 41: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/41.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)
" = 0: random initialization " = 2: )-means++
• Choose initial centers from dataset - randomly.• Probability that point . ∈ - is center 01 is proportional to & ., 02, … , 0142 '.
![Page 42: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/42.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)
" = 0: random initialization " = 2: )-means++ " = ∞: farthest first
• Choose initial centers from dataset . randomly.• Probability that point / ∈ . is center 12 is proportional to & /, 13, … , 1253 '.
![Page 43: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/43.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)
" = 0: random initialization " = 2: )-means++ " = ∞: farthest first
Local search: Second parameter $ tweaks the local search. Details in paper.
• Choose initial centers from dataset . randomly.• Probability that point / ∈ . is center 12 is proportional to & /, 13, … , 1253 '.
![Page 44: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/44.jpg)
The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)
" = 0: random initialization " = 2: )-means++ " = ∞: farthest first
Local search: Second parameter $ tweaks the local search. Details in paper.
Question: For a distribution . over tasks, what parameters give best performance?
• Choose initial centers from dataset / randomly.• Probability that point 0 ∈ / is center 23 is proportional to & 0, 24, … , 2364 '.
![Page 45: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/45.jpg)
Results
![Page 46: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/46.jpg)
ResultsEfficient Tuning on Sample:
![Page 47: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/47.jpg)
ResultsEfficient Tuning on Sample:• Efficient algorithm for finding parameters on sample with best agreement to targets.
![Page 48: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/48.jpg)
ResultsEfficient Tuning on Sample:• Efficient algorithm for finding parameters on sample with best agreement to targets.• “Algorithmically feasible to tune parameters on sample.”
![Page 49: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/49.jpg)
ResultsEfficient Tuning on Sample:
Generalization Guarantee:
• Efficient algorithm for finding parameters on sample with best agreement to targets.• “Algorithmically feasible to tune parameters on sample.”
![Page 50: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/50.jpg)
ResultsEfficient Tuning on Sample:
Generalization Guarantee:
• Efficient algorithm for finding parameters on sample with best agreement to targets.• “Algorithmically feasible to tune parameters on sample.”
• Analyze the intrinsic complexity of (", $)-Lloyds
![Page 51: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/51.jpg)
ResultsEfficient Tuning on Sample:
Generalization Guarantee:
• Efficient algorithm for finding parameters on sample with best agreement to targets.• “Algorithmically feasible to tune parameters on sample.”
• Analyze the intrinsic complexity of (", $)-Lloyds• Show that need only roughly &' ( )*+ ,
-. clustering instances to ensure empirical cost for all parameters within / of expected cost.
![Page 52: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/52.jpg)
ResultsEfficient Tuning on Sample:
Generalization Guarantee:
• Efficient algorithm for finding parameters on sample with best agreement to targets.• “Algorithmically feasible to tune parameters on sample.”
• Analyze the intrinsic complexity of (", $)-Lloyds• Show that need only roughly &' ( )*+ ,
-. clustering instances to ensure empirical cost for all parameters within / of expected cost.
• “Parameters tuned on the sample will work well for new instances!”
![Page 53: Data-driven Clustering via Parameterized Lloyds Families04... · Learning Model •An unknown distribution !over clustering instances. •Find an algorithm "that produces clusteringssimilar](https://reader036.fdocuments.us/reader036/viewer/2022071019/5fd332cd3d3aa10d493a56ba/html5/thumbnails/53.jpg)
ResultsEfficient Tuning on Sample:
Generalization Guarantee:
Experiments: Evaluate (", $)-Lloyds family on real and synthetic data.CIFAR-10 MNIST Mixture of Gaussians CNAE-9
• Efficient algorithm for finding parameters on sample with best agreement to targets.• “Algorithmically feasible to tune parameters on sample.”
• Analyze the intrinsic complexity of (", $)-Lloyds• Show that need only roughly &' ( )*+ ,
-. clustering instances to ensure empirical cost for all parameters within / of expected cost.
• “Parameters tuned on the sample will work well for new instances!”