© 2014 Persontyle Ltd. All rights reserved.
CLUSTER ANALYSIS WITH R
[cluster analysis divides data into groups that are meaningful, useful, or both]
LEARNING STAGE – ADVANCED
DURATION – 3 DAYS
www.persontyle.com
WHAT IS CLUSTER ANALYSIS?
Cluster Analysis, or Clustering, is the study of methods and algorithms for
finding groups in data. It is an enormously important part of data science
and a core topic in data mining and machine learning. Clustering
methods are used in areas as disparate as customer segmentation,
recommender systems, drug compound library design, risk modeling, fraud
detection, gene expression studies, field biology, and text mining; the list is
nearly endless. Clustering is an essential part of predictive modeling
methodology, from data exploration and hypothesis generation about the
classes or structure of data to serving as a component of some predictive
modeling tasks.
Some Applications of Cluster Analysis:
• Market researchers and analysts use cluster analysis to partition the
general population of consumers into market segments, to better
understand the relationships between different groups of
potential customers, and to support product
positioning and new product development.
• Clustering is used to group all the shopping items available on the web into
a set of unique products.
• In the study of social networks, clustering is used to
recognize communities within large groups of people.
• Cluster analysis is used to identify areas with a higher
incidence of particular types of crime. By identifying these distinct areas, or
"hot spots", where similar crimes have occurred over a period of time, it is
possible to manage law enforcement resources more effectively.
• Flickr's map of photos and other map sites use clustering to reduce the
number of markers on a map.
R logo is trademark of the R Foundation, from http://www.r-project.org
This course presents a broad overview of Cluster Analysis, a form of
unsupervised machine learning that is used for exploratory data
analysis, data summarization, ordination, and even predictive modelling.
This course will provide an in-depth review of both clustering theory and
application across a large spectrum of disciplines and applied settings,
from drug discovery to management science.
Clustering topics, such as issues with data types, measures of
similarity, and clustering algorithms and their taxonomy, will be
explored further in hands-on labs using the
R programming language.
Participants will come away with information and a set of tools that will
form the basis for applying Cluster Analysis to
clustering problems in their respective domains.
Cluster Analysis forms an important area of statistical learning theory,
both as an independent discipline of unsupervised learning and,
sometimes, as a subdomain within predictive modelling and supervised
learning.
CLUSTER ANALYSIS WITH R
A 3-day course for professionals and researchers interested in
developing practical skills in implementing clustering
algorithms using R.
“If we can get usable, flexible, dependable machine learning software into the hands of domain experts,
benefits to society are bound to follow.”
Dr Kiri L. Wagstaff, researcher at NASA JPL
WHAT WILL YOU LEARN?
In this course, participants will learn and explore, by
way of simulated and practical examples in R: the general concerns of
data and data types used in clustering, measures of similarity (including
notions of “distance” and “metric”), the theoretical foundations of
clustering, data summarization, ordination, connections to data mining and
prediction, clustering approaches in the form of an informal taxonomy,
algorithm complexity, relevant graph theory, specific clustering algorithms
(model-based, hierarchical, partitional, graphical, hybrids, co-clustering,
asymmetric clustering, online clustering), visualization of various forms of
clustering results, clustering validation, and parallelism.
Participants will learn about data preparation and various clustering
algorithms and visualization methods with the help of R and
R clustering packages. Examples will include real-world applications in
drug discovery, bioinformatics, social media, management science,
finance, ecology, and others. Specific applications will include drug
compound library design and diversity, gene expression, community
detection, customer segmentation, species ordination, and QSAR (Quantitative
Structure-Activity Relationship), among others. Participants will come away
from the course with the tools and applied understanding necessary to
approach a large array of clustering problems in their domain.
PREREQUISITES
Participants should have at least passing familiarity with the following
topics: probability theory, statistics, matrix algebra, and programming in R.
VIRTUOUS CIRCLE OF LEARNING
Learning outcomes combine theory, an overview of concepts and practices,
applied examples from the real world, and implementation (hands-on labs).
The time allocated to each topic will determine the depth and coverage of that topic.
WHAT SHOULD I BRING?
Along with bringing your laptop and a charger, don’t forget to bring loads
of curiosity, scepticism, eagerness to participate and the desire to learn.
WHO SHOULD TAKE THIS COURSE?
DATA SCIENTISTS
QUANTITATIVE PROFESSIONALS
TECHNOLOGISTS/PROGRAMMERS
DATA/MARKET ANALYSTS
RESEARCH ANALYSTS
This course is intended for those currently working as data analysts, programmers, or market researchers with limited exposure to clustering techniques and algorithms, as well as those looking to move into the field.
“Today’s Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these,
developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data.”
Kevin P. Murphy - Research Scientist at Google
COURSE INSTRUCTORS
John MacCuish
John MacCuish is a founder and President of Mesa Analytics & Computing, Inc. and a
computer scientist with over 20 years of experience as a researcher, algorithm designer,
and data scientist in applied settings. John has published numerous journal articles,
books, successful grant applications, patents, and technical reports on graph theory,
algorithm animation, scientific visualization, image processing, cheminformatics,
bioinformatics, and data mining. He also wrote or contributed to many internal and
confidential reports on fraud detection, image recognition, precision agriculture,
economic modeling, queuing theory models, financial risk modeling, text mining, and
drug discovery. He is a recognized expert in cluster analysis, designing algorithms and
implementing original software for clustering solutions in the field of early drug
discovery. John has a Distinguished Performance Award from Los Alamos National
Laboratory for his work on the IRS Fraud Detection Project.
Dr. Norah MacCuish
Dr. Norah MacCuish received her Ph.D. from Cornell University in the field of
Theoretical Physical Chemistry. Her twenty years’ experience in pharmaceutical and
software companies has given her expertise in the areas of diversity assessment for
compound acquisitions, combinatorial chemistry library design, and chemical information
systems use and design, both in basic drug discovery research and software
development. She was awarded a Bronze Impact award for her collaborative work
involving a Smith Kline Pharmaceutical Partnership. Norah has numerous publications
and has made scientific presentations in the areas of fluid simulations, chemical
diversity analysis, object-relational database systems, and chemical cluster
analysis. She was the Principal Investigator for two Phase I NSF SBIR grants, as
well as a Phase II NSF SBIR titled “Cheminformatics Teaching Tools for the
Cheminformatics Virtual Classroom.”
THE SCHOOL OF DATA SCIENCE
The School of Data Science, a project of Persontyle, specializes in designing and delivering
structured, relevant and practical learning experiences for all of us to understand data science in
simple human terms.
RETURN ON INVESTMENT (ROI)
CONVINCE YOUR BOSS
The advent of the data-driven, connected era means that analyzing
massive-scale, messy, noisy, and unstructured data will increasingly form part
of everyone's work.
The School of Data Science learning programs provide a unique investment
opportunity that pays for itself many times over.
For corporate bookings or to organize on-site training email
[email protected] or call now +44 (0)20 3239 3141
www.persontyle.com/school
• World-class Instructors
• Develop Practical Data Science Skills
• Real World Industry Use Cases
• Short Courses For Time Convenience
• Value For Money
Register Now
Follow us on Twitter @schooltds
Like us on Facebook
Get in touch! [email protected]
Limited seats. We encourage you to register as soon as you can.
"For the best return on your money, pour
your purse into your head."
Benjamin Franklin