Lee McCluskey, room 3/10 Email [email protected] scom.hud.ac.uk/scomtlm/cha2555
AI Week 15
Machine Learning: Data Mining: Association Rule Mining, Associative Classification, Applications
Lee McCluskey, room 3/10, Email [email protected]
http://scom.hud.ac.uk/scomtlm/cha2555/
Last Week
Data Mining, viewed as inducing rule classifiers from classified training examples.
Artform Research Group
Association Rule Mining (ARM)
This is an “unsupervised learning activity”: briefly, looking for strong associations between features in data.
Definitions:
A transactional database is a set of “transactions”, e.g. the details of individual sales.
A transaction can be thought of as an “item-set” where each item is an attribute-value pair, e.g.
{height = 6, temp = 20, weather = warm}
As a special case we could have nominal item-sets, e.g.
{bread, cheese, milk}
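One way to picture the two kinds of item-set is as Python sets. The sketch below is only an illustration: the helper name to_item_set is hypothetical, and representing each attribute-value item as an (attribute, value) pair is an assumption, not something the slides prescribe.

```python
def to_item_set(record):
    """Turn an attribute-value record into a set of (attribute, value) items."""
    return frozenset(record.items())


# The attribute-value transaction from the slide.
record = {"height": 6, "temp": 20, "weather": "warm"}
items = to_item_set(record)

# The nominal special case: the items are just names.
nominal = frozenset({"bread", "cheese", "milk"})

print(("temp", 20) in items)   # membership test on an attribute-value item
print("cheese" in nominal)     # membership test on a nominal item
```

With this representation, "transaction T contains item-set X" is simply the subset test X <= T.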
Association Rule Mining (ARM): Important Definitions
An association rule is an expression
X => Y
where X and Y are item-sets with no items in common.
The support of an association rule is defined as the proportion of transactions in the database that contain X U Y.
The confidence of an association rule is defined as the probability that a transaction contains Y given that it contains X, that is
confidence(X => Y) = (no. of transactions containing X U Y) / (no. of transactions containing X)
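The two definitions translate almost directly into code. The following is a minimal sketch, assuming transactions are Python sets; the function names and the four shopping baskets are made up for illustration.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(transactions, x, y):
    """Proportion of all transactions that contain X U Y."""
    return count_containing(transactions, x | y) / len(transactions)


def confidence(transactions, x, y):
    """Transactions containing X U Y divided by transactions containing X."""
    n_x = count_containing(transactions, x)
    return count_containing(transactions, x | y) / n_x if n_x else 0.0


# Hypothetical baskets, not taken from the lecture.
baskets = [
    {"bread", "cheese", "milk"},
    {"bread", "milk"},
    {"cheese"},
    {"bread", "cheese"},
]

x, y = {"bread"}, {"milk"}
print(support(baskets, x, y))     # 2 of the 4 baskets hold bread and milk: 0.5
print(confidence(baskets, x, y))  # 2 of the 3 bread baskets also hold milk
```

Returning 0.0 when no transaction contains X is a convenience chosen here to avoid dividing by zero; the definition itself leaves that case open.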
Aims of ARM
Given a transactional database D, the association rule problem is to find all rules whose support and confidence are greater than user-specified thresholds, denoted minimum support (MinSupp) and minimum confidence (MinConf) respectively.
The aim is the discovery of the most significant associations between the items in a transactional data set. This process primarily involves the discovery of so-called frequent item-sets, i.e. item-sets whose support in the transactional data set is above MinSupp; rules built from them are then kept if their confidence is above MinConf.
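The two-stage process (frequent item-sets first, then rules) can be sketched by brute force. This is not an efficient miner such as Apriori, only an illustration for small data sets. The function names are hypothetical, and treating the thresholds as inclusive (>=) is an assumption.

```python
from itertools import combinations


def frequent_item_sets(transactions, min_supp):
    """Return {item_set: support} for every item-set with support >= min_supp."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            c = frozenset(candidate)
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_supp:
                frequent[c] = supp
                found_any = True
        if not found_any:
            break  # no frequent set of this size, so none of any larger size
    return frequent


def rules(frequent, min_conf):
    """Split each frequent item-set into X => Y; keep rules with confidence >= min_conf."""
    out = []
    for item_set, supp in frequent.items():
        for k in range(1, len(item_set)):
            for x in combinations(sorted(item_set), k):
                x = frozenset(x)
                # Every subset of a frequent item-set is itself frequent,
                # so its support is already in the dictionary.
                conf = supp / frequent[x]
                if conf >= min_conf:
                    out.append((set(x), set(item_set - x), supp, conf))
    return out
```

A call such as rules(frequent_item_sets(D, 0.5), 0.8) then returns tuples of the form (X, Y, support, confidence). The early exit relies on the fact that adding items to an item-set can never raise its support.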
Example
A trader deals in the following currencies in a series of 8 transactions:
1 Sterling Yen Dollar Euro
2 Dollar Euro Rand Sterling Ruble
3 Pesos Euro Ruble Rupee Yen
4 Rupee Sterling Ruble Euro Dollar
5 Sterling Dinars Rand Yen
6 Pesos Kroner Sterling Dollar
7 Ruble Rupee Kroner Sterling Pesos
8 Dollar Euro Sterling
What is the SUPPORT and CONFIDENCE of the following rules?
{Ruble} → {Rupee}
{Sterling, Euro} → {Ruble}
{Sterling, Euro} → {Ruble, Pesos}
Find an association rule from the set of transactions that has
- at least 2 items in its antecedent,
- better support and better confidence than the rules above.
Example (worked)
X => Y: Ruble => Rupee
[Figure: the eight transactions from the example, with those containing X and those containing X U Y marked.]
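The trader example can be checked mechanically. The sketch below encodes the eight transactions as sets and applies the definitions of support and confidence; the helper name rule_stats is hypothetical. The final rule is just one candidate answer to the exercise, found by inspection, and is not taken from the slides.

```python
transactions = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]


def rule_stats(x, y):
    """Return (support, confidence) of the rule X => Y over the 8 transactions."""
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


print(rule_stats({"Ruble"}, {"Rupee"}))                      # (0.375, 0.75)
print(rule_stats({"Sterling", "Euro"}, {"Ruble"}))           # (0.25, 0.5)
print(rule_stats({"Sterling", "Euro"}, {"Ruble", "Pesos"}))  # (0.0, 0.0)

# One candidate with a 2-item antecedent (an assumption, not the slide's answer).
print(rule_stats({"Dollar", "Euro"}, {"Sterling"}))          # (0.5, 1.0)
```

For Ruble => Rupee, X occurs in transactions 2, 3, 4 and 7, and X U Y in 3, 4 and 7, which gives support 3/8 and confidence 3/4.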
Associative Classification
If we fuse ARM and classification rule mining we get “Associative Classification”: use the association technique, but learn about particular items or item-sets.
Associative Classification is a branch of data mining that combines classification and association rule mining. In other words, it utilises association rule discovery methods on classification data sets.
Typically:
• Find Association Rules using ARM
• Sift out the “Class Association Rules”: the ones that have the class of interest on their right-hand sides
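The sifting step can be sketched as a simple filter over mined rules. Everything here is illustrative: the function name, the string encoding of items and the weather-style rules are assumptions, and requiring the right-hand side to be exactly one class item is one possible reading of "class of interest".

```python
def class_association_rules(rules, class_items):
    """Keep the rules whose right-hand side is a single class item."""
    return [(x, y) for (x, y) in rules if len(y) == 1 and y <= class_items]


# Hypothetical mined rules, each a pair (X, Y) of item-sets.
mined = [
    (frozenset({"outlook=sunny"}), frozenset({"play=no"})),
    (frozenset({"outlook=sunny"}), frozenset({"humidity=high"})),
    (frozenset({"humidity=normal", "windy=false"}), frozenset({"play=yes"})),
]
class_items = frozenset({"play=yes", "play=no"})

for x, y in class_association_rules(mined, class_items):
    print(sorted(x), "=>", sorted(y))
```

The rule predicting humidity is dropped because its right-hand side is not a class item; the two remaining rules can be used as a classifier.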
Validation in Rule Discovery
• Multi-stage Data Mining “pipelines” are fraught with various kinds of errors / bias
• The integrity of the data at each stage of the DM process and the reliability of the results are particularly important.
• DM usually uses “cross validation”, where the data is split into a training set and a testing set, and the results of the data miner learned from the training set are checked against the testing set. This is not really applicable to rule discovery.
Key idea: Look for trends/associations in the data that are output from the process and that represent known associations in the application domain.
DM Application 1: Discovering trends from patient data in the area of Diabetic Retinopathy
Diabetic Retinopathy: damage to the eyes caused by diabetes, sometimes leading to blindness.
A huge problem, as diabetes is on the increase. If you are a long-term diabetic then you are very likely to suffer some retina damage.
Clinics keep large amounts of data on patients who are treated in various ways, over long periods of time.
Diabetic Retinopathy Application
Data of 20,000 patients over 18 years
Much data cleaning and inference precedes mining: replacing missing values, noise, anomalies etc.
Focus is on a smaller number of patients with a yearly screening (the timestamp) over a period of 4+ years
Attribute Examples (there are several hundred):
Age_at_Exam, Present_Treatment, calculated_age_at_diagnosis, Retinopathy_in_R_Eye (RE_RET), Retinopathy_in_L_Eye (RE_RET), calculated_diabetes_type, calculated_diabetes_duration
Trend Mining
Item-sets that have an increasing support over a series of time-stamped instances (events) are called “emerging patterns”.
The changing support for sets of items during each event can indicate trends in the data. For example, the presence of a particular treatment over a period of time may lead to the alleviation of a symptom.
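Tracking support across events can be sketched as follows. The data are invented for illustration (three yearly screenings of four patients each), the function names are hypothetical, and "increasing" is read here as strictly increasing from each event to the next, which is an assumption.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)


def support_series(events, items):
    """Support of `items` in each time-stamped event, in time order."""
    return [support(events[stamp], items) for stamp in sorted(events)]


def is_emerging(series):
    """True if support strictly increases from each event to the next."""
    return len(series) > 1 and all(a < b for a, b in zip(series, series[1:]))


# Hypothetical yearly events; each transaction is one patient's screening.
events = {
    2001: [{"drugX", "low"}, {"drugX"}, {"high"}, {"high"}],
    2002: [{"drugX", "low"}, {"drugX", "low"}, {"high"}, {"drugX"}],
    2003: [{"drugX", "low"}, {"drugX", "low"}, {"drugX", "low"}, {"high"}],
}

series = support_series(events, {"drugX", "low"})
print(series)               # [0.25, 0.5, 0.75]
print(is_emerging(series))  # True
```

A real trend miner would also have to decide how much noise to tolerate, for example by allowing small dips or by requiring a minimum growth rate.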
Diabetic Retinopathy Application
Aim: to find trends in the data, e.g. (fictitious example):
calculated_diabetes_duration > Y &
Age_at_Exam in [60,70] &
Present_Treatment = drugX &
calculated_age_at_diagnosis in [50,60]
=>
Retinopathy_in_R_Eye (RE_RET) = low
Retinopathy_in_L_Eye (RE_RET) = low
Increasing trend: “people who have had diabetes for a certain length of time, whose age is in their 60s, who were diagnosed in their 50s, and who have been taking treatmentX, often have low DR levels”
An increasing trend adds support for the association.
DM Application 2: Road Traffic Control
Example in Road Traffic Control
Data:
Numeric data record from individual CARS (date, time, position, actual speed, expected speed)
Textual data of INCIDENTS (date, time start, time cleared, position, severity, road type, area, incident category, cause, road-effect, traffic-effect, reporter ..)
Data Sources: ANPR, Mobile Phones, Road (Vehicle) Sensors, Environment Sensors
Applications in Road Traffic Control
• associations between variations in speeds and near-future incidents
• effect of a particular type of incident (e.g. roadworks) on average speeds on nearby trunk roads
• looking for predictors of "heavy/slow traffic" incidents: look for associations with speed variations or accidents on roads downstream from the incident position (hence causing the incident)
• looking for associations between speeds around a bypass and a later "heavy traffic" incident within the town bypassed
Conclusions
Data Mining is a powerful set of techniques to help discover hidden knowledge.
It can be supervised or unsupervised.
• Association Rule Mining
• Associative Classification
are important classes of technique used in DM.