Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)
Main Parts
• Introduction to ICDM DMC 2007
• The work of our team
Introduction to ICDM DMC 2007
• This year's contest is the first IEEE ICDM Data Mining Contest, which will be held in conjunction with the 2007 IEEE International Conference on Data Mining.
• http://www.cse.ust.hk/~qyang/ICDMDMC07/
What is the Problem?
• This year's contest is about indoor location estimation from the radio signal strengths that a client device receives from various WiFi access points (APs).
What Is an AP?
• Access points are the base stations of a wireless network. They transmit and receive radio signals so that wireless-enabled devices can communicate through them.
• The client device (for example, a PDA) is equipped with a wireless card that receives signals from many surrounding access points. Each AP is identified by a unique ID. From the collected received signal strength (RSS) values, a data mining algorithm running on the client device estimates the user's current location.
RSS Vectors
• RSS vector = <(AP_1, RSS_1), (AP_2, RSS_2), ..., (AP_k, RSS_k)>
• The AP ID is an integer between 0 and 100.
• The RSS value is also an integer, between -99 and 0.
• The number k varies from one RSS vector to another.
• The WiFi data are very noisy because of the so-called multipath effect in indoor environments.
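As a minimal sketch (Python; the helper name `make_rss_vector` is hypothetical), one RSS vector can be held as a dictionary from AP ID to RSS value. The checks mirror the ranges stated above, and k is simply the number of entries.

```python
# Hypothetical representation: one RSS vector as {AP ID: RSS value}.
RSSVector = dict[int, int]


def make_rss_vector(pairs: list[tuple[int, int]]) -> RSSVector:
    """Build an RSS vector from (AP ID, RSS value) pairs, checking the stated ranges."""
    vector: RSSVector = {}
    for ap_id, rss in pairs:
        if not 0 <= ap_id <= 100:
            raise ValueError(f"AP ID {ap_id} outside 0..100")
        if not -99 <= rss <= 0:
            raise ValueError(f"RSS value {rss} outside -99..0")
        if ap_id in vector:
            raise ValueError(f"duplicate AP ID {ap_id}")
        vector[ap_id] = rss
    return vector


# k differs per vector: it is just the number of APs heard in this collection.
v = make_rss_vector([(18, -96), (23, -87), (66, -69)])
k = len(v)
```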
Location Label
• All WiFi data were collected at 247 locations, each a grid cell of about 1.5 m × 1.5 m.
• Location label is an integer between 1 and 247.
Task 1. Indoor Location Estimation
• All the WiFi data (training data and test data) are collected by the same device in the same time period.
• Two types of data are provided in this task: (1) trace data and (2) non-trace data.
Task 1. Trace Data
Some statistics of the Task 1 trace data:
• 40 traces
• 1404 collections, 130 of them labeled
• 11881 (APID, value) pairs
• On average 8.5 (APID, value) pairs per collection; minimum 1, maximum 19
Task 1. Non-Trace Data
Some statistics of the Task 1 non-trace data:
• 1792 collections of RSS values, 375 of them labeled
• On average 8.5 (APID, value) pairs per collection; minimum 1, maximum 19
• 15256 (APID, value) pairs
Task_2_training_data
Some statistics of Task_2_training_data:
• 2322 collections of RSS values, 621 of them labeled
• On average 2.5 labeled collections per class; minimum 1, maximum 8
• On average 8.6 (APID, value) pairs per collection; minimum 2, maximum 19
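The statistics above come down to a few counts. The sketch below assumes a hypothetical in-memory layout, one `(label, {APID: RSS})` tuple per collection with `label` set to `None` when the collection is unlabeled; the contest's actual file format is not described in these slides, and `summarize` is an illustrative name.

```python
from collections import Counter
from typing import Optional

# Hypothetical layout: (location label or None if unlabeled, {AP ID: RSS value}).
Collection = tuple[Optional[int], dict[int, int]]


def summarize(collections: list[Collection]) -> dict:
    """Compute the kind of per-dataset statistics listed on the slides."""
    sizes = [len(rss) for _, rss in collections]           # (APID, value) pairs per collection
    labels = [label for label, _ in collections if label is not None]
    per_class = Counter(labels)                             # labeled collections per location
    return {
        "collections": len(collections),
        "labeled": len(labels),
        "pairs": sum(sizes),
        "avg_pairs": sum(sizes) / len(sizes),
        "min_pairs": min(sizes),
        "max_pairs": max(sizes),
        # Averaged over locations that have at least one labeled collection.
        "labeled_per_class": len(labels) / len(per_class) if per_class else 0.0,
        "min_per_class": min(per_class.values(), default=0),
        "max_per_class": max(per_class.values(), default=0),
    }
```

For instance, 11881 pairs over 1404 trace collections gives the quoted average of about 8.5, and 621 labeled collections over 247 locations gives about 2.5 per class.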
Task 2 Test Dataset
Task 2 Landmark Dataset
Evaluation Criterion
• For Task 1, the baseline precision is 60%.
• For Task 2, the baseline precision is 30%.
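The slides do not define precision exactly. Assuming it means the fraction of test collections assigned their correct location label, it could be computed as in this sketch (the function and constant names are hypothetical).

```python
def precision(predicted: list[int], actual: list[int]) -> float:
    """Assumed metric: fraction of test collections whose predicted location label is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Baselines quoted on the slide, keyed by task number.
TASK_BASELINES = {1: 0.60, 2: 0.30}
```

A submission "beats the baseline" for task t when `precision(...) > TASK_BASELINES[t]`.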
Our Team's Algorithm for Task 2
Step 1: Select the labeled collections.
Step 2: For every pair of labeled collections, compute these five numbers:
• The number of APIDs that appear in only one of the two collections
• The sum of the absolute differences between each such RSS value and -100
• The number of APIDs that appear in both collections
• The sum of the absolute differences between the two RSS values of each shared APID
• Whether the two collections come from the same location: 1 if they do, -1 if not
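A minimal Python sketch of Step 2. It assumes that -100 stands in for the RSS of an AP missing from the other collection (this reproduces the example numbers) and that each collection line starts with its location label followed by `APID:RSS` pairs, which is how the example collections appear to be written. `parse_collection` and `pair_features` are hypothetical names.

```python
MISSING_RSS = -100  # assumed value for an AP that the other collection did not hear


def parse_collection(line: str) -> tuple[int, dict[int, int]]:
    """Parse 'label ap:rss ap:rss ...' into (label, {AP ID: RSS})."""
    label, *pairs = line.split()
    rss = {}
    for pair in pairs:
        ap, value = pair.split(":")
        rss[int(ap)] = int(value)
    return int(label), rss


def pair_features(label_a: int, rss_a: dict[int, int],
                  label_b: int, rss_b: dict[int, int]) -> tuple[int, int, int, int, int]:
    """Return the five Step 2 numbers for two labeled collections."""
    only_one = set(rss_a) ^ set(rss_b)   # APs heard in exactly one collection
    shared = set(rss_a) & set(rss_b)     # APs heard in both collections
    merged = {**rss_a, **rss_b}          # an AP in only_one has exactly one value here
    n_only = len(only_one)
    sum_only = sum(abs(merged[ap] - MISSING_RSS) for ap in only_one)
    n_shared = len(shared)
    sum_shared = sum(abs(rss_a[ap] - rss_b[ap]) for ap in shared)
    same = 1 if label_a == label_b else -1
    return n_only, sum_only, n_shared, sum_shared, same


a = parse_collection("119 18:-96 23:-87 66:-69")
b = parse_collection("54 18:-94 83:-62 85:-76 86:-72 89:-85")
features = pair_features(*a, *b)
```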
An example
• Collection A: 119 18:-96 23:-87 66:-69 (location label 119, then APID:RSS pairs)
• Collection B: 54 18:-94 83:-62 85:-76 86:-72 89:-85 (location label 54, then APID:RSS pairs)
• The five numbers are 6, 149, 1, 2, -1.
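Worked through by hand, reading 119 and 54 as the location labels of collections A and B, the five numbers follow directly from the Step 2 definitions:

```latex
\begin{aligned}
n_{\text{one}}  &= |\{23, 66\}| + |\{83, 85, 86, 89\}| = 2 + 4 = 6\\
s_{\text{one}}  &= |{-87}+100| + |{-69}+100| + |{-62}+100| + |{-76}+100| + |{-72}+100| + |{-85}+100|\\
                &= 13 + 31 + 38 + 24 + 28 + 15 = 149\\
n_{\text{both}} &= |\{18\}| = 1\\
s_{\text{both}} &= |{-96} - ({-94})| = 2\\
y               &= -1 \quad (119 \neq 54)
\end{aligned}
```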
Step 3: Get the coefficients by linear (least-squares) fitting
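```matlab
% Presumably one row per pair of labeled collections (Step 2).
e = dlmread('distance_matrix.txt');
b = e(:, 5);            % same-location flag: 1 = same, -1 = different
x = e(:, 7:9);          % candidate feature columns
x(:, 2) = [];           % drop column 8, keeping columns 7 and 9

% Oversample the same-location pairs so both classes carry similar weight.
idx = find(b > 0);
x_pos = x(idx, :);
b_pos = b(idx);
x_append = x;
b_append = b;
for i = 1:floor(length(b) / length(b_pos))
    x_append = cat(1, x_append, x_pos);
    b_append = cat(1, b_append, b_pos);
end

% Least-squares solution of x_append * a = b_append.
a = x_append \ b_append;

% A pair is classified correctly when the sign of x*a matches its label.
c = (x * a) .* b;
accuracy = sum(c > 0) / length(b);
disp(accuracy);
```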
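For readers without MATLAB, here is a rough NumPy equivalent of the Step 3 procedure: append floor(n / n_pos) extra copies of the same-location rows, solve the least-squares problem, and score sign agreement. It assumes the feature matrix and ±1 labels are already loaded into arrays; the function names are hypothetical.

```python
import numpy as np


def fit_same_location(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y (+1 same location, -1 different) on X,
    after replicating the positive rows floor(n / n_pos) times."""
    pos = y > 0
    n_pos = int(pos.sum())
    if n_pos == 0:
        raise ValueError("no same-location pairs to oversample")
    reps = len(y) // n_pos
    X_aug = np.vstack([X] + [X[pos]] * reps)
    y_aug = np.concatenate([y] + [y[pos]] * reps)
    coef, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return coef


def sign_accuracy(X: np.ndarray, y: np.ndarray, coef: np.ndarray) -> float:
    """Fraction of pairs whose predicted sign matches the label."""
    return float(np.mean((X @ coef) * y > 0))
```

Like the MATLAB version, this fit has no intercept term and reports accuracy on the same pairs it was fitted to.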
Remaining Steps:
Step 4: Compute the center of each class (the collections from the same location).
Step 5: Testing. Our highest precision was 28.30%.
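The slides give no detail on Steps 4 and 5. One plausible reading, sketched here purely as an assumption, takes a class center to be the per-AP mean RSS over a location's labeled collections and assigns a test collection to the location whose center earns the highest fitted "same location" score from Step 3. Using the first four Step 2 numbers as features, and every function name below, is hypothetical.

```python
from collections import defaultdict
from statistics import mean

MISSING_RSS = -100  # assumed value for an AP absent from one side


def class_centers(labeled):
    """Hypothetical Step 4: per-location mean RSS for every AP heard there."""
    grouped = defaultdict(lambda: defaultdict(list))
    for label, rss in labeled:
        for ap, value in rss.items():
            grouped[label][ap].append(value)
    return {label: {ap: mean(values) for ap, values in aps.items()}
            for label, aps in grouped.items()}


def difference_features(rss_a, rss_b):
    """First four Step 2 numbers; the same-location flag is what gets predicted."""
    only_one = set(rss_a) ^ set(rss_b)
    shared = set(rss_a) & set(rss_b)
    merged = {**rss_a, **rss_b}
    return (len(only_one),
            sum(abs(merged[ap] - MISSING_RSS) for ap in only_one),
            len(shared),
            sum(abs(rss_a[ap] - rss_b[ap]) for ap in shared))


def predict_location(rss, centers, coef):
    """Hypothetical Step 5: choose the location whose center scores most 'same location'."""
    def score(center):
        return sum(w * f for w, f in zip(coef, difference_features(rss, center)))
    return max(centers, key=lambda label: score(centers[label]))
```

Here `coef` would hold four coefficients fitted as in Step 3; a larger score means the fit considers the test collection more likely to share that center's location.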
Thank you!