Introduction to IEEE ICDM Data Mining Contest (ICDM DMC 2007)



Main Parts

• Introduction to ICDM DMC 2007

• The work of our team

Introduction to ICDM DMC 2007

• This year's contest is the first IEEE ICDM Data Mining Contest, held in conjunction with the 2007 IEEE International Conference on Data Mining.

• http://www.cse.ust.hk/~qyang/ICDMDMC07/

What is the Problem?

• This year's contest is about indoor location estimation from the radio signal strengths that a client device receives from various WiFi Access Points (APs).

http://en.wikipedia.org/wiki/Wi-Fi

What is an AP?

• Access Points are base stations for the wireless network. They transmit and receive radio signals so that wireless-enabled devices can communicate over the network.

http://encyclopedia.thefreedictionary.com/Wireless+access+point

http://encyclopedia.thefreedictionary.com/wireless

http://encyclopedia.thefreedictionary.com/Computer+networking

• The client device (which can be a PDA) is equipped with a wireless card that can receive signals from many surrounding wireless access points (APs). Each of these APs is identifiable with a unique ID. Based on the collection of signal strength values (RSS values), a data mining algorithm running on the client device tries to figure out the current location of the user.

RSS Vectors

• RSS Vector = <(AP1, RSS Value1), (AP2, RSS Value2), ..., (APk, RSS Valuek)>

• The AP ID is an integer between 0 and 100.

• The RSS value is an integer between -99 and 0.

• The number k of (AP, RSS value) pairs differs from one RSS vector to another.

• The WiFi data are very noisy because of the so-called multi-path effect in indoor environments.
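As a minimal sketch of the RSS vector described above, one collection can be held as a mapping from AP ID to RSS value. The function name `parse_collection` and the `id:value` text layout are assumptions made for illustration (the layout mirrors the sample collections shown in the Step 2 example of this deck); they are not taken from the contest's file specification.

```python
def parse_collection(text):
    """Parse 'id:value id:value ...' into a dict {ap_id: rss}."""
    vector = {}
    for token in text.split():
        ap_id, rss = token.split(":")
        ap_id, rss = int(ap_id), int(rss)
        # Ranges stated on the slide: AP ID in 0..100, RSS in -99..0.
        if not 0 <= ap_id <= 100:
            raise ValueError(f"AP ID out of range: {ap_id}")
        if not -99 <= rss <= 0:
            raise ValueError(f"RSS value out of range: {rss}")
        vector[ap_id] = rss
    return vector


vec = parse_collection("18:-96 23:-87 66:-69")
print(vec)       # {18: -96, 23: -87, 66: -69}
print(len(vec))  # k = 3 for this collection
```

A dictionary fits here because k varies between collections and most APs are absent from any single one.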

Location Label

• All WiFi data are collected in 247 locations, where each location is a grid cell of about 1.5 m × 1.5 m.

• The location label is an integer between 1 and 247.

Task 1. Indoor Location Estimation

• All the WiFi data (training data and test data) are collected by the same device in the same time period.

• There are two types of data provided in this task:

• 1. trace data

• 2. non-trace data

Task 1: trace data

Some statistics of the Task 1 trace data

• 40 traces

• 1404 collections, 130 of them labeled

• 11881 pairs of APID and value

• Average 8.5 pairs of APID and value per collection; the minimum is 1 and the maximum is 19

Task 1: non-trace data

Some statistics of the Task 1 non-trace data

• 1792 collections of RSS values

• 375 collections labeled

• 15256 pairs of APID and value

• Average 8.5 pairs of APID and value per collection

Task_2_training_data

Some statistical information of Task_2_training_data

• 2322 collections of RSS values

• 621 collections labeled

• Average 2.5 labeled collections per class; the minimum is 1 and the maximum is 8

• Average 8.6 pairs of APID and value per collection
Task2 Test Dataset

Task2 Landmark Dataset

Evaluation Criterion

• For Task 1, the baseline precision is 60%.

• For Task 2, the baseline precision is 30%.

The algorithm of our team for Task 2

Step 1: Sieve out the labeled collections

Step 2: Get the differences between every pair of labeled collections

• Number of APIDs that appear in only one of the two collections

• Sum of the absolute differences between those RSS values and -100

• Number of APIDs that appear in both collections

• Sum of the absolute differences between the two RSS values of each shared APID

• Whether the two collections come from the same location: 1 if they do, -1 if not

An example

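• collectionA: 119 18:-96 23:-87 66:-69

• collectionB: 54 18:-94 83:-62 85:-76 86:-72 89:-85

• The five numbers are 6, 149, 1, 2, -1: six APs (23, 66, 83, 85, 86, 89) appear in only one collection, and their RSS values differ from -100 by 149 in total; one AP (18) appears in both, with RSS values 2 apart; the leading labels 119 and 54 differ, so the last number is -1.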
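The five numbers of Step 2 can be reproduced with a short sketch. It assumes that the leading number of each collection (119 and 54) is its location label and that -100 stands in for an AP a collection did not hear; the names `pair_features` and `MISSING_RSS` are hypothetical, not the team's own code.

```python
MISSING_RSS = -100  # stand-in for an AP that a collection did not hear


def pair_features(label_a, rss_a, label_b, rss_b):
    """Return the five Step 2 numbers for two labeled collections."""
    only_one = set(rss_a) ^ set(rss_b)   # APs heard by exactly one collection
    shared = set(rss_a) & set(rss_b)     # APs heard by both collections
    merged = {**rss_a, **rss_b}          # lookup for the one-sided APs
    sum_only = sum(abs(merged[ap] - MISSING_RSS) for ap in only_one)
    sum_shared = sum(abs(rss_a[ap] - rss_b[ap]) for ap in shared)
    same = 1 if label_a == label_b else -1
    return (len(only_one), sum_only, len(shared), sum_shared, same)


a = {18: -96, 23: -87, 66: -69}
b = {18: -94, 83: -62, 85: -76, 86: -72, 89: -85}
print(pair_features(119, a, 54, b))  # (6, 149, 1, 2, -1)
```

The result matches the numbers given in the example, which supports reading the second number as the summed distance from -100.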

Step3:Get coefficients by Linear Fitting

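    % Load the numeric table from distance_matrix.txt
    e = dlmread('distance_matrix.txt');
    b = e(:,5);            % target column; positive entries are selected below
    x = e(:,7:9);
    x(:,2) = [];           % keep columns 7 and 9 of e as the two features
    [x1,y1] = find(b>0);   % row indices of the positive targets
    x_pos = x(x1,:);
    b_pos = b(x1,1);
    % Append the positive rows floor(N/N_pos) times before fitting
    x_append = x;
    b_append = b;
    for i = 1:floor(length(b)/length(b_pos))
        x_append = cat(1,x_append,x_pos);
        b_append = cat(1,b_append,b_pos);
    end
    a = x_append\b_append; % least-squares coefficients
    c = (x*a).*b;          % positive where the sign of x*a agrees with b
    accuracy = sum(c>0)/length(b);
    display(accuracy);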
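The same three moves as in the MATLAB snippet (append the positive rows, solve a least-squares fit without intercept, score by sign agreement) can be sketched in plain Python. This is an illustration under assumptions: the two features and the six toy rows are made up, and the closed-form 2×2 normal-equation solve stands in for MATLAB's backslash operator. All function names are hypothetical.

```python
def oversample_positives(rows, targets):
    """Append the positive rows floor(N / N_pos) times, as the MATLAB loop does."""
    pos = [(r, t) for r, t in zip(rows, targets) if t > 0]
    out_rows, out_targets = list(rows), list(targets)
    if not pos:
        return out_rows, out_targets
    for _ in range(len(targets) // len(pos)):
        out_rows += [r for r, _ in pos]
        out_targets += [t for _, t in pos]
    return out_rows, out_targets


def fit_two_features(rows, targets):
    """Least-squares coefficients (no intercept) via the 2x2 normal equations."""
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    t1 = sum(x1 * y for (x1, _), y in zip(rows, targets))
    t2 = sum(x2 * y for (_, x2), y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear")
    return ((t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det)


def sign_accuracy(rows, targets, coeffs):
    """Fraction of rows whose prediction has the same sign as the target."""
    hits = sum(1 for (x1, x2), y in zip(rows, targets)
               if (x1 * coeffs[0] + x2 * coeffs[1]) * y > 0)
    return hits / len(targets)


# Toy rows (made up): (APs in only one collection, APs in both collections).
rows = [(1, 6), (2, 5), (6, 1), (7, 2), (5, 1), (8, 1)]
targets = [1, 1, -1, -1, -1, -1]

fit_rows, fit_targets = oversample_positives(rows, targets)
coeffs = fit_two_features(fit_rows, fit_targets)
accuracy = sign_accuracy(rows, targets, coeffs)
print(accuracy)  # 1.0
```

Repeating the positive rows gives the rarer same-location pairs more weight in the fit, while accuracy is still measured on the original rows only.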

Remaining Steps:

Step 4: Get the center of each class (the collections from the same location)

Step 5: Testing. Our highest precision is 28.30%.
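The slides do not say how the class centers of Step 4 are computed. One plausible reading, sketched here purely as an assumption, is a per-AP mean over the labeled collections of each location, with -100 filled in for an AP that a collection did not hear (the same stand-in Step 2 uses). The function name `class_centers` and the toy data are hypothetical.

```python
from collections import defaultdict

MISSING_RSS = -100  # assumption: same stand-in Step 2 uses for an unheard AP


def class_centers(labeled):
    """labeled: iterable of (location_label, {ap_id: rss}) pairs."""
    groups = defaultdict(list)
    for label, vector in labeled:
        groups[label].append(vector)
    centers = {}
    for label, vectors in groups.items():
        aps = set().union(*vectors)  # every AP heard at this location
        centers[label] = {
            ap: sum(v.get(ap, MISSING_RSS) for v in vectors) / len(vectors)
            for ap in aps
        }
    return centers


# Toy data (made up for illustration).
data = [
    (54, {18: -94, 83: -62}),
    (54, {18: -90, 85: -76}),
    (119, {18: -96, 23: -87}),
]
centers = class_centers(data)
print(centers[54][18])  # -92.0
print(centers[54][83])  # -81.0
```

With only about 2.5 labeled collections per class, such centers rest on very few samples, which may partly explain why precision stays close to the baseline.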

Thank you!
