
International Journal of Computational Intelligence and Information Security, Jan-Feb 2013, Vol. 4, No. 1-2, ISSN: 1837-7823


Analysis of Data Mining Techniques for Data Compression Comparison

B. Hari Kumar, K. V. Kiranmai, P. Jayasri, Sukanya Sripathi
1,2 Sri Indu College of Engineering & Technology
3 Nova College of Engineering & Technology
4 St. Marys College of Engineering & Technology

Abstract: Work on data compression shows that complex social groups move in regular patterns; however, previous work focuses on distributed mining algorithms that jointly identify a group of moving objects and their patterns in a network. In this paper we propose a mining technique based on machine learning that exploits the obtained group movement patterns to reduce complex target data.

The compression algorithm includes clustering and classification techniques that compress noisy data into delivered patterns in order to obtain an optimal solution. Moreover, we compare image compression algorithms with mining techniques in terms of the maximum compression ratio. The analysis results show that the proposed compression algorithm leverages the group movement patterns to reduce the amount of target data effectively and efficiently.

Keywords Data Compression, Data Mining, Wireless Sensor, Machine Learning.

I. INTRODUCTION
Recent advances in location-acquisition technologies, such as global positioning systems (GPSs) and wireless sensor networks (WSNs), have fostered many novel applications like object tracking, environmental monitoring, and location-dependent services [3]. These applications generate a large amount of location data and thus lead to transmission and storage challenges, especially in resource-constrained environments like WSNs. To reduce the data volume, various algorithms have been proposed for data compression and data aggregation. However, these works do not address application-level semantics, such as group relationships and movement patterns, in the location data. In object tracking applications, many natural phenomena show that objects often exhibit some degree of regularity in their movements. Discovering group movement patterns is more difficult than finding the patterns of a single object or of all objects, because we need to jointly identify a group of objects and discover their aggregated group movement patterns. The temporal and spatial [1] correlations in the movements of moving objects are modeled as sequential patterns in data mining in order to discover the frequent movement patterns.

We first introduce our distributed mining algorithm to approach the moving object clustering problem and discover group movement patterns. Based on the discovered group movement patterns, the proposed system uses a novel compression algorithm to tackle group data compression. Our distributed mining algorithm comprises a Group Movement Pattern Mining (GMPMine) algorithm and a Cluster Ensembling (CE) [2] algorithm. It avoids transmitting unnecessary and redundant data by transmitting only the local grouping results to a base station (the sink), instead of the location data of all moving objects. Specifically, the GMPMine algorithm discovers the local group movement patterns by using a novel similarity measure, while the CE algorithm combines the local grouping results to remove inconsistency and improve the grouping quality by using information theory. The constrained resources of WSNs should also be considered in approaching the moving object clustering problem [3]. Unlike previous works, we formulate a moving object clustering problem that jointly identifies a group of objects and discovers their movement patterns. The application-level semantics are useful for various applications, such as data storage and transmission, task scheduling, and network construction.

II. SECTION
2. Related Work: Data change all the time; data on the web in particular are highly dynamic, as old datasets are deleted while other datasets are updated. It is observed that the time stamp is an important attribute of each dataset. It is also important in the process of data mining, where it can give us more accurate and useful information. Consider, for example, association rule mining that does not take the time stamp into account: a rule can be Buy_A -> Buy_B. If we take the time stamp into account, we can get more accurate and useful rules, such as "Buy_A implies Buy_B within a week" or "people usually Buy_A every week". With such rules, business organizations can make more accurate and useful predictions and consequently make better decisions. A database that consists of sequences of values or events that change with time is called a time series database. An example is a time series that records the sales transactions of a supermarket, where each transaction includes an extra attribute indicating when the transaction took place. Time series data are widely used to store historical data in a diversity of areas, such as financial, medical, and scientific data. Different mining techniques have been designed for time series data; their goal is to find the evolution patterns of attributes over time, which can be long-term trend movements.
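The effect of the time stamp on an association rule can be illustrated with a minimal sketch. The purchase log, the customer identifiers, and the function name `rule_confidence` below are hypothetical and serve only as an illustration; they are not taken from any system described in this paper. The sketch computes the confidence of the rule Buy_A implies Buy_B once without the time stamp and once with a one-week window.

```python
from datetime import date, timedelta

# Hypothetical purchase log: (customer, item, purchase date).
PURCHASES = [
    ("c1", "A", date(2013, 1, 1)), ("c1", "B", date(2013, 1, 4)),
    ("c2", "A", date(2013, 1, 2)), ("c2", "B", date(2013, 2, 20)),
    ("c3", "A", date(2013, 1, 5)),
    ("c4", "A", date(2013, 1, 7)), ("c4", "B", date(2013, 1, 9)),
]


def rule_confidence(purchases, antecedent, consequent, window=None):
    """Confidence of 'antecedent implies consequent', counted per customer.

    With window=None the time stamps are ignored (plain association rule).
    With a timedelta, the consequent must follow the antecedent within
    that window.
    """
    by_customer = {}
    for customer, item, day in purchases:
        by_customer.setdefault(customer, []).append((item, day))

    bought_antecedent = 0
    satisfied = 0
    for events in by_customer.values():
        a_days = [d for item, d in events if item == antecedent]
        b_days = [d for item, d in events if item == consequent]
        if not a_days:
            continue
        bought_antecedent += 1
        if window is None:
            ok = bool(b_days)
        else:
            ok = any(timedelta(0) <= b - a <= window
                     for a in a_days for b in b_days)
        satisfied += ok
    return satisfied / bought_antecedent if bought_antecedent else 0.0


plain = rule_confidence(PURCHASES, "A", "B")
weekly = rule_confidence(PURCHASES, "A", "B", window=timedelta(days=7))
print(f"Buy_A implies Buy_B:               {plain:.2f}")
print(f"Buy_A implies Buy_B within a week: {weekly:.2f}")
```

On this toy log the time-constrained rule has a lower confidence than the plain rule, because one customer bought B only several weeks after A. The time stamp therefore separates a short-term habit from a loose co-occurrence.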

In a sequential pattern over book purchases, for instance, the books need not be bought at the same time or consecutively; the most important thing is the order in which the books are bought and that they are bought by the same customer. Here 80 percent represents the percentage of customers who comply with this purchasing habit. Sequential patterns can be widely used in different areas, such as mining user access patterns for web sites or using the history of symptoms to predict a certain kind of disease. Sequential pattern mining indicates the correlation between transactions, while association rules represent intra-transaction relationships.
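The support of such a sequential pattern can be computed with a short sketch. The purchase histories and the names `contains_in_order` and `support` are hypothetical and chosen only for illustration. A customer supports the pattern when the books occur in the customer's history in the given order, not necessarily consecutively.

```python
def contains_in_order(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order,
    not necessarily consecutively."""
    remaining = iter(sequence)
    # 'item in remaining' advances the iterator past the first match,
    # so later items are only searched for after earlier ones.
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of customers whose purchase sequence contains pattern."""
    if not sequences:
        return 0.0
    hits = sum(contains_in_order(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)


# Hypothetical purchase histories, ordered by time, one per customer.
HISTORIES = {
    "c1": ["book1", "pen", "book2", "book3"],
    "c2": ["book1", "book2", "book3"],
    "c3": ["book2", "book1", "book3"],
    "c4": ["book1", "book2", "lamp", "book3"],
    "c5": ["book1", "book2", "book3", "pen"],
}

print(support(HISTORIES, ["book1", "book2", "book3"]))
```

Four of the five hypothetical customers buy the three books in the stated order, so the support of the pattern is 0.8. Customer c3 buys the same books but in a different order and is therefore not counted.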

III. SECTION
3. Problem Definition: In graphics, multiple data objects are combined so that they can be shown in a single representation. For this we use compression algorithms, but the objects are complex and each object is difficult to compress, for example moving objects and similar or irrelevant objects. Our analysis proposes a mining application to compress moving objects or complex objects.

Figure 1 Proposed System for data compression



3.1. Classification Data Compression: Data compression can reduce the storage and energy consumption of resource-constrained applications. Distributed source coding uses joint entropy to encode the data of two nodes individually without sharing any data between them; it requires prior knowledge of the cross correlations of the sources. Alternatively, data compression can be combined with routing by exploiting cross correlations between sensor nodes to reduce the data size.
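The role of joint entropy can be made concrete with a small sketch. The two reading sequences below are hypothetical quantized sensor values, and the function name `entropy` is chosen only for illustration. The sketch compares the sum of the individual entropies H(X) + H(Y) with the joint entropy H(X, Y); the gap between the two is the redundancy that correlated nodes carry.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy in bits per sample of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical quantized readings from two neighbouring sensor nodes.
node_x = [0, 0, 1, 1, 2, 2, 3, 3]
node_y = [0, 0, 1, 1, 2, 2, 3, 2]  # follows node_x except in the last sample

h_x = entropy(node_x)
h_y = entropy(node_y)
h_xy = entropy(list(zip(node_x, node_y)))  # joint entropy H(X, Y)

print(f"H(X) + H(Y) = {h_x + h_y:.3f} bits per sample pair")
print(f"H(X, Y)     = {h_xy:.3f} bits per sample pair")
```

Because the two hypothetical nodes are strongly correlated, H(X, Y) is well below H(X) + H(Y). The sketch only measures this gap; it does not implement a distributed source coder.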

In classification, training examples are used to learn a model that can classify the data samples into known classes. The classification process involves the following steps:

a. Create training data set
b. Identify class attribute and classes
c. Identify useful attributes for classification (relevance analysis)
d. Learn a model using training examples in training set
e. Use the model to classify the unknown data samples
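The steps above can be sketched on a toy example. The records, the attribute layout, and the choice of a nearest-centroid classifier are assumptions made only for illustration; the paper does not prescribe a particular classifier.

```python
from math import dist

# a. Training data set: hypothetical records
#    (speed, heading change, record id, class label).
TRAINING = [
    (1.0, 0.1, 17, "resting"), (1.2, 0.2, 42, "resting"), (0.8, 0.1, 3, "resting"),
    (5.0, 0.9, 8, "moving"), (5.5, 1.1, 51, "moving"), (4.8, 1.0, 29, "moving"),
]

# b. The class attribute is the last field; the classes are its distinct values.
CLASS_INDEX = 3
# c. Relevance analysis (done by hand here): the record id carries no class
#    information, so only speed and heading change are kept.
FEATURES = (0, 1)


def learn(training):
    """d. Learn a model: one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for row in training:
        label = row[CLASS_INDEX]
        acc = sums.setdefault(label, [0.0] * len(FEATURES))
        for j, i in enumerate(FEATURES):
            acc[j] += row[i]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, sample):
    """e. Assign an unknown sample to the class with the nearest centroid."""
    vec = [sample[i] for i in FEATURES]
    return min(model, key=lambda label: dist(vec, model[label]))


model = learn(TRAINING)
print(classify(model, (1.1, 0.15, 99)))
print(classify(model, (5.2, 1.0, 100)))
```

A nearest-centroid model is used here because it keeps the learned model to a single point per class, which is itself a compact summary of the training data.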

3.2 Clustering: In clustering, we are given a set of objects in the form of the entries of a distance matrix. The matrix contains the pairwise relations, but in this format the information is not easily usable. We need to reduce the information even further in order to achieve an acceptable format such as data clusters, that is, to extract a hierarchy of clusters from the distance matrix. Clusters are groups of objects that are similar according to our metric. Clustering lets us analyse data sets for which the number of clusters is not known a priori and the data are not labelled.
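Extracting a hierarchy of clusters from a distance matrix can be sketched as follows. The distance matrix is hypothetical, and single linkage is chosen here only as one simple agglomerative criterion; the paper does not state which linkage is intended.

```python
def single_linkage(distance):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are tuples of object indices. The merge history is the
    hierarchy (a dendrogram in list form).
    """
    clusters = [(i,) for i in range(len(distance))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(distance[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical pairwise distances between four objects.
D = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 8.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 8.0, 2.0, 0.0],
]

for a, b, d in single_linkage(D):
    print(a, "+", b, "at distance", d)
```

Cutting the merge history at a chosen distance yields a flat set of clusters, so the number of clusters does not have to be fixed in advance.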

Clustering techniques are divided into hierarchical and partitioning methods; hierarchical methods are further divided into agglomerative and divisive. Hierarchical methods build clusters gradually, whereas partitioning algorithms learn clusters directly: they discover the clusters by iteratively relocating points between subsets, or try to identify clusters as areas highly populated with data. Categorical data are intimately connected with transactional databases; similarity alone is not sufficient for clustering such data, and it is co-occurrence that comes to the rescue. Real cluster data are high dimensional, and the corresponding developments are surveyed under clustering of high-dimensional data.
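Iterative relocation of points between subsets can be illustrated with k-means, which is used here as one well-known partitioning algorithm and is an assumption of this sketch, not a method named in the paper. The point coordinates and the initial centres are hypothetical.

```python
from math import dist


def k_means(points, centroids, max_iter=100):
    """Partition points by iteratively relocating them between clusters.

    centroids holds the initial cluster centres; the function returns the
    final centres and the cluster index assigned to each point.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Relocation step: move every point to its nearest centre.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: recompute each centre as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Hypothetical 2-D object locations forming two groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels = k_means(POINTS, centroids=[POINTS[0], POINTS[1]])
print(labels)
```

Both initial centres are deliberately placed in the same group. The relocation step still moves the points apart over the iterations, and the loop stops once no point changes its cluster.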

IV. SECTION
4. Implementation: In the implementation stage, the theoretical design is turned into a working system. It can thus be considered the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work and be effective. The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, design of methods to achieve changeover, and evaluation of changeover methods.

Client-driven interventions: Client-driven interventions are the means to protectcustomers from unreliable services. For example, services that miss deadlines or do not respondat all for a longer time are replaced by other more reliable services in future discoveryoperations.

Provider-driven interventions: Provider-driven interventions are desired and initiated by the service owners to shield themselves from malicious clients. For instance, requests of clients performing a denial-of-service attack by sending multiple requests in relatively short intervals are blocked (instead of processed) by the service.
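A provider-driven intervention of this kind can be sketched as a sliding-window request counter. The class name `RequestGuard`, its parameters, and the arrival times below are hypothetical and serve only to illustrate how requests arriving in relatively short intervals could be blocked.

```python
from collections import defaultdict, deque


class RequestGuard:
    """Blocks clients that send too many requests within a short interval."""

    def __init__(self, max_requests, interval):
        self.max_requests = max_requests
        self.interval = interval  # length of the sliding window in seconds
        self._history = defaultdict(deque)

    def allow(self, client, now):
        """Return True to process the request arriving at time now, False to block it."""
        recent = self._history[client]
        # Forget accepted requests that fall outside the sliding interval.
        while recent and now - recent[0] >= self.interval:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


guard = RequestGuard(max_requests=3, interval=1.0)
arrivals = [0.0, 0.1, 0.2, 0.3, 0.4, 1.5]
decisions = [guard.allow("client-1", t) for t in arrivals]
print(decisions)
```

In this sketch the fourth and fifth requests are blocked because three requests were already accepted within the last second, while the request at 1.5 seconds is processed again.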



4.1. To implement the mining techniques for image data compression
Collaboration Partners: The demand for models to support larger-scale flexible collaborations has led to an increasing research interest in adaptation techniques to enable and optimize interactions between collaboration partners. They provide the means to specify well-defined interfaces and let customers and collaboration partners use an organization's resources through dedicated operations.

Service Instances: The concept of personalized provisioning is enabled by creating dedicated service instances for each single customer of a service provider. A standard service is instantiated and gradually customized according to a client's requirements and a provider's behavior.

Interaction Model: Users are not statically bound to clients but are discovered at run-time. Thus, interactions are ad hoc and dynamically performed with partners that are often not previously known. In SOA, interactions are typically modeled as SOAP messages. Moreover, a document translation service might be successfully used for research papers in computer science, while it is not frequently used to translate business documents.

Adaptation Strategies: Client-driven interventions are the means to protect customers from unreliable services. For example, services that miss deadlines or do not respond at all for a longer time are replaced by other, more reliable services in future discovery operations. Provider-driven interventions are desired and initiated by the service owners to shield themselves from malicious clients. For instance, requests of clients performing a denial-of-service attack by sending multiple requests in relatively short intervals are blocked (instead of processed) by the service.

V. SECTION

5. Comparative Study: Data transmission is costly, and most of the digital data being dealt with are not stored in the most compact form. For ASCII text from word processors and binary code that can be executed on a computer, typical easy-to-use encoding methods require data files about twice as large as actually needed to represent the information. The uncompression program returns the information to its original form; here we observe the encoding techniques that can be applied to uncompressed data. A lossless technique means that the restored data file is identical to the original. Examples of data that require lossless compression are executable code, word processing files, etc. In contrast, data files that represent images and other acquired signals do not have to be kept in perfect condition for storage or transmission. Fixed or variable is another way of classifying data compression methods: most data compression programs operate by taking a group of data from the original file. In a fixed scheme, a fixed number of bits is read from the input file and a smaller fixed number of bits is written to the output file. By reducing the coding length of the data, the resulting mining solutions are provably optimal, especially for high-dimensional data such as images or gene expression data.
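A fixed-length lossless scheme of the kind described above can be sketched by packing ASCII text, which is usually stored as 8 bits per character, into 7 bits per character. The function names `pack_7bit` and `unpack_7bit` and the sample message are hypothetical; the sketch is meant as an illustration of reading a fixed number of bits and writing a smaller fixed number, not as a practical compressor.

```python
def pack_7bit(text):
    """Pack ASCII text, stored as 8 bits per character, into 7 bits each."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    bits = "".join(f"{byte:07b}" for byte in data)
    bits += "0" * (-len(bits) % 8)  # pad the tail to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def unpack_7bit(packed, length):
    """Restore the original text; length is the number of characters."""
    bits = "".join(f"{byte:08b}" for byte in packed)
    return "".join(chr(int(bits[i * 7:i * 7 + 7], 2)) for i in range(length))


message = "group movement patterns " * 4
packed = pack_7bit(message)

assert unpack_7bit(packed, len(message)) == message  # lossless round trip
print(len(message), "bytes ->", len(packed), "bytes")
print("compression ratio:", len(message) / len(packed))
```

The restored text is identical to the original, which is what makes the scheme lossless. The ratio is fixed at 8:7 regardless of the content, in contrast to variable-length methods, whose ratio depends on the data.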



Figure 2(a), Figure 2(b), Figure 2(c)

The input is the image after compression; the figure shows the different scales of the image, and (c) is the output.

Some images and videos have a complicated mixed structure, and segmentation breaks the image or video into small pieces. In face recognition, training images of human faces are taken under varying expressions or lighting conditions, and the task is to identify which of the individuals in the training database is captured in the test image. Compared with image processing for data compression, mining techniques can achieve the target image for a particular image, because machine learning techniques can remove the noisy data.

CONCLUSION
In this paper we observe that, with the trend towards social moving object systems with the human user in the loop, numerous concepts, including personalization, expertise compression, drifting interests, and social dynamics, become of paramount importance. Therefore, we discussed related mining-based data compression standards and showed a way to extend them to fit the required deliverables. In particular, we discussed the concepts that let humans offer their expertise in a service-oriented manner, and covered the deployment, discovery, and selection of user-provided services. In the future, our aim is to provide more fine-grained monitoring of objects with evaluation strategies.

References
[1] Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Eleventh International Conference on Data Engineering, P. S. Yu and A. S. P. Chen, Eds. IEEE Computer Society Press, Taipei, Taiwan, 3-14.
[2] Berkhin, P. 2002. Survey of clustering data mining techniques. Tech. rep., Accrue Software, San Jose, CA.
[3] Bettini, C., Wang, X. S., and Jajodia, S. 1998. Mining temporal relationships with multiple granularities in time sequences. Data Engineering Bulletin 21, 1, 32-38.
Beyer, K. and Ramakrishnan, R. 1999. Bottom-up computation of sparse and Iceberg CUBE. 359-370.
[4] S. Baek, G. de Veciana, and X. Su, Minimizing Energy Consumption in Large-Scale Sensor Networks through Distributed Data Compression and Hierarchical Aggregation, IEEE J. Selected Areas in Comm., vol. 22, no. 6, pp. 1130-1140, Aug. 2004.
[5] C.M. Sadler and M. Martonosi, Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks, Proc. ACM Conf. Embedded Networked Sensor Systems, Nov. 2006.
[6] Y. Xu and W.-C. Lee, Compressing Moving Object Trajectory in Wireless Sensor Networks, Int'l J. Distributed Sensor Networks, vol. 3, no. 2, pp. 151-174, Apr. 2007.
[7] G. Shannon, B. Page, K. Duffy, and R. Slotow, African Elephant Home Range and Habitat Selection in Pongola Game Reserve, South Africa, African Zoology, vol. 41, no. 1, pp. 37-44, Apr. 2006.
[8] C. Roux and R.T.F. Bernard, Home Range Size, Spatial Distribution and Habitat Use of Elephants in Two Enclosed Game Reserves in the Eastern Cape Province, South Africa, African J. Ecology, vol. 47, no. 2, pp. 146-153, June 2009.
[9] J. Yang and M. Hu, Trajpattern: Mining Sequential Patterns from Imprecise Trajectories of Mobile Objects, Proc. 10th Int'l Conf. Extending Database Technology, pp. 664-681, Mar. 2006.

B. HARI KUMAR, M.Tech Computer Science & Engineering from Sri Indu College of Engineering & Technology, Ibrahimpatnam. B.Tech Electronics and Communication Engineering from Mahaveer Institute of Science and Technology, Hyderabad, having 4+ years of experience in Engineering Colleges, has guided many UG & PG students. Currently he is working as an Asst. Professor in Brilliant Institute of Engineering & Technology. His areas of interest include Image Processing, Information Security, Web Technology, Object Oriented Programming and Operating Systems.

SUKANYA SRIPATHI, M.Tech. Computer Science from St. Marys College of Engineering & Technology, completed B.Tech. Computer Science & Engineering from Rao & Naidu Engineering College, having 3+ years of experience in Engineering Colleges, has guided many UG students. Currently she is an Asst. Prof. in Brilliant Institute of Engineering & Technology; her areas of interest include Unix Operating System, Information Security, Object Oriented Analysis & Design, Computer Organization, Programming Languages.



K.V. KIRANMAI, pursuing M.Tech. Computer Science & Engineering from Sri Indu College of Engineering & Technology, Ibrahimpatnam. B.Tech Computer Science and Engineering from Sri Sunflower College of Engineering & Technology, Lankapalli, having 2+ years of experience in Engineering Colleges, has guided many UG students. Currently she is working as an Asst. Prof. in Brilliant Institute of Engineering & Technology; her areas of interest include Image Processing, Information Security, Web Technology, Design and Analysis of Algorithms.

PARUCHURI JAYASRI, pursuing M.Tech. Computer Science & Engineering from Nova College of Engineering & Technology, completed B.Tech. Computer Science & Engineering from VRS & YRN College of Engineering & Technology, having 2+ years of experience in Engineering Colleges, has guided many UG students. Currently she is an Asst. Prof. at Brilliant Institute of Engineering & Technology; her areas of interest include Unix Operating System, Information Security, Object Oriented Analysis & Design, Design and Analysis of Algorithms, Computer Organization.
