Employing Web Search indexing for fast creation of filtered view of large text files

Mostafa Agbaria, Ahmad Atamlh

Department of Electrical Engineering, Technion

Software System Laboratory, Spring 2010

Supervisor: Oved Itzhak, Lab Engineer: Dr. Ilana David

Multi-Threaded vs. Single-Threaded

[Figure: execution timelines over time. Single thread: the thread runs, then a page fault or disk access leaves the CPU idle. Multi-threading (in an ideal world): Thread 1 running, Thread 2 running.]

Abstract

The following figure shows the time to build the database using various numbers of threads (file size = 100 MB).

Multi-Threaded Indexing

• In this project we plan to implement a new type of index for the VLTFV application that supports fast creation of filtered views of large text files using a web search indexing technique.

• The implementation is in Microsoft .NET and C#.

• We create a database using inverted indexing to pre-process the data in the log files, giving the user an easy and fast way to search the log file.

Project Goals

• With serial parsing, the indexer takes more time than expected to build the database.

• We therefore build the database using multi-threading: the file is indexed in parallel by a specified number of threads, each indexing a different part of the file, for faster indexing.

• Each thread:

    - Creates a new database for its section of the file.

    - Sends the database to the Web Technique Searcher.

• After receiving all the sub-databases, we merge them into a Main Database.
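The split, index, and merge steps above can be sketched as follows. This is a minimal illustration in Python, not the project's C# code: the names `index_section` and `build_main_database`, the word-to-line-numbers layout of each sub-database, and the whitespace tokenization are all assumptions made for the example. It shows the structure of the approach, not its speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_section(lines, first_line_no):
    """Build a sub-database (word -> line numbers) for one section of the file."""
    sub_db = defaultdict(list)
    for offset, line in enumerate(lines):
        # set() so that a word repeated on one line is recorded once.
        for word in set(line.lower().split()):
            sub_db[word].append(first_line_no + offset)
    return sub_db


def build_main_database(lines, num_threads):
    """Index `lines` in parallel sections, then merge the sub-databases."""
    section_size = max(1, -(-len(lines) // num_threads))  # ceiling division
    starts = range(0, len(lines), section_size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in section order, so merged line numbers
        # stay in ascending order.
        sub_dbs = list(pool.map(
            lambda start: index_section(lines[start:start + section_size], start),
            starts))
    main_db = defaultdict(list)
    for sub_db in sub_dbs:
        for word, line_numbers in sub_db.items():
            main_db[word].extend(line_numbers)
    return dict(main_db)
```

For example, `build_main_database(["error disk full", "info started", "error timeout", "info stopped"], 2)["error"]` gives `[0, 2]`, and the merged result is the same regardless of the number of threads.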

Summary

The plug-in developed in this project makes searching and inspecting very large text files easier, faster, and more reliable. It uses an advanced algorithm based on a web indexing technique together with the VLTF, making the process of switching between lines in such large text files more practical for humans.

The conventional approach previously used requires going over the entire text file to perform a search, which is time-consuming and impractical. This motivates pre-processing the text file, which enables us to search in a faster and more reliable way. The index, which is the pre-processed database, solves the speed problem: a search no longer requires going over the entire file, and this is where the time is saved.
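The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's C# implementation: the function names and the whitespace tokenization are hypothetical. The scan reads every line on every search, while the pre-processed index is built once and then answers each search with a single lookup.

```python
def scan_search(lines, word):
    """Conventional approach: go over every line on every search."""
    return [n for n, line in enumerate(lines) if word in line.split()]


def preprocess(lines):
    """One-time pass recording, for each word, the lines that contain it."""
    index = {}
    for n, line in enumerate(lines):
        for w in set(line.split()):
            index.setdefault(w, []).append(n)
    return index


def indexed_search(index, word):
    """Search via the pre-processed index: one dictionary lookup."""
    return index.get(word, [])
```

Both functions return the same line numbers for the same word; only the work done per search differs.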

Pre-Processing Data

[Figure: Sub Database and Main Database diagram, with elements numbered 1 to 6.]

An inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents. The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database.
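A minimal sketch of such a structure, written in Python for illustration rather than taken from the project's C# code, is shown below. The class name `InvertedIndex`, its methods, and the choice of storing word positions per document are assumptions made for the example.

```python
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # word -> {document id -> positions of the word in that document}
        self.postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        """Pay the indexing cost up front, when the document is added."""
        for position, word in enumerate(text.lower().split()):
            self.postings[word].setdefault(doc_id, []).append(position)

    def locations(self, word):
        """Return {doc_id: positions} for one word."""
        return self.postings.get(word.lower(), {})

    def search(self, *words):
        """Return the sorted ids of documents containing every given word."""
        if not words:
            return []
        doc_sets = [set(self.locations(w)) for w in words]
        return sorted(set.intersection(*doc_sets))
```

After adding document 1 as "disk error on disk two" and document 2 as "network error", `locations("disk")` gives `{1: [0, 3]}` and `search("error", "disk")` gives `[1]`. All the text processing happens in `add_document`, which is the added cost the definition mentions.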

Inverted Indexing

User Interface

[Figure: screenshot of the user interface with labeled elements: Open File, Go to Line, Search, Conventional Scroll Bar, Scroll Knob, Line Numbers, Search Results Pane, Progress Bar, File lines counter, Text view area.]

In today's Internet-scale services it is not uncommon to have logs that contain huge amounts of data. Inspecting such logs can easily overwhelm a human. Therefore, specialized tools that make it easier to manage all the data are essential.

In this project we implement a plug-in for the existing VLTF application that takes the text file and creates an index that enables very fast search in the file, using inverted indexing. The VLTF provides the GUI for searching and quickly navigating to the found locations in the text file.

Very Large Text File Viewer

• As network bandwidth increases, network servers (e.g. Web, mail) create exceedingly large log files.

• The problem of searching such files resembles the web search problem, where searching all the data naively takes prohibitively long.

• This project continues the VLTFV (Very Large Text File Viewer) project, an application whose responsiveness is independent of input file size.

Background

http://en.wikipedia.org/wiki/Index_(information_technology)
