Employing Web Search indexing for fast creation of filtered view of large text files


Mostafa Agbaria, Ahmad Atamlh

Department of Electrical Engineering, Technion

Software System Laboratory, Spring 2010

Supervisor: Oved Itzhak; Lab Engineer: Dr. Ilana David

Multi-Threaded vs. Single-Thread

[Figure: execution over time. A single running thread stalls on page faults and disk access while the CPU sits idle; in the ideal multi-threading case, threads 1 to 4 keep running concurrently.]

Abstract

The following figure shows the time needed to build the database with various numbers of threads (file size = 100 MB).

Multi-Threaded Indexing

• In this project we plan to implement a new type of index for the VLTFV application that supports fast creation of filtered views of large text files using a Web Search indexing technique.

• The implementation is in Microsoft .NET and C#.

• The log files are pre-processed into a database built with inverted indexing, which gives the user an easy and fast way to search them.

Project Goals

• With serial parsing, the indexer takes longer than expected to build the database.

• We therefore built the database using multithreading: the file is indexed in parallel by a specified number of threads, each indexing a different part of the file.

• Each thread creates a new database for its section of the file and sends that database to the Web Technique Searcher.

• After receiving all the sub-databases, we merge them into a main database.
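The multi-threaded build described above can be sketched as follows. This is a minimal Python illustration, not the project's C#/.NET code: it assumes the file's lines are already in memory and splits them into contiguous sections by line count. Each thread builds a sub-database mapping every word to the line numbers that contain it, and the sub-databases are then merged into a main database. The function names (`split_into_sections`, `index_section`, `merge`, `build_database`) are hypothetical, and the `merge` step stands in for the Web Technique Searcher.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_sections(lines, num_threads):
    """Split the lines into contiguous sections, one per thread.

    Returns (first_line_number, section_lines) pairs so that each thread
    can record file-wide line numbers.
    """
    size = max(1, -(-len(lines) // num_threads))  # ceiling division
    return [(start, lines[start:start + size])
            for start in range(0, len(lines), size)]


def index_section(section):
    """Build one sub-database: word -> ascending list of line numbers."""
    first_line, section_lines = section
    sub_db = defaultdict(list)
    for offset, line in enumerate(section_lines):
        for word in set(line.lower().split()):  # record each line once per word
            sub_db[word].append(first_line + offset)
    return sub_db


def merge(sub_dbs):
    """Merge the sub-databases into the main database.

    The sub-databases arrive in file order, so concatenating the posting
    lists keeps every list sorted by line number.
    """
    main_db = defaultdict(list)
    for sub_db in sub_dbs:
        for word, line_numbers in sub_db.items():
            main_db[word].extend(line_numbers)
    return dict(main_db)


def build_database(lines, num_threads=4):
    sections = split_into_sections(lines, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in the same order as the input sections
        sub_dbs = list(pool.map(index_section, sections))
    return merge(sub_dbs)
```

For example, `build_database(["GET /a 200", "POST /b 500", "GET /b 200"], num_threads=2)["get"]` gives `[0, 2]`. Because each section is a contiguous range and results come back in input order, merging is a simple concatenation. In CPython the global interpreter lock limits the speedup of CPU-bound threads, so a `ProcessPoolExecutor` would be the usual substitute there; .NET threads have no such restriction.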

Summary

The plug-in developed in this project makes searching and inspecting very large text files easier, faster and more reliable. It uses an advanced algorithm based on a Web indexing technique together with the VLTF, which makes navigating between lines in such large text files more practical for humans.

The conventional approach used previously requires going over the entire text file to perform a search, which is time-consuming and impractical. This motivates pre-processing the text file, which lets us search faster and more reliably. The index, i.e. the pre-processed database, solves the speed problem: a search no longer has to go over the entire file, and that is where the time is saved.
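To make the contrast concrete, here is a minimal Python sketch of a conventional search that rescans every line on each query versus a search against an index built once in a pre-processing pass. Whitespace tokenization and the names `linear_search`, `build_index` and `indexed_search` are assumptions for illustration only.

```python
def linear_search(lines, word):
    """Conventional approach: examine every line of the file on each search."""
    word = word.lower()
    return [n for n, line in enumerate(lines) if word in line.lower().split()]


def build_index(lines):
    """Pre-processing: a single pass over the file, done once before searching."""
    index = {}
    for n, line in enumerate(lines):
        for w in set(line.lower().split()):
            index.setdefault(w, []).append(n)
    return index


def indexed_search(index, word):
    """Search with the pre-processed index: one dictionary lookup, so the cost
    depends on the number of matches rather than on the size of the file."""
    return list(index.get(word.lower(), []))
```

Both functions return the same line numbers; the difference is that `linear_search` repeats the full pass for every query, while `indexed_search` pays for that pass only once, in `build_index`.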

Pre-Processing Data

[Figure: Sub Database and Main Database, with items numbered 1 to 6.]

An inverted index is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a database file, a document, or a set of documents. The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database.
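The definition above can be illustrated with a small Python sketch. It assumes that each line plays the role of a "document", that words are separated by whitespace, and that a filtered view means the lines containing all of the requested words. `InvertedIndex`, `filtered_view` and `intersect` are illustrative names, not the plug-in's API. Intersecting sorted posting lists, shortest first, is a common way for search engines to answer multi-word queries.

```python
from collections import defaultdict


def intersect(a, b):
    """Two-pointer intersection of two ascending posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class InvertedIndex:
    """Maps each word to the ascending list of line numbers containing it."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add_line(self, line_number, text):
        # Lines must be added in increasing order so posting lists stay sorted.
        # This per-line work is the extra cost paid when content is added.
        for word in set(text.lower().split()):
            self.postings[word].append(line_number)

    def filtered_view(self, *terms):
        """Line numbers containing ALL terms, found by intersecting posting
        lists (shortest first) instead of rescanning the file."""
        lists = sorted((self.postings.get(t.lower(), []) for t in terms), key=len)
        if not lists:
            return []
        result = lists[0]
        for other in lists[1:]:
            result = intersect(result, other)
        return list(result)  # copy, so callers cannot modify the index
```

For example, after adding the lines `"GET /a 200"`, `"GET /b 500"`, `"POST /a 500"` and `"GET /a 500"` as lines 0 to 3, `filtered_view("GET", "500")` returns `[1, 3]`.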

Inverted Indexing

User Interface

[Figure: VLTF user interface, labeled Open File, Go to Line, Search, Conventional Scroll Bar, Scroll Knob, Line Numbers, Search Results Pane, Progress Bar, File lines counter and Text view area.]

In today’s Internet-scale services, it’s not uncommon to have logs that contain huge amounts of data. Inspecting such logs can easily overwhelm a human, so specialized tools that make all this data easier to manage are essential.

In this project we implement a plug-in to the existing VLTF application that takes the text file and, using inverted indexing, creates an index that enables very fast search in the file. The VLTF provides the GUI for searching and for quickly navigating to the found locations in the text file.

Very Large Text File Viewer

• As network bandwidth increases, network servers (e.g. Web, Mail, etc.) create exceedingly large log files.

• The problem of searching such files resembles the Web Search problem, where searching all the data naively takes prohibitively long.

• This project continues the VLTFV project (Very Large Text File Viewer), whose application responsiveness is independent of input file size.
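One common way for a viewer to keep Go to Line fast regardless of file size is a line-offset table: a single pass records where each line starts, after which any line can be reached with one seek. The Python sketch below assumes this technique for illustration only, since the poster does not describe VLTFV's internals; `build_line_offsets` and `read_line` are hypothetical names.

```python
import io


def build_line_offsets(f):
    """One pass over a binary file object, recording the byte offset at which
    each line starts."""
    offsets = []
    f.seek(0)
    pos = 0
    for line in f:  # binary iteration yields raw lines, including b"\n"
        offsets.append(pos)
        pos += len(line)
    return offsets


def read_line(f, offsets, line_number):
    """Go to Line: seek straight to the stored offset and read a single line,
    so the cost does not depend on the total size of the file."""
    f.seek(offsets[line_number])
    return f.readline().rstrip(b"\r\n").decode("utf-8", errors="replace")


if __name__ == "__main__":
    # In-memory stand-in for open("huge.log", "rb")
    buf = io.BytesIO(b"first\nsecond\nthird")
    offsets = build_line_offsets(buf)
    print(offsets)                     # [0, 6, 13]
    print(read_line(buf, offsets, 1))  # second
```

The length of the offset table also gives the total number of lines, which is the kind of value a file lines counter would display.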

Background