os_project.doc

NANDHA ENGINEERING COLLEGE

ERODE – 638 052

(AUTONOMOUS)

Department of Computer Science and Engineering

PROJECT REPORT ON PROJECT BASED LEARNING

(OPERATING SYSTEMS)

DECEMBER 2014

This is to certify that the project entitled

SIMPLE SEARCH ENGINE

is the bonafide record of project work done by

TEAM MEMBERS:

NAME: B.KAVITHAMANI

(REG.NO:13CS025)

NAME:E.KIRUTHIKA

(REG.NO:13CS029)

NAME:R.MEENA

(REG.NO:13CS035)

NAME:K.PRIYADHARSHINI

(REG.NO:13CSL06)

of B.E. (Computer Science and Engineering) during the year 2014-15.

ABSTRACT

CONTENT

TABLE OF CONTENTS

CHAPTER NO. TITLE

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

1 INTRODUCTION

1.1 Overview of the project

1.2 Objective of the project

2 PROPOSED SYSTEM

2.1 Advantages of Proposed System

3 SYSTEM ANALYSIS AND DESIGN

3.1 System Design

3.1.1 Data Flow Diagram

3.2 System Specification

3.2.1 Software Requirements

3.2.2 Hardware Requirements

4 MODULE DESCRIPTION

4.1 Module Description

5 CONCLUSION

5.1 Conclusion

6 APPENDICES

6.1 Screen Shots

6.2 Source Code

REFERENCES

INTRODUCTION

The Simple Search Engine project is implemented in Java using servlets, with an Oracle database or SQL Server 2000. The main aim of the project is to develop a search engine that searches three different search engines and displays the top twenty-five results that are most useful to users. Today, search engines are used by everyone to find required information on the web. Google is one of the most widely used search engines, followed by Yahoo and Bing. For every search engine, the first five results are usually the most useful websites with correct information, so in this project we collect the top five results from Google, Yahoo and Bing and display them on the first page as the most useful web pages.

While some extensions to SQL allow manual specification of attribute weights, this approach is cumbersome for most users. Automated ranking of database results has been studied in the context of relational databases, and although a number of techniques perform query-dependent ranking, they do not differentiate between users and hence provide a single ranking order for a given query across all users. In contrast, techniques that build extensive user profiles, or that require users to order data tuples, are user dependent but do not differentiate between queries.

DEFINITION:

A search engine is a place on the Internet where one goes to find sites containing specific information. If you have a website and want people to be able to find it, you must submit it to the search engines so that they will list it.

After you submit your site, it may still take as long as 3 months before it is listed. You may need to re-submit your site every couple of months, depending on the rules of the search engine. Read each engine's rules before submitting.

Examples:

I'm going to submit my Web site to search engines so other people can find it.

Google

Yahoo

Bing

MAKING A SIMPLE SEARCH ENGINE:

We've now looked at both fopen() and fsockopen(), both of which are great for reading in content from websites. However, thanks to the way streams work in PHP, you can read remote data in with a huge selection of functions - even down to the relatively lowly file_get_contents(). To show off this functionality, I wrote a very simple search engine that spiders websites by pulling out hyperlinks and inserting data into a MySQL table. The code is very, very simple, and very naive - it's here to demonstrate a point, not be a perfect search engine, so please don't base your own efforts on it!
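The link-extraction step of such a spider can be sketched in C++, the language of the source listing in this report. This is only a rough illustration under stated assumptions: the function name `extract_links` is hypothetical, the page is assumed to be already downloaded into a string, storing results in a database is left out, and a regular expression is a crude stand-in for a real HTML parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the href targets of <a> tags found in an
// HTML string, in document order. A regex only approximates HTML parsing
// (for example, it ignores unquoted attribute values).
std::vector<std::string> extract_links(const std::string& html)
{
    // <a, whitespace, any other attributes, then href="..." or href='...'
    static const std::regex href_re(
        R"rx(<a\s[^>]*href\s*=\s*["']([^"']*)["'])rx",
        std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it)
    {
        links.push_back((*it)[1].str());  // capture group 1 is the URL
    }
    return links;
}
```

A spider would call `extract_links` on each downloaded page and queue the returned URLs for later visits.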

2.PROPOSED SYSTEM:

Advantages of proposed system:

Variety

An Internet search can generate a variety of sources for information. Results from online encyclopedias, news stories, university studies, discussion boards, and even personal blogs can come up in a basic Internet search. This variety allows anyone searching for information to choose the types of sources they would like to use, or to use a variety of sources to gain a greater understanding of a subject.

Precision

Search engines can provide refined, more precise results. Putting quotation marks around a set of words brings up only results containing exactly those words. Some search engines, such as Google or Yahoo, let you specify the type of web sources to be searched. Searching more precisely cuts down the amount of information generated by your search.

Organization

Internet search engines help to organize the Internet and individual websites. They gather the vast amount of information that can be scattered across various places, sometimes even on the same web page, into an organized list that can be used more easily.
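The exact-phrase search described under Precision can be sketched with a positional index, which records where each word occurs in a document. The names `PositionIndex`, `build_index` and `contains_phrase` below are hypothetical and chosen for illustration. The sketch handles a single document split on whitespace and is not the `phrase_match` routine used by the source listing in this report.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Positional index for one document: word -> ascending word positions.
typedef std::map<std::string, std::vector<std::size_t> > PositionIndex;

// Split the text on whitespace and record the position of every word.
PositionIndex build_index(const std::string& text)
{
    PositionIndex index;
    std::istringstream in(text);
    std::string word;
    std::size_t pos = 0;
    while (in >> word)
        index[word].push_back(pos++);
    return index;
}

// True if the words of `phrase` occur next to each other, in order.
bool contains_phrase(const PositionIndex& index, const std::string& phrase)
{
    std::istringstream in(phrase);
    std::vector<std::string> words;
    std::string w;
    while (in >> w)
        words.push_back(w);
    if (words.empty())
        return false;

    PositionIndex::const_iterator first = index.find(words[0]);
    if (first == index.end())
        return false;

    // Try every occurrence of the first word as a possible phrase start;
    // word k of the phrase must then sit at position start + k.
    for (std::size_t i = 0; i < first->second.size(); ++i)
    {
        const std::size_t start = first->second[i];
        bool ok = true;
        for (std::size_t k = 1; k < words.size() && ok; ++k)
        {
            PositionIndex::const_iterator it = index.find(words[k]);
            ok = it != index.end() &&
                 std::binary_search(it->second.begin(), it->second.end(),
                                    start + k);
        }
        if (ok)
            return true;
    }
    return false;
}
```

For an index built from the text "the apache php module is loaded", the phrase "apache php module" matches, while "php apache" does not, because the words appear in a different order.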

SYSTEM ANALYSIS AND DESIGN

SYSTEM DESIGN:

STEP 1:

STEP 2:

STEP 3:

SYSTEM SPECIFICATION:

HARDWARE REQUIREMENTS:

SOFTWARE REQUIREMENTS:

SQL SERVER 2000:

Even in this new age of internet connectivity, online services and wireless everywhere, architects, developers and end users are realizing that betting their business and user productivity on a constant connection to a central location is a risky, frustrating and costly endeavour.

When the connection goes down for any reason, can you afford to have your business stop?

Building resiliency, redundancy and the ability to work independently into your application architecture provides the ability to shield your users and company.

Having a local data store to cache online data, enable offline functionality, or support a stand-alone application that does not need to serve data to the world raises the question: “What should I use for local storage?”

The SQL Server family offers two products suitable for local storage, including Microsoft SQL Server 2000 Compact Edition for desktop scenarios. Microsoft is positioning Compact Edition as the default local database.

SOURCE CODE

#include <iostream>

#include <string>

#include <sys/types.h>

#include <errno.h>

#include <sys/stat.h>

#include <stdio.h>

#include <dirent.h>

#include <unistd.h>

#include <cstring>

#include <cstdlib>

#include <fstream>

#include "lib/list.h"

#include "file.h"

#include "word.h"

#include "trie.h"

#include "lib/getopt.h"

extern int yyparse();

extern void makelower(char *);

using namespace std;

extern Trie * t;

bool is_valid(string & path)

{

size_t pos = path.find_first_not_of("./");

if(pos != string::npos && pos > 0)

{

// Skip system directories. Each comparison length equals the length of

// the prefix being tested, so longer prefixes can match too.

if(path.compare(pos-1, 4, "/dev") == 0 ||

path.compare(pos-1, 4, "/bin") == 0 ||

path.compare(pos-1, 8, "/usr/bin") == 0 ||

path.compare(pos-1, 14, "/usr/local/bin") == 0 ||

path.compare(pos-1, 5, "/sbin") == 0 ||

path.compare(pos-1, 15, "/usr/X11R6/bin/") == 0 ||

path.compare(pos-1, 15, "/usr/local/sbin") == 0 ||

path.compare(pos-1, 5, "/proc") == 0)

{

return false;

}

}

return true;

}

int file_count = 0;

void traverse(string & fn)

{

DIR *dir;

struct dirent *entry;

string path;

struct stat info;

if( ! is_valid(fn))

return;

if ((dir = opendir(fn.c_str())) == NULL)

{

return;

}

while ((entry = readdir(dir)) != NULL)

{

if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)

{

path = fn;

if(path.find_last_of("/") != path.size()-1)

path = path + "/";

path = path + entry->d_name;

// lstat() reports on the link itself, so symbolic links can be detected and skipped

if(lstat(path.c_str(), &info) != 0 || S_ISLNK(info.st_mode))

continue;

if (S_ISDIR(info.st_mode))

traverse(path);

else

{

size_t pos = path.find_last_of(".");

if(pos != string::npos && (path.substr(pos) == ".txt" || path.substr(pos) == ".htm" || path.substr(pos) == ".html"))

{

file_count++;

File::parse(path);

}

}

}

}

closedir(dir);

}

bool give_count = false;

void print(Word & w)

{

cerr << "Searched for: " << w.getWord() << endl;

List<WordFile> & wf = w.getFiles();

wf.sort();

List<WordFile>::Node * wfn = wf.getFirst();

while(wfn != NULL)

{

int fnum = wfn->val->getNum();

if(give_count)

cout << "[" << wfn->val->getCount() << "] ";

cout << File::resolvePath(fnum) << endl;

wfn = wfn->next;

}

}

Word * sres = NULL;

void stem_fn(Word * w)

{

if(sres != NULL)

{

if(w != NULL)

sres = or_them(*sres, *w);

}

else

sres = w;

}

Word * s(string & line, Trie & t)

{

Word * res = NULL;

while(line != "")

{

size_t pos = line.find_first_of(" \t");

string word;

if(pos != line.npos)

{

word = line.substr(0, pos);

line = line.substr(pos+1);

}

else

{

word = line;

line = "";

}

if(word != "")

{

Word * w1 = NULL;

if(word.find_last_of("*") != string::npos)

{

word = word.substr(0, word.size()-1);

sres = NULL;

t.traverse((char*)word.c_str(), stem_fn);

w1 = sres;

sres = NULL;

}

else

w1 = t.find(word.c_str());

if(w1 == NULL)

w1 = new Word(word);

if(res != NULL)

{

if(w1 == NULL)

res = NULL;

else

res = phrase_match(*res, *w1);

}

else

res = w1;

}

}

return res;

}

int main(int argc, char **argv)

{

int errflag, option;

string f1;

string f2;

string f3;

string f4;

string dir = "";

bool search = true;

char *home;

string mss_dir;

struct stat mss_stat;

string query_string = "";

string prev = "";

errflag=0;

string index_name = "";

string query[10];

int qnum = 0;

while(((option = my_getopt(argc,argv,"hnc:i:")) != NONOPT) || optarg != NULL)

{

switch(option)

{

case 'i':

index_name = optarg;

cerr <<"Using non-default index: " << optarg <<endl;

break;

case 'c':

cerr << "Indexing directory: " << optarg << endl;

search = false;

dir = optarg;

break;

case 'n':

give_count = true;

break;

case 'h':

cerr << argv[0] << ": Index and search for words in files. Version 1.0" << endl << endl

<< "usage: "

<< argv[0] << " [-i index-name] -c directory \t\tcreate index for 'directory'; use index 'index-name'" << endl

<< " or: "

<< argv[0] << " [-i index-name] [-n] query-string\t\tsearch for 'query-string'; use index 'index-name'" << endl << endl

<< "Arguments:" <<endl

<< " -c (with directory)\t create index" << endl

<< " -i (with name)\t use non-default index" << endl

<< " -n\t\t\t display number of occurrences in search results" << endl

<< endl

<< "EXAMPLES:" << endl

<< "To index '/usr/share/doc', use" << endl

<< "\t" << argv[0] << " -c /usr/share/doc" << endl << endl

<< "To index '/home/tom' with index-name 'home', use" << endl

<< "\t" << argv[0] << " -c /home/tom -i home" << endl << endl

<< "Now, to search '/usr/share/doc' for 'linux AND perl', use" << endl

<< "\t" << argv[0] << " linux AND perl" << endl << endl

<< "Now, to search '/home/tom' for 'perl OR php', use" << endl

<< "\t" << argv[0] << " perl OR php -i 'home'" << endl << endl

<< "To search '/usr/share/doc' for the phrase \"apache php module\", use" << endl

<< "\t" << argv[0] << " \"apache php module\"" << endl << endl

<< "To search '/usr/share/doc' for 'lin*', use (observe the single quotes)" << endl

<< "\t" << argv[0] << " 'lin*'" << endl;

exit(0);

break;

case '?':

cerr << "Type " << argv[0] << " -h for help" << endl;

exit(1);

break;

case NONOPT:

// cout<< "Word " << optarg <<endl;

makelower(optarg);

if(prev != "")

{

query_string = query_string + " ";

if(prev != "and" && prev != "or" && strcmp(optarg, "and") != 0 && strcmp(optarg, "or") != 0)

{

query_string = query_string + "and ";

query[qnum] = "and";

qnum++;

}

}

string q = "";

if(strcmp(optarg, "and") != 0 && strcmp(optarg, "or") != 0)

{

string s = optarg;

size_t p = s.find_first_of(" \t");

if(s.find_first_of(" \t") != string::npos && s.find_first_of("*") != string::npos)

{

cerr << "Ignoring phrase (wildcards used): \"" << optarg << "\"" << endl;

continue;

}

if(s.find_first_of("*") < s.size()-1)

{

cerr << "Ignoring word (wildcard '*' at wrong position): " << optarg << endl;

continue;

}

if(strlen(optarg) <= 2 && s.at(s.size()-1) != '*')

{

cerr << "Ignoring word (too short): " << optarg << endl;

continue;

}

if(s.find_first_of(" \t") != s.npos)

{

while(s != "")

{

size_t pos = s.find_first_of(" \t");

string word;

if(pos != s.npos)

{

word = s.substr(0, pos);

s = s.substr(pos+1);

}

else

{

word = s;

s = "";

}

if(word.size() > 2)

{

if(q != "")

q = q + " ";

q = q + word;

}

else

{

cerr << "Ignoring word in phrase (too short): " << word << endl;

}

}

if(q == "")

continue;

}

else

q = s;

}

else if(prev == "and" || prev == "or")

{

prev = "";

continue;

}

else

q = optarg;

query_string = query_string + q;

query[qnum] = q;

qnum++;

prev = q;

break;

}

}

if(search == false && dir == "")

{

cout << "USAGE" << endl; // IMP: Print this properly

exit(1);

}

home = getenv("HOME");

if(home == NULL)

{

cerr << "Error: HOME environment variable is not set" << endl;

exit(1);

}

mss_dir = home;

mss_dir = mss_dir + "/.mss";

if(stat(mss_dir.c_str(), &mss_stat) == 0)

{

if(S_ISDIR(mss_stat.st_mode) == 0)

{

cerr << "Error: There is a non-directory: " << mss_dir << endl << " Please delete this to continue using this application" << endl;

exit(1);

}

}

else

{

// 0755 = rwxr-xr-x (further restricted by the process umask)

if(mkdir(mss_dir.c_str() , 0755) == -1)

{

cerr << "Error: Could not create directory: " << mss_dir << endl;

exit(1);

}

}

mss_dir = mss_dir + "/";

f1 = mss_dir;

f1 = f1 + index_name;

f1 = f1 + "1.dat";

f2 = mss_dir;

f2 = f2 + index_name;

f2 = f2 + "2.dat";

f3 = mss_dir;

f3 = f3 + index_name;

f3 = f3 + "3.dat";

f4 = mss_dir;

f4 = f4 + index_name;

f4 = f4 + "4.dat";

if(search)

{

if(query_string == "")

{

cerr << "Nothing to search for!" << endl

<< "Type " << argv[0] << " -h for help" << endl;

exit(1);

}

if(stat(f1.c_str() , &mss_stat) == -1 || stat(f2.c_str() , &mss_stat) == -1 ||

stat(f3.c_str() , &mss_stat) == -1 || stat(f4.c_str() , &mss_stat) == -1)

{

cerr << "No database exists. Run " << argv[0] <<" with -c option first. Type '"

<< argv[0] << " -h' for help" << endl;

exit(1);

}

File::init(f1 , f2);

t = new Trie(f3, f4);

Word * res = NULL;

for(int i=0; i<qnum; i++)

{

Word * wres = NULL;

if(query[i] != "and" && query[i] != "or")

{

wres = s(query[i], *t);

if(res != NULL)

{

if(query[i-1] == "and")

{

if(wres == NULL)

res = NULL;

else

res = and_them(*res, *wres);

}

else if(query[i-1] == "or" && wres != NULL)

res = or_them(*res, *wres);

}

else

res = wres;

}

}

if(res != NULL)

print(*res);

else

cerr << "Not found" << endl;

}

else

{

ofstream fout;

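// Create (or truncate) the four index files before rebuilding the index

fout.open(f1.c_str(), ios::out | ios::binary | ios::trunc);

fout.close();

fout.open(f2.c_str(), ios::out | ios::binary | ios::trunc);

fout.close();

fout.open(f3.c_str(), ios::out | ios::binary | ios::trunc);

fout.close();

fout.open(f4.c_str(), ios::out | ios::binary | ios::trunc);

fout.close();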

File::init(f1 , f2);

t = new Trie(f3, f4);

traverse(dir);

t->save();

cout << file_count << " files indexed" << endl;

File::uninit();

}

delete t;

}
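The listing above relies on a `Trie` class (from `trie.h`, not reproduced in this report) to look up words and to expand wildcard queries such as `lin*`. As an illustration of the idea only, the following is a minimal in-memory sketch. The class name `PrefixTrie` and its methods are hypothetical, and the real `Trie` is file-backed and stores `Word` objects rather than plain strings.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory trie supporting exact and prefix lookups.
class PrefixTrie
{
    struct Node
    {
        bool is_word;  // true if a stored word ends at this node
        std::map<char, std::unique_ptr<Node> > children;
        Node() : is_word(false) {}
    };

    Node root_;

    // Follow `key` character by character; NULL if the path does not exist.
    const Node* walk(const std::string& key) const
    {
        const Node* node = &root_;
        for (std::size_t i = 0; i < key.size(); ++i)
        {
            std::map<char, std::unique_ptr<Node> >::const_iterator it =
                node->children.find(key[i]);
            if (it == node->children.end())
                return nullptr;
            node = it->second.get();
        }
        return node;
    }

    // Depth-first walk that appends every word below `node` to `out`.
    static void collect(const Node& node, std::string& current,
                        std::vector<std::string>& out)
    {
        if (node.is_word)
            out.push_back(current);
        for (std::map<char, std::unique_ptr<Node> >::const_iterator it =
                 node.children.begin(); it != node.children.end(); ++it)
        {
            current.push_back(it->first);
            collect(*it->second, current, out);
            current.erase(current.size() - 1);
        }
    }

public:
    void insert(const std::string& word)
    {
        Node* node = &root_;
        for (std::size_t i = 0; i < word.size(); ++i)
        {
            std::unique_ptr<Node>& next = node->children[word[i]];
            if (!next)
                next.reset(new Node());
            node = next.get();
        }
        node->is_word = true;
    }

    bool contains(const std::string& word) const
    {
        const Node* node = walk(word);
        return node != nullptr && node->is_word;
    }

    // All stored words that start with `prefix`, in lexicographic order.
    std::vector<std::string> with_prefix(const std::string& prefix) const
    {
        std::vector<std::string> out;
        const Node* node = walk(prefix);
        if (node != nullptr)
        {
            std::string current = prefix;
            collect(*node, current, out);
        }
        return out;
    }
};
```

A query such as `lin*` would then correspond to `with_prefix("lin")`, whose results could be combined with OR in the way `stem_fn` does in the listing.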
