Web Crawler


Page 1: Web Crawler

CHAPTER 1

INTRODUCTION

1.1. PROJECT OVERVIEW

A web crawler forms an integral part of any search engine. The basic task of a crawler is to fetch pages, parse them to extract more URLs, and then fetch those URLs to discover even more. Along the way the crawler can also log the pages or perform other operations on them, according to the requirements of the search engine. Most of these auxiliary tasks are orthogonal to the design of the crawler itself.
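The parse step of this loop can be illustrated with a minimal sketch. It assumes the page has already been fetched into a string and uses a deliberately simplified regular expression rather than a real HTML parser; the class name LinkExtractor and the method extractLinks are hypothetical and do not come from the project.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: pulls http(s) links out of fetched HTML.
class LinkExtractor {
    // Simplified pattern: an <a ...> tag with a quoted href; fragments (#...) are cut off.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'#]+)[\"'#]", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links against the URL of the page they were found on.
                URI resolved = base.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: skip it and keep scanning.
            }
        }
        return links;
    }
}
```

Each extracted URL would then be handed back to the crawler as a new candidate to fetch, which is what makes the process repeat.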

The explosive growth of the web has rendered the simple task of crawling the web non-trivial. As the search space keeps growing, crawling the web becomes more difficult day by day. But all is not lost: newer computational models are being introduced to make resource-intensive tasks more manageable.

The price of computing keeps falling. It has now become very economical to use several cheap computation units in a distributed fashion to achieve high throughput. The challenge with such a distributed model is to distribute the computation tasks efficiently while avoiding the overhead of synchronization and consistency maintenance.

Scalability is also an important issue for such a model to be usable.

In this project, the architecture of a scalable, distributed web crawler has been proposed and implemented. It has been designed to make use of cheap resources and tries to remove some of the bottlenecks of present crawlers in a novel way. For the sake of simplicity and focus, we worked only on the crawling part of the crawler, logging only the URLs. Other functions can easily be integrated into the design.


1.2. OBJECTIVES OF THE PROJECT

The objective of the project is to allow the user to store a website and its links on their own system and analyze the result. The project also helps in finding broken links in any website. Our main objectives during the development of the project were:

Increased resource utilization (by multithreaded programming to increase concurrency).

Effective distribution of crawling tasks with no central bottleneck.

Easy portability.

Limiting the request load on all web servers.

Configurability of the crawling tasks.

Besides catering to these capabilities, our design also includes a probabilistic hybrid search model. It is implemented as a probabilistic hybrid of the stack and queue ADTs (Abstract Data Types) for maintaining the pending URL lists. Details of the probabilistic hybrid model are presented later in the report. The crawler is a peer-to-peer distributed crawler, with no central entity.
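One way to picture a probabilistic hybrid of a stack and a queue is sketched below. It is only an illustration under stated assumptions: the report does not give the exact rule at this point, so the sketch assumes that each removal takes the newest element (stack behaviour, depth-first flavour) with probability stackProbability and the oldest element (queue behaviour, breadth-first flavour) otherwise. The class name HybridJobQueue and the parameter name are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Hypothetical sketch: a synchronized pending-URL list that mixes LIFO and FIFO removal.
class HybridJobQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final double stackProbability; // assumed tuning knob: 0.0 = pure queue, 1.0 = pure stack
    private final Random random;

    HybridJobQueue(double stackProbability, Random random) {
        this.stackProbability = stackProbability;
        this.random = random;
    }

    // New jobs always go to the tail.
    synchronized void put(T item) {
        items.addLast(item);
    }

    // Removes from the tail (stack) or the head (queue); returns null when empty.
    synchronized T poll() {
        if (items.isEmpty()) {
            return null;
        }
        boolean actAsStack = random.nextDouble() < stackProbability;
        return actAsStack ? items.removeLast() : items.removeFirst();
    }

    synchronized int size() {
        return items.size();
    }
}
```

With a probability of 0.0 the list behaves as a plain queue, with 1.0 as a plain stack, and values in between blend the two traversal orders.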

By using a distributed crawling model we have overcome bottlenecks such as:

Network throughput

Processing capabilities

Database capabilities

Storage capabilities.

A database capability bottleneck is avoided by dividing the URL space into disjoint sets, each of which is handled by a separate crawler. Each crawler parses and logs only the URLs that lie in its own subset of the URL space, and forwards the remaining URLs to the corresponding crawler entity. Each crawler has prior knowledge of the lookup table relating each URL subset to the [IP:PORT] combination that identifies the corresponding crawler.


CHAPTER 2

LITERATURE REVIEW

The crawler system consists of a number of crawler entities, which run on distributed sites and interact in a peer-to-peer fashion. Each crawler entity knows its own URL subset, as well as the mapping from URL subsets to the network addresses of the corresponding peer crawler entities. Whenever a crawler entity encounters a URL from a different URL subset, it forwards the URL to the appropriate peer crawler entity based on this lookup. Each crawler entity maintains its own database, which stores only the URLs from the URL subset assigned to that entity. The databases are disjoint and can be combined offline when the crawling task is complete.

CRAWLER ENTITY

Each crawler entity consists of several crawler threads, a URL handling thread, a URL packet dispatcher thread and a URL packet receiver thread. The URL set assigned to each crawler entity is further divided into subsets, one for each crawler thread. Each crawler thread has its own pending URL list. Each thread picks an element from its pending URL list, generates an HTTP fetch request, gets the page, parses the page to extract any URLs in it, and finally puts them in the job queue of the URL handling thread.

During initialization the URL handling thread reads the hash to [IP:PORT] mapping. It also has a job queue. This thread gets a URL from the job queue and checks whether the URL belongs to the URL set corresponding to the crawler entity. It does so based on the last few bits of the hash of the domain name in the URL, in conjunction with the hash to [IP:PORT] mapping.
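The ownership check described above can be sketched as follows. This is an illustration, not the project's code: it assumes that String.hashCode() stands in for the (unspecified) hash function, that the low-order bits select a bucket, and that the lookup table maps bucket numbers to "IP:PORT" strings. The names EntityRouter, bucketOf, peerFor and isLocal are hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: route a URL to the crawler entity that owns its domain.
class EntityRouter {
    private final int bits;                   // number of low-order hash bits that select a bucket
    private final Map<Integer, String> peers; // assumed table: bucket -> "IP:PORT"

    EntityRouter(int bits, Map<Integer, String> peers) {
        this.bits = bits;
        this.peers = peers;
    }

    // Domain name of the URL, lower-cased so that EXAMPLE.com and example.com agree.
    static String domainOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase(Locale.ROOT);
    }

    // Keep only the last few bits of the domain hash.
    int bucketOf(String url) {
        int mask = (1 << bits) - 1;
        return domainOf(url).hashCode() & mask;
    }

    // Network address of the entity responsible for this URL.
    String peerFor(String url) {
        return peers.get(bucketOf(url));
    }

    // True when the URL belongs to this entity's own subset.
    boolean isLocal(String url, String selfAddress) {
        return selfAddress.equals(peerFor(url));
    }
}
```

Because only the domain name is hashed, every URL of a given site lands in the same bucket, so links that stay within one site never have to be forwarded.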

If the URL belongs to another entity, the thread puts the URL on the dispatcher queue and gets a new URL from its job queue. If the URL belongs to its own set, it first checks the URL-seen cache; if the URL is not found there, it queries the URL database to check whether the URL has been seen, and if not, puts the URL in the URL database. A URL that has not been seen before is then put into the pending URL list of one of the crawler threads.
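The two-level URL-seen test can be sketched as below. Two substitutions are made and should be read as assumptions: an exact LRU cache built on LinkedHashMap replaces the project's approximate LRU cache, and an in-memory HashSet stands in for the MySQL URL database. The class name UrlSeenFilter and the method markIfNew are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: URL-seen cache in front of a (stand-in) URL database.
class UrlSeenFilter {
    private final Map<String, Boolean> cache;
    private final Set<String> database = new HashSet<>(); // stand-in for the URL database

    UrlSeenFilter(final int capacity) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true only if the URL has never been seen and should be queued for crawling.
    synchronized boolean markIfNew(String url) {
        if (cache.get(url) != null) {
            return false;              // cache hit: already seen, no database query needed
        }
        cache.put(url, Boolean.TRUE);  // remember it for later lookups
        return database.add(url);      // add() returns false if the database already had it
    }
}
```

The cache only saves database queries; correctness still rests on the database, which is why a URL evicted from the cache is still recognised as seen.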


URLs are assigned to a crawler thread based on domain names. Each domain name will only

be serviced by one thread; hence only one connection will be maintained with any given

server. This will make sure that the crawler doesn’t overload a slow server.

A different hash is used for distributing jobs among the crawler threads than for determining the URL subset. The objective is to isolate the two operations so that there is no correlation between a crawler entity and the thread a URL is assigned to, thus balancing the load evenly across the threads. The decision to divide the URL space on the basis of domain names was based on the observation that many pages on the web tend to link to pages in the same domain. Hence, if all URLs with a particular domain name lie in the same URL subset, these URLs do not need to be forwarded to other crawler entities. This scheme thus provides an effective strategy for dividing the crawl task between the peer-to-peer nodes of this distributed system. We validate this argument in our experiments described in Section 7. The URL dispatcher thread communicates URLs to the corresponding crawler entity. A URL receiver thread collects the URLs received from other crawler entities, i.e. those communicated via the dispatcher threads of those entities, and puts them on the job queue of the URL handling thread.
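The assignment of domains to crawler threads can be sketched in the same spirit. The report does not name the second hash function, so the multiplicative string hash below (multiplier 131) is purely an assumed example of a hash that differs from the one used to pick the crawler entity; ThreadAssigner and threadFor are hypothetical names.

```java
// Hypothetical sketch: pick the crawler thread that services a given domain.
class ThreadAssigner {
    // A simple string hash that is deliberately different from String.hashCode(),
    // so that thread choice is not tied to the entity-selection hash.
    static int secondHash(String domain) {
        int h = 0;
        for (int i = 0; i < domain.length(); i++) {
            h = 131 * h + domain.charAt(i);
        }
        return h;
    }

    // Every URL of a domain maps to the same thread index in [0, threadCount).
    static int threadFor(String domain, int threadCount) {
        return Math.floorMod(secondHash(domain), threadCount);
    }
}
```

Since one domain always maps to one thread, at most one connection is open to a given server at a time, which is the load-limiting property described above.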


THE IMPLEMENTATION

The system was implemented on the Java platform for portability reasons. MySQL was used for the URL database. Even though Java is less efficient than languages that compile to native machine code, and none of the team members were proficient with it, we selected Java for this prototype. The reasons behind this decision were to keep the software architecture modular, to make the system portable, and to deal with the complexity of such a system. In retrospect this turned out to be a good decision, as we might not have been able to complete this project in time had we implemented it in another language such as C.

The comprehensive libraries provided with Java allowed us to concentrate our efforts on the design of the system and the software architecture. A Java class was written for each of the various components of the system (i.e. the different kinds of threads, database, synchronized job queues, caches, etc.). First we wrote generic classes for infrastructure components such as synchronized job queues and caches. The LRUCache class implements an approximate LRU cache based on a hash table with overlapping buckets. The JobQueue class implements a generic synchronized job queue with an option for a probabilistic hybrid of the stack and queue ADTs.

The main Crawler class performs the initialization by reading the configuration files, spawning the various threads accordingly and initializing the various job queues. It then behaves as the Handler Thread. A class named CrawlerThread performs the operation of the Crawler Thread. This thread simply gets a URL from its job queue and passes it to the URLlist class. The URLlist class then spawns a new thread that fetches the page, parses it for URL links and returns the list of these URLs to the CrawlerThread.

In Java the URL fetch operation is not guaranteed to return, and in the case of a malicious web server the whole thread can hang while waiting for the operation to complete. This is why the URLlist class spawns a new thread for every URL fetch. The thread is given a time-out; if the URL fetch is not completed in time, the thread is stopped and normal operation resumes. Spawning a new thread to fetch each page adds overhead, but it is essential for the robustness of the system.
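The idea of running each fetch on its own thread with a time-out can be sketched with the standard java.util.concurrent classes. This is a swapped-in technique, not the URLlist implementation: the fetch itself is passed in as a Callable so that the sketch stays independent of any network code, and the names TimedFetcher and fetchWithTimeout are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: run one fetch on a worker thread and give up after a time-out.
class TimedFetcher {
    // Returns the fetched content, or null if the fetch failed or took too long.
    static String fetchWithTimeout(Callable<String> fetch, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fetch-worker");
            t.setDaemon(true); // a hung fetch must not keep the JVM alive
            return t;
        });
        Future<String> result = worker.submit(fetch);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker; the caller carries on
            return null;
        } catch (ExecutionException e) {
            return null;         // the fetch itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would wrap the real page download in the Callable. Note that java.net.URLConnection also offers setConnectTimeout and setReadTimeout, which bound the network wait directly and may reduce the need for a separate thread.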


The Sender and Receiver classes implement the Sender and Receiver threads respectively. The Receiver class opens a UDP socket at a pre-determined port and waits for packets. The Sender class transmits URLs via UDP packets to the appropriate remote node. Besides the classes that form the system architecture described above, we added a Probe Thread and a Measurement class to the system.

The relevant classes report the appropriate measurements to the Measurement class, and the Probe Thread messages the Measurement class to output the measurements at configurable periodic time intervals.
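The exchange of URLs over UDP can be sketched as follows. The wire format is an assumption made for illustration (URLs joined by newlines and encoded as UTF-8), since the report does not describe the packet layout; the class UrlPacket and its methods are hypothetical. Only standard java.net calls are used.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pack URLs into a UDP datagram and unpack them again.
class UrlPacket {
    // Assumed wire format: one URL per line, UTF-8 encoded.
    static byte[] encode(List<String> urls) {
        return String.join("\n", urls).getBytes(StandardCharsets.UTF_8);
    }

    static List<String> decode(byte[] data, int length) {
        String text = new String(data, 0, length, StandardCharsets.UTF_8);
        List<String> urls = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                urls.add(line);
            }
        }
        return urls;
    }

    // Sender side: one datagram to the peer that owns these URLs.
    static void send(DatagramSocket socket, InetAddress host, int port, List<String> urls)
            throws IOException {
        byte[] payload = encode(urls);
        socket.send(new DatagramPacket(payload, payload.length, host, port));
    }

    // Receiver side: blocks until a datagram arrives, then returns the URLs it carried.
    static List<String> receive(DatagramSocket socket) throws IOException {
        byte[] buffer = new byte[8192];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);
        return decode(packet.getData(), packet.getLength());
    }
}
```

UDP gives no delivery guarantee, so a design like this accepts that an occasional URL packet may be lost in exchange for very low communication overhead.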

In this project a group of computers is used to implement the distributed crawler. Every node in the network has a maximum capacity for the number of sites it can store. When crawling a site, the user selects the IP address of the target machine and a shared location. On clicking the search button, the content is downloaded to the remote machine. The user also has the choice of saving the files to a local drive if the remote computer is not available.


CHAPTER 3

SYSTEM ANALYSIS

3.1 IDENTIFICATION OF NEEDS

Information Retrieval is the area of computer science concerned with retrieving information about a subject from a collection of data objects. This is not the same as Data Retrieval, which in the context of documents consists mainly of determining which documents of a collection contain the keywords of a user query. Information Retrieval deals with satisfying a user need. Although there was an important body of Information Retrieval techniques published before the invention of the World Wide Web, there are unique characteristics of the Web that made them unsuitable or insufficient.

The low cost of publishing in the "open Web" is a key part of its success, but it implies that searching for information on the Web will always be inherently more difficult than searching for information in traditional, closed repositories.

The typical design of search engines is a "cascade", in which a Web crawler creates a collection which is then indexed and searched. Most search engine designs consider the Web crawler as just a first stage in Web search, with little feedback from the ranking algorithms to the crawling process. This is a cascade model, in which operations are executed in strict order: first crawling, then indexing, and then searching. An aim of this approach is to provide the crawler with access to all the information about the collection to guide the crawling process effectively. This can be taken one step further, as there are tools available for dealing with all the possible interactions between the modules of a search engine.

3.2 PRELIMINARY INVESTIGATION

Requirement determination is the heart of system analysis. It is aimed at acquiring a detailed description of all important areas of the business under investigation, so the whole business process is studied. Many fact-finding techniques are available for requirement determination. Some of them are given below:


Existing documentation, forms, file and records

Research and site visits

Observation of the work environment

Questionnaires

Interviews and group work sessions.

Of the techniques mentioned above, the following were used for requirement determination in the project ‘Web Crawler using Distributed Links’:

Existing documentation, forms, file and records

Research and site visits

Questionnaires

3.2.1 Existing Documentation, Forms, Files and Records: Existing information is essential for understanding an organization. These documents describe what currently happens in the organization, and a better system can be developed only after the current system is correctly understood. The following documents were therefore collected for the study of the current system:

Website crawling procedures

Format of topics

Available protocols for storing the information

List of common questions asked by the users

The above documents provide information about the forms and the reports to be built

and the type of information to be stored.

3.2.2 Research and Site Visits: This is another fact-finding technique; it means studying the application and the problem area. In this project many industries were visited to find answers to some common questions, such as:

How does Google work?

What are the various algorithms available for crawling?

How do the various algorithms perform?


3.2.3 Questionnaires: A questionnaire is a document prepared for a special purpose that allows the analysts to collect information and opinions from a number of respondents.

It contains a list of questions. The questionnaires are distributed to the selected respondents, who answer the questions in their own time and return them to the analysts. The analysts can then analyze the responses and reach conclusions.

Sample questions used in the ‘Web Crawler using Distributed Links’ questionnaire:

What are the services provided by the organization?

What is the specific data of the company?

How are the records maintained by users?

How are the tests conducted?

What are some sample layouts of the reports?

What are the outputs of the system?

3.2.4 Personal Interviews: There are always two roles in a personal interview. The analyst is the interviewer, who is responsible for organizing and conducting the interview. The other role is the interviewee, who is the end user, the manager or the decision maker. The interviewee is asked a number of questions by the interviewer.

In this project the company head, department heads and employees were interviewed to ascertain their expectations of the system.

3.3 FEASIBILITY STUDY:

Feasibility Study is an important part of the Preliminary Investigation because only feasible

projects go to development stages. A very basic feasibility study for the current project is

given below:

3.3.1 Technical Feasibility: Technical feasibility raises questions such as: can the work be done with the current equipment and software technology, and if new technology is required, what is the possibility that it can be developed?


This project fully supports Windows XP/2000 but lacks support for Windows 98 and lower versions. The front-end and back-end tools needed for the development of this project are also available: Swing and Servlets have been used as the front end, while MySQL is used as the back end. Both are easily available.

Thus it can be concluded that the project is technically feasible.

3.3.2 Economic Feasibility: It deals with the economic impact of the system on the environment in which it is used, i.e., the benefits of creating the system.

In this project, the system saves the time otherwise spent recording the same data again and again. The software is also designed to reduce the time and cost of calculating critical data. The security provided by the software is an additional benefit.

Thus it can be concluded that the project is economically feasible.

3.3.3 Operational Feasibility: It deals with the user friendliness of the system, i.e., will the system be used if it is developed and implemented, or will there be resistance from the users?

In this project care has been taken to make the system highly user friendly, so that a person with only a little knowledge of English can handle it. Online help as well as special help programs for training the user are also built in.

Thus the project is operationally feasible.

3.3.4 Legal Feasibility: This type of feasibility evaluates whether our project breaks any law. According to the analysis, this project does not break any laws, so it is legally feasible.


CHAPTER 4

SOFTWARE SPECIFICATION

SOFTWARES USED

Many technologies were available for the development of the project: for the front end, for example, Visual Basic 6, PowerBuilder, X-Windows, Visual Basic .NET, Oracle Developer 2000, VC++ and JBuilder, and for the back end Oracle, Ingres, Sybase, SQL Plus, MySQL, etc. Among these technologies, Swing and Servlets were selected as the front-end tools and MySQL as the back end, for the following reasons.

4.1 REASONS FOR THE SELECTION OF SWING & SERVLET

Swing and Servlets are Java technologies originally developed by Sun Microsystems: Swing is a toolkit for building graphical user interfaces, and Servlets are a server-side API for web applications. Java is a powerful programming language for developing sophisticated applications quickly. Java is object oriented: apart from primitive values such as int, everything a program works with is an object.

Servlets, together with the related JSP technology, allow Java code to produce HTML pages, which lets the user develop websites efficiently and effectively. Apart from this, Java is platform independent and can run on any server that provides a Java runtime.

A Servlet back end can also answer AJAX requests, which enables web pages to be refreshed partially. Thus Java enables the programmer to build efficient websites.

Because Java is an object-oriented programming language, the project can model real-world entities using features such as classes, objects, encapsulation, abstraction and inheritance. This allows a programmer to build a more robust and scalable application.

Java web applications can use HTML, CSS and JavaScript on the client side, and Java provides a set of pre-defined classes in the form of JDBC that can be used to access and update databases.


4.2 REASONS FOR THE SELECTION OF MySQL

MySQL is one of the most widely used back-end tools for developing application software. It is gaining popularity because it supports the following:

Updating the database.

Retrieving information from the database.

Accepting query language statements.

Enforcing security specifications

Managing data sharing.

Optimizing queries.

Managing system catalogs.

MySQL provides the following advantages for both clients and servers:

Client Advantages:

Easy to use.

Supports multiple hardware platforms.

Supports multiple software applications

Familiar to the user

Server Advantages:

Reliable

Concurrent

Sophisticated locking

Fault tolerant

That is why MySQL was selected as the back-end tool.

Apart from the reasons mentioned above, our relevant experience with Swing, Servlets and MySQL Server led us to select them as the front-end and back-end tools for developing the project.


CHAPTER 5

SYSTEM SPECIFICATION

5.1. HARDWARE REQUIREMENTS

The project ‘Web Crawler’ requires the following hardware for its successful implementation.

HARDWARE

Processor : Dual Core or above
RAM : 512 MB or above
Hard Disk : 10 GB or above
Monitor : TFT or LCD
Internet Connection : Broadband connection

5.2. SOFTWARE REQUIREMENTS

The project ‘Web Crawler’ requires the following software for its successful implementation.

SOFTWARE

Operating System : Windows 7 or above
Programming Language : Java, Java Swing
IDE : NetBeans IDE 7.2
Database : MySQL
Connector : MySQL – Java Connector


CHAPTER 6

PROJECT DESCRIPTION

SOFTWARE REQUIREMENT SPECIFICATION

Based on the system analysis described in the preceding chapters, a complete Software Requirement Specification can be prepared. It is described below:

6.1 INTRODUCTION

Purpose: The purpose of the software is to support users in storing the web pages of a website by performing a series of crawl operations up to a given level. These pages are stored in a folder, and the user can refer to them for further study.
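Crawling "up to a given level" can be read as a breadth-first traversal that stops following links once a depth limit is reached. The sketch below illustrates that reading only; to stay self-contained it replaces real page fetching with an in-memory map from each page to the links found on it, and the names DepthLimitedCrawl, crawl and linksOf are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first crawl that stops at a maximum link depth.
class DepthLimitedCrawl {
    // linksOf stands in for "fetch the page and extract its links".
    static Set<String> crawl(String seed, int maxLevel, Map<String, List<String>> linksOf) {
        Set<String> visited = new LinkedHashSet<>(); // pages to store, in discovery order
        Deque<String> frontier = new ArrayDeque<>();
        Deque<Integer> levels = new ArrayDeque<>();  // depth of each frontier entry
        visited.add(seed);
        frontier.add(seed);
        levels.add(0);
        while (!frontier.isEmpty()) {
            String page = frontier.remove();
            int level = levels.remove();
            if (level == maxLevel) {
                continue; // page is kept, but its links are not followed
            }
            for (String next : linksOf.getOrDefault(page, Collections.emptyList())) {
                if (visited.add(next)) { // add() is false for pages already discovered
                    frontier.add(next);
                    levels.add(level + 1);
                }
            }
        }
        return visited;
    }
}
```

With a level of 0 only the seed page is stored; each additional level adds the pages reachable by one more link.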

Scope: The software would be of great importance for a company. Although it is specially designed for companies, it could also be used by any organization or institute to provide offline study of web pages.

Benefits: The project will automatically navigate through the pages, generate records, save a lot of bandwidth and allow offline study of web pages.

6.2 OVERALL DESCRIPTION

Product Description: The product is named Web Crawler. The system is to be developed using technologies such as Servlets, AWT, Swing and MySQL.

Product Functioning: The client will be able to store frequently visited web pages on a local hard disk. The raw data is then verified, and finally a set of operations can be performed on it. For example, in the user database a new user can be added, an existing user can be removed, or a password can be changed.


Functions of the Project: There are six major functions of the software:

a) User Verification

b) Upload Raw Data

c) Validate Data

d) Use Validated Data

e) Take Input From The User

f) Save Data Again

Users of the product: There will be four major users of the software:

a) Owner of the company

b) Course Administrators

c) Students

d) Employees working in a company

6.3 SPECIFIC REQUIREMENTS

Interface Requirements: The interface requirements include an easy-to-follow interface, very few graphics, relevant error messages, proper linking of forms, proper validation, etc.

Hardware Requirements: The hardware requirement for the project:

a) Pentium- IV or higher Processor

b) 40 GB Hard disk

c) 512 MB RAM

d) Printer

e) Color Display Monitor

Software Requirements: The software requirements for the project:

a) Linux or Windows XP Service Pack 2 or above


b) Net Beans 7 or above

c) MySQL 5.1 or above

d) MS – OFFICE

Logical Database Requirements: The following information is to be stored in the

databases.

a) The user data

b) Login Data

c) Website Data

d) Crawled Page Data

6.4. APPENDICES

a) Software Engineering Paradigm Applied

b) Context Free Diagrams

c) E- R Diagrams

d) Data Flow Diagrams

e) Data Dictionary

f) Diagrams Of The Tables Relationship


CHAPTER 7

PROJECT DESCRIPTION

7. 1 CONTEXT FREE DIAGRAM

0TH LEVEL DIAGRAM

A context free diagram shows the working of the whole system as a single process. The DFD for the Web Crawler is given below.

[Figure: level-0 data flow diagram. Labels recovered from the figure: Admin, User, Web Sites and the Distributed Web Crawler process, connected by Request and Response flows.]

Fig 4.1- 0 Level Data Flow Diagram

REPORTS GENERATED FROM ABOVE SOFTWARE

a. USER’S DETAILS

b. LOGIN DETAILS

c. WEB DETAILS

d. PAGES INFO

7.2. ER DIAGRAM:

[Figure: ER diagram. Labels recovered from the figure: entities Admin, Computer, Web Site and Pages; relationships Manages, Selects and Add; attributes User_id, Name, Address, Password, URL, Web Name, Date, Location, Page Info and Page ID; cardinalities 1 and *.]

7.3. DATA FLOW DIAGRAM


7.4. DATABASE DESIGN

A database is a collection of related tables. It is the heart of any software because it stores the most critical part, the data about the system. Proper planning is therefore needed to ensure the design of an effective database. An effective database design includes:

Normalized Tables

Data Dictionary

Constraints

4.2.1. NORMALIZED TABLES

Web Crawler Project will contain following tables:

SNo.  Table Name  Description
1     admin       Stores the admin username and password
2     Sitesinfo   Stores information about visited websites
3     Pagesinfo   Stores information about crawled pages

NOTE:

All the tables are normalized up to 3 NF

Tables are stored in movie onlinecourse database

All the tables are created in MySQL Server

MySQL command based utility is used to create tables.


CHAPTER 8

SNAPSHOTS

SNAPSHOT 1:

SNAPSHOT 2:


SNAPSHOT 3:

SNAPSHOT 4:


SNAPSHOT 5:

SNAPSHOT 6:


SNAPSHOT 7:


CHAPTER 9

CODING

CODING OF CRAWLER.JAVA

package coding;

import javax.swing.JOptionPane;

import java.sql.*;

public class admin_login extends javax.swing.JFrame {

/** Creates new form Manage_Nodes */

public admin_login() {

initComponents();

}

/** This method is called from within the constructor to

* initialize the form.

* WARNING: Do NOT modify this code. The content of this method is

* always regenerated by the Form Editor.

*/

@SuppressWarnings("unchecked")

// <editor-fold defaultstate="collapsed" desc="Generated Code">

private void initComponents() {

jLabel1 = new javax.swing.JLabel();

jLabel2 = new javax.swing.JLabel();

jTextField1 = new javax.swing.JTextField();

jLabel3 = new javax.swing.JLabel();

jPasswordField1 = new javax.swing.JPasswordField();

jButton1 = new javax.swing.JButton();

jButton2 = new javax.swing.JButton();

jLabel4 = new javax.swing.JLabel();

jLabel5 = new javax.swing.JLabel();

jLabel6 = new javax.swing.JLabel();


setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);

addWindowListener(new java.awt.event.WindowAdapter() {

public void windowOpened(java.awt.event.WindowEvent evt) {

formWindowOpened(evt);

}

});

jLabel1.setFont(new java.awt.Font("Tahoma", 1, 18));

jLabel1.setText("Admin Login");

jLabel2.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel2.setText("Username");

jLabel3.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel3.setText("Password");

jButton1.setText("Login");

jButton1.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton1ActionPerformed(evt);

}

});

jButton2.setText("Exit");

jLabel4.setForeground(new java.awt.Color(255, 0, 0));

jLabel4.setText("*");

jLabel5.setForeground(new java.awt.Color(255, 0, 0));

jLabel5.setText("*");

jLabel6.setFont(new java.awt.Font("Tahoma", 1, 24)); // NOI18N

jLabel6.setForeground(new java.awt.Color(255, 0, 51));

jLabel6.setText("Distributed Web Crawler");

javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());

getContentPane().setLayout(layout);

layout.setHorizontalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(javax.swing.GroupLayout.Alignment.TRAILING,

layout.createSequentialGroup()


.addContainerGap(134, Short.MAX_VALUE)

.addComponent(jButton1, javax.swing.GroupLayout.PREFERRED_SIZE, 90,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)

.addComponent(jButton2, javax.swing.GroupLayout.PREFERRED_SIZE, 90,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(145, 145, 145))

.addGroup(javax.swing.GroupLayout.Alignment.TRAILING,

layout.createSequentialGroup()

.addContainerGap(185, Short.MAX_VALUE)

.addComponent(jLabel1)

.addGap(189, 189, 189))

.addGroup(layout.createSequentialGroup()

.addGap(79, 79, 79)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jLabel2)

.addComponent(jLabel3))

.addGap(57, 57, 57)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING, false)

.addComponent(jPasswordField1)

.addComponent(jTextField1,

javax.swing.GroupLayout.PREFERRED_SIZE, 164,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jLabel5, javax.swing.GroupLayout.PREFERRED_SIZE,

18, javax.swing.GroupLayout.PREFERRED_SIZE)


.addComponent(jLabel4)))

.addComponent(jLabel6))

.addContainerGap(82, Short.MAX_VALUE))

);

layout.setVerticalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(javax.swing.GroupLayout.Alignment.TRAILING,

layout.createSequentialGroup()

.addContainerGap(29, Short.MAX_VALUE)

.addComponent(jLabel6)

.addGap(18, 18, 18)

.addComponent(jLabel1)

.addGap(18, 18, 18)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jLabel2)

.addComponent(jTextField1, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jLabel4))

.addGap(18, 18, 18)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING)

.addComponent(jLabel3)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jPasswordField1,

javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jLabel5)))

.addGap(28, 28, 28)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jButton1)


.addComponent(jButton2))

.addGap(25, 25, 25))

);

pack();

}// </editor-fold>

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

int flag = 0;

String str = "";

if (jTextField1.getText().equals("")) {

flag = 1;

jLabel3.setVisible(true);

str = "Username";

}

if (jPasswordField1.getText().equals("")) {

flag = 1;

str = str + " Password";

str = str.trim();

}

if (flag == 0) {

try {

DataBaseInfo data = new DataBaseInfo();

PreparedStatement stmt = data.conn.prepareStatement("select * from admin where

username=? and password=?", ResultSet.TYPE_SCROLL_INSENSITIVE,

ResultSet.CONCUR_UPDATABLE);

stmt.setString(1, jTextField1.getText());

stmt.setString(2, jPasswordField1.getText());

ResultSet rs = stmt.executeQuery();

if (rs.next()) {


DataBaseInfo.un = rs.getString(1);

DataBaseInfo.pwd = rs.getString(2);

DataBaseInfo.localadd = rs.getString(3);

DataBaseInfo.usedistributed = rs.getString(4);

Manage_Computers obj = new Manage_Computers();

obj.setVisible(true);

this.dispose();

} else {

JOptionPane.showMessageDialog(this, "Invalid username or password !!");

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

} else {

str = str + " can't be empty";

JOptionPane.showMessageDialog(this, str);

}

}

private void formWindowOpened(java.awt.event.WindowEvent evt) {

// TODO add your handling code here:

this.setLocationRelativeTo(null);

}

/**

* @param args the command line arguments

*/

public static void main(String args[]) {

/* Set the Nimbus look and feel */

//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">


/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and

feel.

* For details see

http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html

*/

try {

for (javax.swing.UIManager.LookAndFeelInfo info :

javax.swing.UIManager.getInstalledLookAndFeels()) {

if ("Nimbus".equals(info.getName())) {

javax.swing.UIManager.setLookAndFeel(info.getClassName());

break;

}

}

} catch (ClassNotFoundException ex) {

java.util.logging.Logger.getLogger(admin_login.class.getName()).log(java.util.logging.Level

.SEVERE, null, ex);

} catch (InstantiationException ex) {

java.util.logging.Logger.getLogger(admin_login.class.getName()).log(java.util.logging.Level

.SEVERE, null, ex);

} catch (IllegalAccessException ex) {

java.util.logging.Logger.getLogger(admin_login.class.getName()).log(java.util.logging.Level

.SEVERE, null, ex);

} catch (javax.swing.UnsupportedLookAndFeelException ex) {

java.util.logging.Logger.getLogger(admin_login.class.getName()).log(java.util.logging.Level

.SEVERE, null, ex);

}

//</editor-fold>

/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {


public void run() {

new admin_login().setVisible(true);

}

});

}

// Variables declaration - do not modify

private javax.swing.JButton jButton1;

private javax.swing.JButton jButton2;

private javax.swing.JLabel jLabel1;

private javax.swing.JLabel jLabel2;

private javax.swing.JLabel jLabel3;

private javax.swing.JLabel jLabel4;

private javax.swing.JLabel jLabel5;

private javax.swing.JLabel jLabel6;

private javax.swing.JPasswordField jPasswordField1;

private javax.swing.JTextField jTextField1;

// End of variables declaration

}
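
The login handler above builds its "can't be empty" message by appending field names, which produces text such as "Username Password can't be empty" when both fields are blank. A minimal sketch of one alternative follows. The helper class LoginValidation is hypothetical and not part of the original project; it takes the password as a char array, the form returned by JPasswordField.getPassword().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original project.
final class LoginValidation {

    private LoginValidation() {
    }

    /**
     * Returns a message naming every empty field, or null when both
     * fields contain text.
     */
    static String missingFieldsMessage(String username, char[] password) {
        List<String> missing = new ArrayList<>();
        if (username == null || username.trim().isEmpty()) {
            missing.add("Username");
        }
        if (password == null || password.length == 0) {
            missing.add("Password");
        }
        if (missing.isEmpty()) {
            return null;
        }
        return String.join(" and ", missing) + " can't be empty";
    }

    public static void main(String[] args) {
        System.out.println(missingFieldsMessage("", new char[0]));
        System.out.println(missingFieldsMessage("admin", new char[0]));
    }
}
```

In the handler, a non-null result could be passed to JOptionPane.showMessageDialog before any database work is attempted.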

CODING OF MANAGE_COMPUTERS.JAVA

/*

* To change this template, choose Tools | Templates

* and open the template in the editor.

*/

/*

* Manage_Nodes.java

*

* Created on Feb 11, 2015, 11:49:21 AM

*/

package coding;

import javax.swing.JOptionPane;


import java.sql.*;

/**

*

* @author DSOFT

*/

public class Manage_Computers extends javax.swing.JFrame {

/** Creates new form Manage_Nodes */

public Manage_Computers() {

initComponents();

}

/** This method is called from within the constructor to

* initialize the form.

* WARNING: Do NOT modify this code. The content of this method is

* always regenerated by the Form Editor.

*/

@SuppressWarnings("unchecked")

// <editor-fold defaultstate="collapsed" desc="Generated Code">

private void initComponents() {

jLabel1 = new javax.swing.JLabel();

jLabel6 = new javax.swing.JLabel();

jTabbedPane1 = new javax.swing.JTabbedPane();

jPanel1 = new javax.swing.JPanel();

jLabel3 = new javax.swing.JLabel();

jTextField1 = new javax.swing.JTextField();

jLabel4 = new javax.swing.JLabel();

jTextField2 = new javax.swing.JTextField();

jLabel9 = new javax.swing.JLabel();

jTextField3 = new javax.swing.JTextField();

jButton1 = new javax.swing.JButton();

jCheckBox1 = new javax.swing.JCheckBox();


jPanel2 = new javax.swing.JPanel();

jScrollPane1 = new javax.swing.JScrollPane();

jTable1 = new javax.swing.JTable();

jButton3 = new javax.swing.JButton();

jPanel6 = new javax.swing.JPanel();

jLabel10 = new javax.swing.JLabel();

jTextField4 = new javax.swing.JTextField();

jButton4 = new javax.swing.JButton();

jCheckBox2 = new javax.swing.JCheckBox();

jPanel7 = new javax.swing.JPanel();

jLabel11 = new javax.swing.JLabel();

jTextField5 = new javax.swing.JTextField();

jButton5 = new javax.swing.JButton();

jCheckBox3 = new javax.swing.JCheckBox();

jLabel12 = new javax.swing.JLabel();

jTextField6 = new javax.swing.JTextField();

jLabel13 = new javax.swing.JLabel();

jTextField7 = new javax.swing.JTextField();

jLabel14 = new javax.swing.JLabel();

jLabel15 = new javax.swing.JLabel();

jPasswordField1 = new javax.swing.JPasswordField();

jMenuBar1 = new javax.swing.JMenuBar();

jMenu1 = new javax.swing.JMenu();

jSeparator1 = new javax.swing.JPopupMenu.Separator();

jMenuItem1 = new javax.swing.JMenuItem();

jMenu3 = new javax.swing.JMenu();

jSeparator2 = new javax.swing.JPopupMenu.Separator();

jMenuItem7 = new javax.swing.JMenuItem();

jMenu5 = new javax.swing.JMenu();

jSeparator3 = new javax.swing.JPopupMenu.Separator();

jMenuItem14 = new javax.swing.JMenuItem();

jMenu6 = new javax.swing.JMenu();

jSeparator4 = new javax.swing.JPopupMenu.Separator();

jMenu2 = new javax.swing.JMenu();


setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);

addWindowListener(new java.awt.event.WindowAdapter() {

public void windowOpened(java.awt.event.WindowEvent evt) {

formWindowOpened(evt);

}

});

jLabel1.setFont(new java.awt.Font("Tahoma", 1, 24));

jLabel1.setForeground(new java.awt.Color(51, 0, 204));

jLabel1.setText("Computer Management");

jLabel6.setFont(new java.awt.Font("Tahoma", 1, 36));

jLabel6.setForeground(new java.awt.Color(255, 0, 51));

jLabel6.setText("Distributed Web Crawler");

jPanel1.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel3.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel3.setText("Computer IP");

jTextField1.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel4.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel4.setText("Shared Location");

jTextField2.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel9.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel9.setText("Maximum Allowed Sites");

jTextField3.setFont(new java.awt.Font("Tahoma", 1, 12));

jButton1.setText("Save Information");


jButton1.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton1ActionPerformed(evt);

}

});

jCheckBox1.setText("Confirm Save");

javax.swing.GroupLayout jPanel1Layout = new javax.swing.GroupLayout(jPanel1);

jPanel1.setLayout(jPanel1Layout);

jPanel1Layout.setHorizontalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addGap(67, 67, 67)

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jLabel9)

.addComponent(jLabel4)

.addComponent(jLabel3)

.addComponent(jButton1, javax.swing.GroupLayout.PREFERRED_SIZE, 188,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addComponent(jCheckBox1)

.addContainerGap())

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addComponent(jTextField3,

javax.swing.GroupLayout.PREFERRED_SIZE, 201,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addContainerGap())


.addGroup(jPanel1Layout.createSequentialGroup()

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jTextField1,

javax.swing.GroupLayout.DEFAULT_SIZE, 619, Short.MAX_VALUE)

.addComponent(jTextField2,

javax.swing.GroupLayout.Alignment.TRAILING,

javax.swing.GroupLayout.DEFAULT_SIZE, 619, Short.MAX_VALUE))

.addGap(110, 110, 110)))))

);

jPanel1Layout.setVerticalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addGap(43, 43, 43)

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING, false)

.addGroup(jPanel1Layout.createSequentialGroup()

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jLabel3)

.addComponent(jTextField1,

javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED,

javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE)

.addComponent(jTextField2, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addGroup(jPanel1Layout.createSequentialGroup()

.addGap(45, 45, 45)

.addComponent(jLabel4)))

.addGap(18, 18, 18)

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)


.addComponent(jTextField3, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jLabel9))

.addGap(29, 29, 29)

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jCheckBox1)

.addComponent(jButton1))

.addContainerGap(193, Short.MAX_VALUE))

);

jTabbedPane1.addTab("Add New Computer", jPanel1);

jTable1.setFont(new java.awt.Font("Verdana", 1, 10));

jTable1.setModel(new javax.swing.table.DefaultTableModel(

new Object [][] {

},

new String [] {

}

));

jTable1.setAutoResizeMode(javax.swing.JTable.AUTO_RESIZE_ALL_COLUMNS);

jTable1.setRowHeight(25);

jScrollPane1.setViewportView(jTable1);

jButton3.setText("Show Computers");

jButton3.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton3ActionPerformed(evt);

}

});


javax.swing.GroupLayout jPanel2Layout = new javax.swing.GroupLayout(jPanel2);

jPanel2.setLayout(jPanel2Layout);

jPanel2Layout.setHorizontalGroup(

jPanel2Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel2Layout.createSequentialGroup()

.addGap(21, 21, 21)

.addGroup(jPanel2Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jScrollPane1, javax.swing.GroupLayout.PREFERRED_SIZE,

948, javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jButton3, javax.swing.GroupLayout.PREFERRED_SIZE, 151,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addContainerGap(21, Short.MAX_VALUE))

);

jPanel2Layout.setVerticalGroup(

jPanel2Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel2Layout.createSequentialGroup()

.addGap(23, 23, 23)

.addComponent(jButton3)

.addGap(18, 18, 18)

.addComponent(jScrollPane1, javax.swing.GroupLayout.PREFERRED_SIZE, 288,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addContainerGap(35, Short.MAX_VALUE))

);

jTabbedPane1.addTab("View Computers", jPanel2);

jPanel6.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel10.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel10.setText("Enter Computer IP");

jTextField4.setFont(new java.awt.Font("Tahoma", 1, 12));


jButton4.setText("Delete Computer");

jButton4.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton4ActionPerformed(evt);

}

});

jCheckBox2.setText("Confirm Record Delete");

javax.swing.GroupLayout jPanel6Layout = new javax.swing.GroupLayout(jPanel6);

jPanel6.setLayout(jPanel6Layout);

jPanel6Layout.setHorizontalGroup(

jPanel6Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel6Layout.createSequentialGroup()

.addGap(67, 67, 67)

.addGroup(jPanel6Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel6Layout.createSequentialGroup()

.addComponent(jCheckBox2)

.addContainerGap())

.addGroup(jPanel6Layout.createSequentialGroup()

.addComponent(jLabel10)

.addGap(57, 57, 57)

.addComponent(jTextField4, javax.swing.GroupLayout.PREFERRED_SIZE,

201, javax.swing.GroupLayout.PREFERRED_SIZE)

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED,

javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE)

.addComponent(jButton4)

.addGap(452, 452, 452))))

);

jPanel6Layout.setVerticalGroup(

jPanel6Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel6Layout.createSequentialGroup()

.addGap(43, 43, 43)


.addGroup(jPanel6Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jTextField4, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jButton4)

.addComponent(jLabel10))

.addGap(22, 22, 22)

.addComponent(jCheckBox2)

.addContainerGap(276, Short.MAX_VALUE))

);

jTabbedPane1.addTab("Delete Computer", jPanel6);

jPanel7.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel11.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel11.setText("Local System Address");

jTextField5.setFont(new java.awt.Font("Tahoma", 1, 12));

jButton5.setText("Save Details");

jButton5.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton5ActionPerformed(evt);

}

});

jCheckBox3.setText("Confirm Record Update");

jLabel12.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel12.setText("Use Distributed");

jTextField6.setFont(new java.awt.Font("Tahoma", 1, 12));


jLabel13.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel13.setText("User ID");

jTextField7.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel14.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel14.setText("(Press Y or N)");

jLabel15.setFont(new java.awt.Font("Tahoma", 1, 12));

jLabel15.setText("Password");

javax.swing.GroupLayout jPanel7Layout = new javax.swing.GroupLayout(jPanel7);

jPanel7.setLayout(jPanel7Layout);

jPanel7Layout.setHorizontalGroup(

jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel7Layout.createSequentialGroup()

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING, false)

.addGroup(jPanel7Layout.createSequentialGroup()

.addGap(96, 96, 96)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jCheckBox3)

.addComponent(jLabel12)

.addComponent(jLabel13)

.addComponent(jLabel15))

.addGap(43, 43, 43))

.addGroup(javax.swing.GroupLayout.Alignment.TRAILING,

jPanel7Layout.createSequentialGroup()

.addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE,

Short.MAX_VALUE)

.addComponent(jLabel11)

.addGap(63, 63, 63)))


.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel7Layout.createSequentialGroup()

.addComponent(jTextField6, javax.swing.GroupLayout.PREFERRED_SIZE,

201, javax.swing.GroupLayout.PREFERRED_SIZE)

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)

.addComponent(jLabel14))

.addComponent(jTextField5, javax.swing.GroupLayout.PREFERRED_SIZE,

201, javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jButton5)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING, false)

.addComponent(jPasswordField1,

javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jTextField7,

javax.swing.GroupLayout.Alignment.LEADING,

javax.swing.GroupLayout.DEFAULT_SIZE, 201, Short.MAX_VALUE)))

.addGap(440, 440, 440))

);

jPanel7Layout.setVerticalGroup(

jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel7Layout.createSequentialGroup()

.addGap(43, 43, 43)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jTextField5, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jLabel11))

.addGap(18, 18, 18)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)


.addComponent(jTextField6, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jLabel12)

.addComponent(jLabel14))

.addGap(18, 18, 18)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jLabel13)

.addComponent(jTextField7, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addGap(18, 18, 18)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jLabel15)

.addComponent(jPasswordField1,

javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED, 120,

Short.MAX_VALUE)

.addGroup(jPanel7Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jCheckBox3)

.addComponent(jButton5))

.addGap(64, 64, 64))

);

jTabbedPane1.addTab("Change Initial", jPanel7);

jMenu1.setText("Admin Panel");

jMenu1.add(jSeparator1);

jMenuItem1.setText("Show Panel");


jMenuItem1.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenuItem1ActionPerformed(evt);

}

});

jMenu1.add(jMenuItem1);

jMenuBar1.add(jMenu1);

jMenu3.setText("Web Crawler");

jMenu3.add(jSeparator2);

jMenuItem7.setText("Load Crawler");

jMenuItem7.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenuItem7ActionPerformed(evt);

}

});

jMenu3.add(jMenuItem7);

jMenuBar1.add(jMenu3);

jMenu5.setText("Search Websites");

jMenu5.add(jSeparator3);

jMenuItem14.setText("View Search Box");

jMenuItem14.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenuItem14ActionPerformed(evt);

}

});

jMenu5.add(jMenuItem14);

jMenuBar1.add(jMenu5);


jMenu6.setText("Logout");

jMenu6.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenu6ActionPerformed(evt);

}

});

jMenu6.add(jSeparator4);

jMenuBar1.add(jMenu6);

jMenu2.setText("Exit");

jMenu2.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenu2ActionPerformed(evt);

}

});

jMenuBar1.add(jMenu2);

setJMenuBar(jMenuBar1);

javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());

getContentPane().setLayout(layout);

layout.setHorizontalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addGap(70, 70, 70)

.addComponent(jLabel6)

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED, 244,

Short.MAX_VALUE)

.addComponent(jLabel1)

.addGap(20, 20, 20))

.addGroup(layout.createSequentialGroup()

.addGap(29, 29, 29)


.addComponent(jTabbedPane1, javax.swing.GroupLayout.PREFERRED_SIZE,

995, javax.swing.GroupLayout.PREFERRED_SIZE)

.addContainerGap(39, Short.MAX_VALUE))

);

layout.setVerticalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addContainerGap()

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jLabel6)

.addComponent(jLabel1))

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)

.addComponent(jTabbedPane1, javax.swing.GroupLayout.DEFAULT_SIZE, 415,

Short.MAX_VALUE)

.addContainerGap())

);

pack();

}// </editor-fold>

private void formWindowOpened(java.awt.event.WindowEvent evt) {

// TODO add your handling code here:

this.setLocationRelativeTo(null);

jTextField5.setText(DataBaseInfo.localadd);

jTextField6.setText(DataBaseInfo.usedistributed);

jTextField7.setText(DataBaseInfo.un);

jPasswordField1.setText(DataBaseInfo.pwd);

}

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:


try {

if (jCheckBox1.isSelected() == true) {

DataBaseInfo db=new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement(

"insert into nodesinfo values(?,?,?,0)");

stmt.setString(1, jTextField1.getText());

stmt.setString(2, jTextField2.getText());

stmt.setString(3, jTextField3.getText());

stmt.executeUpdate();

JOptionPane.showMessageDialog(this, "Node successfully added to network");

} else {

JOptionPane.showMessageDialog(this, "Please confirm node entry");

}

} catch (Exception e) {

JOptionPane.showMessageDialog(this, e.getMessage());

}

}

private void jButton3ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

try {

DataBaseInfo db=new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement("select * from nodesinfo",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

ResultSet rs = stmt.executeQuery();

ResultSetMetaData rdata = rs.getMetaData();

String[] str = {"Node IP Address", "Shared Path", "Maximum Limit", "Used Limit"};


int n =DataBaseInfo.returnColumn(rs);

int col = rdata.getColumnCount();

Object[][] data = new Object[n][col + 1];

rs.beforeFirst();

int an = 0;

while (rs.next()) {

for (int j = 1; j <= col; j++) {

data[an][j - 1] = rs.getString(j);

}

an++;

}

jTable1.setModel(new javax.swing.table.DefaultTableModel(

data, str));

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

}

private void jButton4ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

try {

DataBaseInfo db=new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement(

"delete from nodesinfo where NodeIP=?",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

stmt.setString(1, jTextField4.getText());

int n=stmt.executeUpdate();


if(n==1)

{

JOptionPane.showMessageDialog(this,"Node successfully deleted !!");

jTextField4.setText("");

}

else

{

JOptionPane.showMessageDialog(this,"Node not found !!");

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

}

private void jButton5ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

try {

if(jCheckBox3.isSelected())

{

DataBaseInfo db=new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement(

"update admin set username=?, password=?, localaddress=?, usedistributed=?",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

stmt.setString(1, jTextField7.getText());

stmt.setString(2, new String(jPasswordField1.getPassword()));

stmt.setString(3, jTextField5.getText());

stmt.setString(4, jTextField6.getText());

int n=stmt.executeUpdate();

if(n==1)

{

JOptionPane.showMessageDialog(this,"Admin information successfully updated !!");


DataBaseInfo.un=jTextField7.getText();

DataBaseInfo.pwd=new String(jPasswordField1.getPassword());

DataBaseInfo.localadd=jTextField5.getText();

DataBaseInfo.usedistributed=jTextField6.getText();

}

else

{

JOptionPane.showMessageDialog(this,"Admin information could not be updated !!");

}

}

else

{

JOptionPane.showMessageDialog(this,"Please confirm record update !!");

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

}

private void jMenuItem1ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

Manage_Computers mcom=new Manage_Computers();

mcom.setVisible(true);

this.dispose();

}

private void jMenuItem7ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

mywebcrawler mcrawl =new mywebcrawler();

mcrawl.setVisible(true);

this.dispose();

}


private void jMenu6ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

admin_login al=new admin_login();

al.setVisible(true);

this.dispose();

}

private void jMenu2ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

this.dispose();

}

private void jMenuItem14ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

displaylinks obj=new displaylinks();

obj.setVisible(true);

this.dispose();

}

/**

* @param args the command line arguments

*/

public static void main(String args[]) {

/* Set the Nimbus look and feel */

//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">

/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and

feel.

* For details see

http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html

*/


try {

for (javax.swing.UIManager.LookAndFeelInfo info :

javax.swing.UIManager.getInstalledLookAndFeels()) {

if ("Nimbus".equals(info.getName())) {

javax.swing.UIManager.setLookAndFeel(info.getClassName());

break;

}

}

} catch (ClassNotFoundException ex) {

java.util.logging.Logger.getLogger(Manage_Computers.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

} catch (InstantiationException ex) {

java.util.logging.Logger.getLogger(Manage_Computers.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

} catch (IllegalAccessException ex) {

java.util.logging.Logger.getLogger(Manage_Computers.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

} catch (javax.swing.UnsupportedLookAndFeelException ex) {

java.util.logging.Logger.getLogger(Manage_Computers.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

}

//</editor-fold>

/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {

public void run() {

new Manage_Computers().setVisible(true);

}

});


}

// Variables declaration - do not modify

private javax.swing.JButton jButton1;

private javax.swing.JButton jButton3;

private javax.swing.JButton jButton4;

private javax.swing.JButton jButton5;

private javax.swing.JCheckBox jCheckBox1;

private javax.swing.JCheckBox jCheckBox2;

private javax.swing.JCheckBox jCheckBox3;

private javax.swing.JLabel jLabel1;

private javax.swing.JLabel jLabel10;

private javax.swing.JLabel jLabel11;

private javax.swing.JLabel jLabel12;

private javax.swing.JLabel jLabel13;

private javax.swing.JLabel jLabel14;

private javax.swing.JLabel jLabel15;

private javax.swing.JLabel jLabel3;

private javax.swing.JLabel jLabel4;

private javax.swing.JLabel jLabel6;

private javax.swing.JLabel jLabel9;

private javax.swing.JMenu jMenu1;

private javax.swing.JMenu jMenu2;

private javax.swing.JMenu jMenu3;

private javax.swing.JMenu jMenu5;

private javax.swing.JMenu jMenu6;

private javax.swing.JMenuBar jMenuBar1;

private javax.swing.JMenuItem jMenuItem1;

private javax.swing.JMenuItem jMenuItem14;

private javax.swing.JMenuItem jMenuItem7;

private javax.swing.JPanel jPanel1;

private javax.swing.JPanel jPanel2;

private javax.swing.JPanel jPanel6;

private javax.swing.JPanel jPanel7;

private javax.swing.JPasswordField jPasswordField1;


private javax.swing.JScrollPane jScrollPane1;

private javax.swing.JPopupMenu.Separator jSeparator1;

private javax.swing.JPopupMenu.Separator jSeparator2;

private javax.swing.JPopupMenu.Separator jSeparator3;

private javax.swing.JPopupMenu.Separator jSeparator4;

private javax.swing.JTabbedPane jTabbedPane1;

private javax.swing.JTable jTable1;

private javax.swing.JTextField jTextField1;

private javax.swing.JTextField jTextField2;

private javax.swing.JTextField jTextField3;

private javax.swing.JTextField jTextField4;

private javax.swing.JTextField jTextField5;

private javax.swing.JTextField jTextField6;

private javax.swing.JTextField jTextField7;

// End of variables declaration

}
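
The "Save Information" handler in the listing above passes the three text fields directly to the insert into nodesinfo statement. As a sketch only, the checks below show how those inputs might be validated first. The class NodeInput and its rules (a dotted IPv4 address and a positive whole-number site limit) are assumptions made for illustration and are not taken from the original project.

```java
// Hypothetical helper for illustration; not part of the original project.
final class NodeInput {

    private NodeInput() {
    }

    /** Returns true when text is a dotted IPv4 address such as 192.168.1.10. */
    static boolean isIpv4(String text) {
        if (text == null) {
            return false;
        }
        // The -1 limit keeps trailing empty parts, so "1.2.3.4." is rejected.
        String[] parts = text.trim().split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    /** Returns the site limit, or -1 when text is not a positive integer. */
    static int parseMaxSites(String text) {
        if (text == null) {
            return -1;
        }
        try {
            int value = Integer.parseInt(text.trim());
            return value > 0 ? value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.1.10"));
        System.out.println(isIpv4("256.1.1.1"));
        System.out.println(parseMaxSites("25"));
    }
}
```

A handler using these checks could show a message and return before preparing the statement whenever isIpv4 returns false or parseMaxSites returns -1.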

CODING OF CRAWLER.JAVA

package coding;

import java.io.BufferedReader;

import java.io.BufferedWriter;

import java.io.File;

import java.io.FileWriter;

import java.io.IOException;

import java.io.InputStreamReader;

import java.net.URL;

import java.sql.PreparedStatement;

import java.sql.ResultSet;

import java.sql.SQLException;

import java.sql.Statement;

import java.sql.ResultSetMetaData;


import javax.swing.JOptionPane;

import org.jsoup.Jsoup;

import org.jsoup.nodes.Document;

import org.jsoup.nodes.Element;

import org.jsoup.select.Elements;

/**

*

* @author DSOFT

*/

public class mywebcrawler extends javax.swing.JFrame {

public static DataBaseInfo db = new DataBaseInfo();

String path = "";

String IP;

/** Creates new form mywebcrawler */

public mywebcrawler() {

initComponents();

}

/** This method is called from within the constructor to

* initialize the form.

* WARNING: Do NOT modify this code. The content of this method is

* always regenerated by the Form Editor.

*/

@SuppressWarnings("unchecked")

// <editor-fold defaultstate="collapsed" desc="Generated Code">

private void initComponents() {

jTextField1 = new javax.swing.JTextField();

jButton1 = new javax.swing.JButton();

jButton2 = new javax.swing.JButton();

jProgressBar1 = new javax.swing.JProgressBar();


jLabel6 = new javax.swing.JLabel();

jPanel1 = new javax.swing.JPanel();

jLabel1 = new javax.swing.JLabel();

jComboBox1 = new javax.swing.JComboBox();

jCheckBox1 = new javax.swing.JCheckBox();

jLabel2 = new javax.swing.JLabel();

jMenuBar1 = new javax.swing.JMenuBar();

jMenu1 = new javax.swing.JMenu();

jSeparator1 = new javax.swing.JPopupMenu.Separator();

jMenuItem1 = new javax.swing.JMenuItem();

jMenu3 = new javax.swing.JMenu();

jSeparator2 = new javax.swing.JPopupMenu.Separator();

jMenuItem7 = new javax.swing.JMenuItem();

jMenu5 = new javax.swing.JMenu();

jSeparator3 = new javax.swing.JPopupMenu.Separator();

jMenuItem14 = new javax.swing.JMenuItem();

jMenu6 = new javax.swing.JMenu();

jSeparator4 = new javax.swing.JPopupMenu.Separator();

jMenu2 = new javax.swing.JMenu();

setDefaultCloseOperation(javax.swing.WindowConstants.DISPOSE_ON_CLOSE);

setTitle("Web Crawler");

addWindowListener(new java.awt.event.WindowAdapter() {

public void windowOpened(java.awt.event.WindowEvent evt) {

formWindowOpened(evt);

}

});

jTextField1.setFont(new java.awt.Font("Tahoma", 1, 18));

jTextField1.setHorizontalAlignment(javax.swing.JTextField.CENTER);

jTextField1.setText("http://");

jButton1.setFont(new java.awt.Font("Tahoma", 1, 14));

jButton1.setText("Search & Save");


jButton1.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton1ActionPerformed(evt);

}

});

jButton2.setFont(new java.awt.Font("Tahoma", 1, 14));

jButton2.setText("Show History");

jButton2.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jButton2ActionPerformed(evt);

}

});

jProgressBar1.setForeground(new java.awt.Color(153, 0, 0));

jLabel6.setFont(new java.awt.Font("Tahoma", 1, 36));

jLabel6.setForeground(new java.awt.Color(255, 0, 51));

jLabel6.setText("Distributed Web Crawler");

jLabel1.setFont(new java.awt.Font("Tahoma", 1, 14));

jLabel1.setText("Select IP Address");

jCheckBox1.setText("Use Load Balancing");

javax.swing.GroupLayout jPanel1Layout = new javax.swing.GroupLayout(jPanel1);

jPanel1.setLayout(jPanel1Layout);

jPanel1Layout.setHorizontalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addGap(51, 51, 51)

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jCheckBox1)

.addGroup(jPanel1Layout.createSequentialGroup()

.addComponent(jLabel1)

.addGap(72, 72, 72)

.addComponent(jComboBox1,

javax.swing.GroupLayout.PREFERRED_SIZE, 264,

javax.swing.GroupLayout.PREFERRED_SIZE)))

.addContainerGap(22, Short.MAX_VALUE))

);

jPanel1Layout.setVerticalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addContainerGap()

.addGroup(jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jLabel1)

.addComponent(jComboBox1, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED, 9,

Short.MAX_VALUE)

.addComponent(jCheckBox1))

);

jMenu1.setText("Admin Panel");

jMenu1.add(jSeparator1);

jMenuItem1.setText("Show Panel");

jMenuItem1.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenuItem1ActionPerformed(evt);

}

});

jMenu1.add(jMenuItem1);

jMenuBar1.add(jMenu1);

jMenu3.setText("Web Crawler");

jMenu3.add(jSeparator2);

jMenuItem7.setText("Load Crawler");

jMenuItem7.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenuItem7ActionPerformed(evt);

}

});

jMenu3.add(jMenuItem7);

jMenuBar1.add(jMenu3);

jMenu5.setText("Search Websites");

jMenu5.add(jSeparator3);

jMenuItem14.setText("View Search Box");

jMenuItem14.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenuItem14ActionPerformed(evt);

}

});

jMenu5.add(jMenuItem14);

jMenuBar1.add(jMenu5);

jMenu6.setText("Logout");

jMenu6.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenu6ActionPerformed(evt);

}

});

jMenu6.add(jSeparator4);

jMenuBar1.add(jMenu6);

jMenu2.setText("Exit");

jMenu2.addActionListener(new java.awt.event.ActionListener() {

public void actionPerformed(java.awt.event.ActionEvent evt) {

jMenu2ActionPerformed(evt);

}

});

jMenuBar1.add(jMenu2);

setJMenuBar(jMenuBar1);

javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());

getContentPane().setLayout(layout);

layout.setHorizontalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addGap(230, 230, 230)

.addComponent(jLabel6))

.addGroup(layout.createSequentialGroup()

.addGap(196, 196, 196)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING)

.addComponent(jPanel1, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jTextField1,

javax.swing.GroupLayout.PREFERRED_SIZE, 530,

javax.swing.GroupLayout.PREFERRED_SIZE)))

.addGroup(layout.createSequentialGroup()

.addGap(269, 269, 269)

.addComponent(jButton1, javax.swing.GroupLayout.PREFERRED_SIZE,

187, javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jButton2, javax.swing.GroupLayout.PREFERRED_SIZE,

187, javax.swing.GroupLayout.PREFERRED_SIZE))

.addGroup(layout.createSequentialGroup()

.addGap(130, 130, 130)

.addComponent(jProgressBar1,

javax.swing.GroupLayout.PREFERRED_SIZE, 676,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addGroup(layout.createSequentialGroup()

.addGap(152, 152, 152)

.addComponent(jLabel2, javax.swing.GroupLayout.PREFERRED_SIZE, 603,

javax.swing.GroupLayout.PREFERRED_SIZE)))

.addContainerGap(164, Short.MAX_VALUE))

);

layout.setVerticalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addGap(46, 46, 46)

.addComponent(jLabel6)

.addGap(18, 18, 18)

.addComponent(jTextField1, javax.swing.GroupLayout.PREFERRED_SIZE, 42,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(27, 27, 27)

.addComponent(jPanel1, javax.swing.GroupLayout.PREFERRED_SIZE,

javax.swing.GroupLayout.DEFAULT_SIZE,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(39, 39, 39)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)

.addComponent(jButton1, javax.swing.GroupLayout.PREFERRED_SIZE, 42,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addComponent(jButton2, javax.swing.GroupLayout.PREFERRED_SIZE, 42,

javax.swing.GroupLayout.PREFERRED_SIZE))

.addGap(22, 22, 22)

.addComponent(jLabel2)

.addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.UNRELATED)

.addComponent(jProgressBar1, javax.swing.GroupLayout.PREFERRED_SIZE, 23,

javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(101, 101, 101))

);

pack();

}// </editor-fold>

int count = 0;

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {

if (DataBaseInfo.usedistributed.toUpperCase().equals("Y")) {

IP = jComboBox1.getSelectedItem().toString();

path = "\\\\" + jComboBox1.getSelectedItem().toString() + "\\" +

loc[jComboBox1.getSelectedIndex()];

} else {

path = DataBaseInfo.localadd;

}

if (jCheckBox1.isSelected()) {

try {

DataBaseInfo db = new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement(

"select NodeIP, sharedfolderlocation from nodesinfo "

+ "where (no_of_sites-availablesites) = "

+ "(select distinct max(no_of_sites-availablesites) from nodesinfo)",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

ResultSet rs = stmt.executeQuery();

if (rs.next()) {

path = "\\\\" + rs.getString(1) + "\\" + rs.getString(2); // UNC prefix, as in processPage

IP = rs.getString(1);

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

}

abc obj = new abc(); // where abc is the name of thread class

obj.start();

}

String[] loc;

private void formWindowOpened(java.awt.event.WindowEvent evt) {

// TODO add your handling code here:

if (DataBaseInfo.usedistributed.toUpperCase().equals("N")) {

jPanel1.setVisible(false);

} else {

try {

DataBaseInfo db = new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement(

"select NodeIP,sharedfolderlocation from nodesinfo",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

ResultSet rs = stmt.executeQuery();

ResultSetMetaData rdata = rs.getMetaData();

int n = DataBaseInfo.returnColumn(rs);

loc = new String[n];

rs.beforeFirst();

int i = 0;

while (rs.next()) {

jComboBox1.addItem(rs.getString(1));

loc[i] = rs.getString(2);

i++;

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

}

this.setLocationRelativeTo(null);

}

private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

displaylinks obj = new displaylinks();

obj.setVisible(true);

this.dispose();

}

private void jMenuItem1ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

Manage_Computers mcom = new Manage_Computers();

mcom.setVisible(true);

this.dispose();

}

private void jMenuItem7ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

mywebcrawler mcrawl = new mywebcrawler();

mcrawl.setVisible(true);

this.dispose();

}

private void jMenu6ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

admin_login al = new admin_login();

al.setVisible(true);

this.dispose(); // close this window, as the other menu handlers do (disable() is deprecated)

}

private void jMenu2ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

this.dispose();

}

private void jMenuItem14ActionPerformed(java.awt.event.ActionEvent evt) {

// TODO add your handling code here:

displaylinks obj=new displaylinks();

obj.setVisible(true);

this.dispose();

}

public void processPage(String URL) throws SQLException, Exception {

//check if the given URL is already in database

String sql = "select * from crawledpages where URL = '" + URL + "'";

/*ResultSet rs = db.runSql(sql);

if (rs.next()) {

} else {*/

//store the URL to database to avoid parsing again

if (DataBaseInfo.usedistributed.toUpperCase().equals("Y")) {

IP = jComboBox1.getSelectedItem().toString();

JOptionPane.showMessageDialog(this,jComboBox1.getSelectedIndex());

path = "\\\\" + jComboBox1.getSelectedItem().toString() + "\\" +

loc[jComboBox1.getSelectedIndex()];

} else {

path = DataBaseInfo.localadd;

}

if (jCheckBox1.isSelected()) {

try {

DataBaseInfo db = new DataBaseInfo();

PreparedStatement stmt = db.conn.prepareStatement(

"select NodeIP, sharedfolderlocation from nodesinfo "

+ "where (no_of_sites-availablesites) = "

+ "(select distinct max(no_of_sites-availablesites) from nodesinfo)",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

ResultSet rs1 = stmt.executeQuery();

if (rs1.next()) {

path = "\\\\" + rs1.getString(1) + "\\" + rs1.getString(2);

IP = rs1.getString(1);

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex );

}

}

path = path + "\\" +

jTextField1.getText().substring(jTextField1.getText().indexOf("//") + 2);

String webpath = jTextField1.getText().substring(jTextField1.getText().indexOf("//")

+ 2);

sql = "INSERT INTO Crawledpages values(?,?,?,?)";

PreparedStatement stmt = db.conn.prepareStatement(sql,

Statement.RETURN_GENERATED_KEYS);

stmt.setString(1, URL);

stmt.setString(2, webpath);

if (DataBaseInfo.usedistributed.toUpperCase().equals("Y")) {

stmt.setString(3, IP);

} else {

stmt.setString(3, "Local");

}

stmt.setString(4, path);

stmt.execute();

Document doc = Jsoup.connect(URL).timeout(0).get();

Elements questions = doc.select("a[href]");

for (Element link : questions) {

sql = "INSERT INTO Crawledpages values(?,?,?,?)";

stmt = db.conn.prepareStatement(sql,

Statement.RETURN_GENERATED_KEYS);

stmt.setString(1, link.attr("abs:href"));

stmt.setString(2, webpath);

if (DataBaseInfo.usedistributed.toUpperCase().equals("Y")) {

stmt.setString(3, IP);

} else {

stmt.setString(3, "Local");

}

stmt.setString(4, path);

stmt.execute();

jProgressBar1.setValue(count);

jLabel2.setText(count + " files downloaded and saved !!");

URL oracle = new URL(link.attr("abs:href"));

BufferedReader in = new BufferedReader(

new InputStreamReader(oracle.openStream()));

//path="\\\\10.0.1.42\\anilsir\\"+jTextField1.getText().substring(jTextField1.getText().indexOf("//")+2);

if (DataBaseInfo.usedistributed.toUpperCase().equals("Y")) {

IP = jComboBox1.getSelectedItem().toString();

path = "\\\\" + jComboBox1.getSelectedItem().toString() + "\\" +

loc[jComboBox1.getSelectedIndex()];

} else {

path = DataBaseInfo.localadd;

}

if (jCheckBox1.isSelected()) {

try {

DataBaseInfo db = new DataBaseInfo();

PreparedStatement stmt1 = db.conn.prepareStatement(

"select NodeIP, sharedfolderlocation from nodesinfo "

+ "where (no_of_sites-availablesites) = "

+ "(select distinct max(no_of_sites-availablesites) from nodesinfo)",

ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

ResultSet rs1 = stmt1.executeQuery();

if (rs1.next()) {

path = "\\\\" + rs1.getString(1) + "\\" + rs1.getString(2);

IP = rs1.getString(1);

}

} catch (Exception ex) {

JOptionPane.showMessageDialog(this, ex);

}

}

path = path + "\\" +

jTextField1.getText().substring(jTextField1.getText().indexOf("//") + 2);

File file = new File(path);

if (file.exists() == false) {

file.mkdirs();

}

BufferedWriter writer = new BufferedWriter(new FileWriter(path + "\\" +

link.attr("abs:href").substring(link.attr("abs:href").lastIndexOf('/') + 1)));

String inputLine;

while ((inputLine = in.readLine()) != null) {

try {

writer.write(inputLine);

writer.newLine(); // readLine() strips the line terminator, so restore it

} catch (IOException e) {

e.printStackTrace();

JOptionPane.showMessageDialog(this, e);

return;

}

}

in.close();

writer.close();

count++;

}

JOptionPane.showMessageDialog(null, count + " record found and saved to database !!");

}

class abc extends Thread {

@Override

public void run() {

try {

// TODO add your handling code here:

// db.runSql2("TRUNCATE Record;");

processPage(jTextField1.getText());

} catch (Exception ex) {

JOptionPane.showMessageDialog(null, ex.toString()); // getMessage() may be null

} finally {

JOptionPane.showMessageDialog(null, count + " record found and saved to database !!\n\nFiles saved to " + path);

jProgressBar1.setVisible(false);

}

}

}

/**

* @param args the command line arguments

*/

public static void main(String args[]) {

/* Set the Nimbus look and feel */

//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">

/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and

feel.

* For details see

http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html

*/

try {

for (javax.swing.UIManager.LookAndFeelInfo info :

javax.swing.UIManager.getInstalledLookAndFeels()) {

if ("System".equals(info.getName())) {

javax.swing.UIManager.setLookAndFeel(info.getClassName());

break;

}

}

} catch (ClassNotFoundException ex) {

java.util.logging.Logger.getLogger(mywebcrawler.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

} catch (InstantiationException ex) {

java.util.logging.Logger.getLogger(mywebcrawler.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

} catch (IllegalAccessException ex) {

java.util.logging.Logger.getLogger(mywebcrawler.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

} catch (javax.swing.UnsupportedLookAndFeelException ex) {

java.util.logging.Logger.getLogger(mywebcrawler.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);

}

//</editor-fold>

/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {

public void run() {

new mywebcrawler().setVisible(true);

}

});

}

// Variables declaration - do not modify

private javax.swing.JButton jButton1;

private javax.swing.JButton jButton2;

private javax.swing.JCheckBox jCheckBox1;

private javax.swing.JComboBox jComboBox1;

private javax.swing.JLabel jLabel1;

private javax.swing.JLabel jLabel2;

private javax.swing.JLabel jLabel6;

private javax.swing.JMenu jMenu1;

private javax.swing.JMenu jMenu2;

private javax.swing.JMenu jMenu3;

private javax.swing.JMenu jMenu5;

private javax.swing.JMenu jMenu6;

private javax.swing.JMenuBar jMenuBar1;

private javax.swing.JMenuItem jMenuItem1;

private javax.swing.JMenuItem jMenuItem14;

private javax.swing.JMenuItem jMenuItem7;

private javax.swing.JPanel jPanel1;

private javax.swing.JProgressBar jProgressBar1;

private javax.swing.JPopupMenu.Separator jSeparator1;

private javax.swing.JPopupMenu.Separator jSeparator2;

private javax.swing.JPopupMenu.Separator jSeparator3;

private javax.swing.JPopupMenu.Separator jSeparator4;

private javax.swing.JTextField jTextField1;

// End of variables declaration

}
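
The load-balancing query in the listing above selects the node for which no_of_sites - availablesites is largest. The same selection rule can be sketched in memory, without a database. This is only an illustration: NodeInfo, NodeSelector, pickNode and uncPath are hypothetical names that do not appear in the project, and the fields simply mirror the nodesinfo columns used by the query.

```java
import java.util.List;

// Hypothetical in-memory stand-in for one row of the nodesinfo table.
final class NodeInfo {
    final String nodeIp;
    final String sharedFolderLocation;
    final int noOfSites;
    final int availableSites;

    NodeInfo(String nodeIp, String sharedFolderLocation, int noOfSites, int availableSites) {
        this.nodeIp = nodeIp;
        this.sharedFolderLocation = sharedFolderLocation;
        this.noOfSites = noOfSites;
        this.availableSites = availableSites;
    }

    // The quantity the SQL query maximises.
    int score() {
        return noOfSites - availableSites;
    }
}

final class NodeSelector {
    // Returns the node with the largest (noOfSites - availableSites),
    // or null for an empty list. Ties go to the first such node in the list.
    static NodeInfo pickNode(List<NodeInfo> nodes) {
        NodeInfo best = null;
        for (NodeInfo n : nodes) {
            if (best == null || n.score() > best.score()) {
                best = n;
            }
        }
        return best;
    }

    // Builds the UNC path used by the listing: \\<ip>\<shared folder>
    static String uncPath(NodeInfo n) {
        return "\\\\" + n.nodeIp + "\\" + n.sharedFolderLocation;
    }
}
```

Keeping the rule in one method like this would also avoid repeating the same query in three places, as the listing currently does.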

CHAPTER 10

TESTING

10.1 TESTING

Testing is the process of executing a program in order to reveal errors. It is the major quality measure employed during software development. During testing, the program is executed with a set of conditions known as test cases, and the output is evaluated to determine whether the program is performing as expected.

To make sure that the system does not have errors, different levels of testing are applied at different phases of software development.

10.2 LEVELS OF TESTING

The two levels of testing are unit testing and system testing.

10.2.1. UNIT TESTING:

Unit Testing is done on individual modules as they are completed and become

executable. It is confined only to the designer's requirements.

Each module can be tested using the following two strategies:

Black Box Testing (BBT)

In this strategy, test cases are generated as input conditions that fully exercise all functional requirements of the program. This testing has been used to find errors in the following categories:

a) Incorrect or missing functions

b) Interface errors

c) Errors in data structure or external database access

d) Performance errors

e) Initialization and termination errors.

In this testing only the output is checked for correctness. The logical flow of

the data is not checked.
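
A black box test can be sketched as a table of inputs and expected outputs, with no reference to how the result is computed. The example below is an illustration only: WebPath and webPathOf are hypothetical names, modelled on the expression substring(indexOf("//") + 2) that the crawler applies to the address typed in the search box. Unlike that expression, the helper returns the input unchanged when no "//" is present, which is an assumption about the intended behaviour.

```java
// Hypothetical helper under test: strips the scheme ("http://") from an address.
final class WebPath {
    static String webPathOf(String address) {
        int i = address.indexOf("//");
        // Assumption: an address without "//" is already a bare web path.
        return i < 0 ? address : address.substring(i + 2);
    }
}

final class WebPathBlackBoxTest {
    // Each row is {input, expected output}; only the output is checked.
    static final String[][] CASES = {
        {"http://example.com", "example.com"},
        {"https://example.com/docs", "example.com/docs"},
        {"example.com", "example.com"},
        {"http://", ""},
    };

    // Returns the number of rows whose actual output differs from the expected one.
    static int failures() {
        int failed = 0;
        for (String[] c : CASES) {
            if (!WebPath.webPathOf(c[0]).equals(c[1])) {
                failed++;
            }
        }
        return failed;
    }
}
```

The third row is the kind of case black box testing tends to expose: the expression in the listing would drop the first character of an address that has no scheme.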

White Box testing (WBT)

In this strategy, test cases are derived from the logic of each module by drawing the module's flow graph, and each logical decision is tested on all of its cases.

It has been used to generate test cases that:

a) Guarantee that all independent paths have been executed.

b) Execute all logical decisions on their true and false sides.

c) Execute all loops at their boundaries and within their operational bounds.

d) Exercise internal data structures to ensure their validity.
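
The white box criteria above can be illustrated with a small sketch in which each test is chosen by looking at the code: one test per side of a decision, and one per loop boundary. UrlNames, fileNameOf and countNonEmpty are hypothetical names used only for this example; fileNameOf follows the crawler's way of naming a saved file (the text after the last '/'), and the "index.html" fallback is an assumption added to create a decision to test.

```java
final class UrlNames {
    // Derives a file name from a URL: the text after the last '/'.
    // Assumption: an empty name falls back to "index.html".
    static String fileNameOf(String url) {
        String name = url.substring(url.lastIndexOf('/') + 1);
        if (name.isEmpty()) {   // decision under test
            return "index.html";
        }
        return name;
    }

    // Counts the non-empty entries; the loop body runs zero, one or many times.
    static int countNonEmpty(String[] links) {
        int n = 0;
        for (String link : links) {
            if (!link.isEmpty()) {
                n++;
            }
        }
        return n;
    }
}

final class UrlNamesWhiteBoxTest {
    static void check(boolean ok, String what) {
        if (!ok) {
            throw new AssertionError(what);
        }
    }

    static void runAll() {
        // Decision, true side: the URL ends with '/', so the name is empty.
        check(UrlNames.fileNameOf("http://example.com/").equals("index.html"), "true side");
        // Decision, false side: the URL ends with a file name.
        check(UrlNames.fileNameOf("http://example.com/a.html").equals("a.html"), "false side");
        // Loop boundaries: zero, one and several iterations.
        check(UrlNames.countNonEmpty(new String[] {}) == 0, "zero iterations");
        check(UrlNames.countNonEmpty(new String[] {"a"}) == 1, "one iteration");
        check(UrlNames.countNonEmpty(new String[] {"a", "", "b"}) == 2, "many iterations");
    }
}
```

Calling UrlNamesWhiteBoxTest.runAll() completes silently when every path behaves as expected and throws an AssertionError naming the failed case otherwise.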

10.3. SYSTEM TESTING (ST)

System testing involves in-house testing of the entire system before delivery to the user. Its aim is to satisfy the user that the system meets all requirements of the client's specifications.

10.4. INTEGRATION TESTING (IT)

Integration testing ensures that software and subsystems work together as a whole. It tests

the interface of all the modules to make sure that the modules behave properly when

integrated together.

10.5. ACCEPTANCE TESTING (AT)

Acceptance testing is pre-delivery testing in which the entire system is tested at the client's site on real-world data to find errors.

10.6. VALIDATION

The system has been tested and implemented successfully, ensuring that all the requirements listed in the software requirement specification are fulfilled. For erroneous input, the corresponding error messages are displayed.

COMPILING TEST

It was a good idea to do our stress testing early on, because it gave us time to fix some of the

unexpected deadlocks and stability problems that only occurred when components were

exposed to very high transaction volumes.

EXECUTION TEST

The program was loaded and executed successfully. No execution errors were observed.

10.7. TEST CASES

Test Cases:

S.No.  Module        Test Case ID  Action                                        Expected Result

1.     Web Crawler   MQT1-001      Enter website name in the search box          Website will be downloaded.

2.     Web Crawler   MQT1-002      Click on search button                        History should be displayed.

3.     Web Crawler   MQT1-003      Click on exit button                          Form should be closed.

4.     Node Manager  MQT1-004      Add details of a computer and click on save   New computer should be added.

5.     Node Manager  MQT1-005      Add details of computer and click on delete   Record should be deleted.

6.     Node Manager  MQT2-006      Change node information                       Node information should be changed.

7.     Node Manager  MQT2-007      Click on view node info                       Record should be displayed.

CHAPTER 11

SYSTEM IMPLEMENTATION

11.1. INSTALLATION PROCEDURE OF THE SOFTWARE

To install the software, perform the following tasks.

a. First check that the system meets the minimum requirements. If it does, install Microsoft Windows XP SP2 or above on the system on which the program is going to be used.

b. After that, set up JDK 1.7 or above to provide the JVM.

c. Then set up the NetBeans IDE. The system is now ready for the software to be installed.

d. Then insert the project CD in the CD-ROM drive. Open NetBeans, click on the Open menu and select the project.

e. After that, build and run the software by selecting Run from the context menu or by pressing Alt+F6.

f. Select one notepad file with the list of numbers and perform the required sorting comparison.

11.2 USAGE OF THE SOFTWARE

First, a PC with the minimum hardware and software configuration specified earlier is required. After installation, any user can make use of the software.

CHAPTER 12

CONCLUSION AND FUTURE SCOPE OF STUDY

The biggest contribution of this project is the concept of distributing crawl tasks based on disjoint subsets of the URL crawl space. We also presented a scalable, multi-threaded, peer-to-peer distributed architecture for a WebCrawler based on this concept. Another interesting contribution of the project is the proposed probabilistic hybrid of Depth-First Traversal and Breadth-First Traversal, although we were unable to study its advantages or disadvantages during this project. This traversal strategy achieves a hybrid of the two traditional strategies without any extra book-keeping and is very easy to implement. We also implemented a complete WebCrawler that demonstrates all of the above concepts.
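
The two ideas summarised above can be sketched in a few lines. This is one possible reading, not the project's implementation: CrawlPartition, ownerOf, HybridFrontier and breadthProbability are hypothetical names, hashing the host name is an assumed way of obtaining disjoint subsets, and choosing between the oldest and the newest queued URL is an assumed form of the probabilistic hybrid.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Random;

final class CrawlPartition {
    // Maps a URL to one of nodeCount crawler nodes by hashing its host name,
    // so all URLs of a host fall into exactly one node's subset.
    static int ownerOf(String url, int nodeCount) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return Math.floorMod(host.toLowerCase(Locale.ROOT).hashCode(), nodeCount);
    }
}

final class HybridFrontier {
    private final Deque<String> urls = new ArrayDeque<String>();
    private final double breadthProbability;
    private final Random random;

    HybridFrontier(double breadthProbability, Random random) {
        this.breadthProbability = breadthProbability;
        this.random = random;
    }

    // Newly discovered URLs always go to the back of the queue.
    void add(String url) {
        urls.addLast(url);
    }

    boolean isEmpty() {
        return urls.isEmpty();
    }

    // With probability breadthProbability take the oldest URL (breadth-first step),
    // otherwise take the newest one (depth-first step). Returns null when empty.
    String next() {
        if (random.nextDouble() < breadthProbability) {
            return urls.pollFirst();
        }
        return urls.pollLast();
    }
}
```

With breadthProbability set to 1.0 the frontier behaves as a plain breadth-first queue, and with 0.0 as a depth-first stack; values in between mix the two using only the single deque.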

FUTURE SCOPE:

Future extension of the project includes implementing the DNS cache in the Crawler Thread and studying the performance of the hybrid traversal strategy at various cache-hit rates. Many issues need to be dealt with to make this system usable in the real world. The Crawler needs to conform to the robot exclusion protocol. We need to handle partial failure: although at present the failure of one node will not stop other components, it would be desirable for another node to take over the task of the node that failed. Dynamic reconfiguration and dynamic load-balancing would also be desirable.
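
As a starting point for conforming to the robot exclusion protocol, the check could look roughly like the sketch below. It is deliberately simplified and is not part of the project: RobotsRules, parse and isAllowed are hypothetical names, only Disallow lines under "User-agent: *" are honoured, and Allow lines, wildcards and crawler-specific groups are ignored. Fetching robots.txt from the site is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class RobotsRules {
    private final List<String> disallowed = new ArrayList<String>();

    // Parses the text of a robots.txt file, keeping only the Disallow
    // prefixes that follow a "User-agent: *" line.
    static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        boolean inWildcardGroup = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');       // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                        // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                rules.disallowed.add(value);
            }
        }
        return rules;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

In the crawler, such a check would be made once per host before any of that host's links are fetched, with the parsed rules cached alongside the node that owns the host.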
