Contents:

1 – Introduction to the subject of web mining and techniques
2 – Overview of research conducted (both theory and practical)
3 – Software applications on which to test web mining techniques
4 – Demonstration (Digital Solutions and Repairs)
5 – Evaluating results (suitability and practicality)

Student Name: Colin Hopson
Student Number: 0482647
Course Title: MSc Computer Science (Internet Engineering)

7ET023 – MSc Dissertation

Research Question: What is the most suitable web mining technique for a specified business and mobile application case study?

1 – Introduction to the subject of web mining and techniques

Sequential research of techniques for an empirical study

Initial research into data mining (databases)

Previous knowledge of web services (RSS, REST, etc.)

Research into theory of web mining

Web usage mining – logs to examine navigation patterns
Web structure mining – examine link hierarchy
Web content mining – “the discovery of useful information from the Web by examining the data that is contained in the Web site” (Pendharkar, 2003, p. 243)

* Pendharkar, P.C. (2003) Managing data mining technologies in organizations: techniques and applications, Idea Group Pub, Hershey.
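Web usage mining, the first category listed above, can be illustrated with a small sketch. It is written in Python purely for illustration and assumes a simplified, already-parsed log format (visitor id, timestamp, path); the records and the function name `navigation_transitions` are hypothetical and not taken from the dissertation.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified log records: (visitor id, timestamp, requested path).
# Real server logs would need parsing first; this layout is an assumption.
LOG = [
    ("visitor-a", 1, "/"),
    ("visitor-a", 2, "/laptops"),
    ("visitor-a", 3, "/laptops/repair"),
    ("visitor-b", 1, "/"),
    ("visitor-b", 2, "/laptops"),
    ("visitor-b", 3, "/contact"),
]


def navigation_transitions(records):
    """Count how often visitors move directly from one page to the next."""
    by_visitor = defaultdict(list)
    for visitor, timestamp, path in records:
        by_visitor[visitor].append((timestamp, path))

    transitions = Counter()
    for visits in by_visitor.values():
        visits.sort()  # order each visitor's requests by timestamp
        paths = [path for _, path in visits]
        # Pair every page with the page requested immediately after it.
        for source, target in zip(paths, paths[1:]):
            transitions[(source, target)] += 1
    return transitions


transitions = navigation_transitions(LOG)
for (source, target), count in transitions.most_common():
    print(f"{source} -> {target}: {count}")
```

The most frequent pairs give a rough picture of the navigation patterns visitors follow through a site.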

Data extraction from HTML (machine learning algorithms)

Wrapper Induction
Semi-Automatic Extraction

2 – Overview of research conducted (both theory and practical)

Researching Theory of Data and Web Mining

Empirical research method to acquire knowledge,
Research into data mining, web mining, data extraction algorithms, etc.,
Sequential investigation of applicable techniques.

Artefact Design and Development

E-commerce prototype website (Digital Solutions and Repairs),
Mobile application (Mobile Shopper).

Practical Research to Implement Techniques

Resolution of web services (Amazon APIs),
HTML extraction technique using XML, DOM, XPath and PHP arrays,
Consuming Google API with REST, DOM, XPath and PHP arrays,
Third-party software (Newprosoft and Automation Anywhere),
Functionality of XSLT.
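The HTML extraction technique listed above (load markup into a DOM, select nodes with XPath, collect the values in arrays) can be sketched as follows. The dissertation names PHP's DOMDocument for this; the sketch uses Python's standard `xml.etree.ElementTree` as a stand-in, and it assumes well-formed markup. The page fragment, class names and products are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing. The markup, class names and
# products are invented for illustration; real pages are often not well-formed
# and would need a more forgiving HTML parser.
PAGE = """
<html><body>
  <div class="product">
    <span class="name">Laptop battery</span>
    <span class="price">24.99</span>
  </div>
  <div class="product">
    <span class="name">USB charger</span>
    <span class="price">9.50</span>
  </div>
</body></html>
"""


def extract_products(markup):
    """Select product nodes with XPath-style queries and return a list of dicts."""
    root = ET.fromstring(markup.strip())
    products = []
    for node in root.findall(".//div[@class='product']"):
        name = node.findtext("span[@class='name']")
        price = node.findtext("span[@class='price']")
        products.append({"name": name, "price": float(price)})
    return products


products = extract_products(PAGE)
for product in products:
    print(product["name"], product["price"])
```

The queries are tied to one page layout, so a change in the target site's HTML would require the expressions to be rewritten.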

3 – Software applications on which to test web mining techniques

4 – Demonstration (Digital Solutions and Repairs)

Web Mining Technique 1: Amazon API (coded class/methods)

Web Mining Technique 2: HTML Extraction (DOMDocument, XPath and PHP Arrays)

Web Mining Technique 3: Google API (REST, DOMDocument, XPath and PHP Arrays)

Web Mining Technique 4: Third-Party Software (Automation Anywhere and Newprosoft)

Web Mining Technique 5: None implemented, but XSLT investigated

Website Demonstration >>>
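The API-based techniques above follow a common pattern: build a REST request URL, then read fields out of the XML response into arrays. A minimal Python sketch of that pattern follows (the demonstration itself used PHP). The endpoint, parameter names, key and response layout are all hypothetical and do not reproduce the Amazon or Google APIs. No network request is made; a canned response stands in for the service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint and parameter names, chosen only for illustration.
BASE_URL = "https://api.example.com/products"

# Canned response standing in for a real web service reply (assumed layout).
SAMPLE_RESPONSE = """<results>
  <item id="g-001"><title>Laptop battery</title><seller>Shop A</seller></item>
  <item id="g-002"><title>Laptop battery</title><seller>Shop B</seller></item>
</results>"""


def build_request_url(query, api_key):
    """Encode the search term and key as a REST-style query string."""
    return BASE_URL + "?" + urlencode({"q": query, "key": api_key})


def parse_items(xml_text):
    """Turn each <item> element of the response into a dict."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": item.get("id"),
            "title": item.findtext("title"),
            "seller": item.findtext("seller"),
        }
        for item in root.findall("item")
    ]


url = build_request_url("laptop battery", "DEMO-KEY")
items = parse_items(SAMPLE_RESPONSE)
print(url)
for item in items:
    print(item["id"], item["title"], item["seller"])
```

In a real integration the URL would be fetched over HTTP with the registered key, and the parsing step would follow the field names given in the provider's documentation.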

5 – Evaluating results (suitability and practicality)

Web Mining Technique 1: Amazon API
Requires registration and associate keys,
Product Advertising API has most requirements (plus more),
ASINs assist administration system,
Top quality delivery and discounts,
Regular updates although lengthy documentation.

Web Mining Technique 2: HTML Extraction
No cost, but requires programming knowledge,
Bespoke algorithm specific for HTML format,
Limited to one online organisation.

Web Mining Technique 3: Google API
Requires registration and associate keys,
Searches products from many online organisations,
GoogleId does not assist administration system,
Web service retrieves limited product information,
Top security measures, but lengthy documentation.

Web Mining Technique 4: Third-Party Software
Limited free trial with subscription costs,
Possible difficulty with integration with administration system.

Web Mining Technique 5: XSLT investigated
Limited free trial with subscription costs,
Integration difficulties with administration system.

SUMMARY

Questions?

Study of web mining and some of its techniques
Empirical study, data mining, web services, web content mining, data extraction algorithms.

Sequential research conducted (theory and practical)
Web services (APIs), HTML extraction, third-party software, XSLT.

E-commerce prototype website and mobile application
‘Digital Solutions and Repairs’ and ‘Mobile Shopper’.

Demonstration of web mining techniques
DSR computer repairs administration system.

Evaluation of web mining techniques investigated
Comparison between APIs, HTML extraction, third-party software and XSLT.