OpenSAP Hsta1 Week 4 Exercise

openSAP TEXT ANALYTICS WITH SAP HANA PLATFORM  –  WEEK 4
Version: January 22, 2016
Exercises / Solutions
Anthony Waite / SAP Labs, LLC.
Bill Miller / SAP Labs, LLC.
7/24/2019 http://slidepdf.com/reader/full/opensap-hsta1-week-4-exercise



Contents

Exercise 1 – Solution
  In SAP HANA Studio
Exercise 2 – Solution
  In SAP HANA Studio
Exercise 3 – Solution
  In SAP HANA Studio
Exercise 4 – Solution
  In SAP HANA Studio
Exercise 5 – Solution
  In SAP HANA Studio
Exercise 6 – Solution
  In SAP HANA Studio
Exercise 7 – Solution
  In SAP HANA Studio


REMINDER BEFORE YOU START

System Host: HANA IP address  

System Instance Number: 00 

System User ID: SYSTEM 

Password: Master Password you entered for the solution when creating the instance in the SAP Cloud Appliance Library


EXERCISE 1  –  CREATE FULLTEXT INDEX WITH TEXT MINING ON AND GET RELEVANT DOCUMENTS

Objective

In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data and specify the TEXT MINING ON parameter. You will then discover the top-ranked relevant documents based on an input term.

Exercise Description

  Create fulltext index and text mining index from reference document set

  Monitor the progress and status of text processing

  Execute the text mining function TM_GET_RELEVANT_DOCUMENTS with the input term “enzyme” 

  Show top-ranked documents from the reference documentation set relevant to “enzyme”  


EXERCISE 1  –  SOLUTION

In SAP HANA Studio

Steps

1) Under the “Repositories” tab, navigate to “(Default) / student00 / solutions / week-4”. Double-click on “exercises.sql”.

2) If “No connection to database” is displayed in the SQL console, click on the “Choose Connection” icon, which is found to the right of the green circle with an arrow (Execute) icon.


3) In the “Choose Connection” dialog, select the appropriate database. Click the “OK” button.

4) In the SQL console, highlight the following SQL syntax:

    SET SCHEMA OPENSAP_TA_WORKSHOP;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

Note: If you close this session at any point while working on Week 4 exercises, you will need to re-execute this command at the start.

5) In the SQL console, highlight the following SQL syntax:

    CREATE FULLTEXT INDEX AWARDS_IDX ON "student00.data::AWARDS"(AWARD_ABSTRACT)
    FAST PREPROCESS OFF
    TEXT MINING ON;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

6) In the SQL console, highlight the following SQL syntax:

    SELECT * FROM SYS.M_FULLTEXT_QUEUES
    WHERE SCHEMA_NAME = 'OPENSAP_TA_WORKSHOP'
      AND TABLE_NAME = 'student00.data::AWARDS';

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

7) You can monitor the progress and status of the text analysis processing (tokenization, stemming and part-of-speech tagging), which improves the quality of text mining. After the job finishes, the text mining index (a.k.a. term-document matrix) is created.

Note: Wait until all of the reference documents have been indexed before executing the following text mining functions.

8) In the SQL console, highlight the following SQL syntax:

    SELECT T.FEDERAL_AWARD_ID_NUMBER,
           T.AWARD_TITLE,
           T.TOTAL_TERM_COUNT,
           T.SCORE
    FROM TM_GET_RELEVANT_DOCUMENTS (
        TERM 'enzyme'
        SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

9) Notice this text mining function shows the top-ranked documents relevant to the input term "enzyme".

Note: You can find concepts and the usage of the text mining capabilities in the SAP HANA Text Mining Developer Guide posted on the SAP Help Portal.
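Step 7 describes the text mining index as a term-document matrix. The following Python sketch illustrates that idea only; it is not how SAP HANA builds its index. The tokenizer is a simple stand-in for the linguistic analysis (tokenization, stemming, part-of-speech tagging), and the function names and sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic runs only; real text analysis also stems and tags.
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (sorted vocabulary, {doc_id: [count of each vocabulary term]})."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    vocab = sorted({term for c in counts.values() for term in c})
    matrix = {doc_id: [c[term] for term in vocab] for doc_id, c in counts.items()}
    return vocab, matrix


# Hypothetical two-document reference set.
docs = {
    1: "Enzyme kinetics of a bacterial enzyme",
    2: "Protein folding in bacterial cells",
}
vocab, matrix = term_document_matrix(docs)
```

Each row of the matrix is one document and each column one term, so a lookup such as matrix[1][vocab.index("enzyme")] gives the number of times "enzyme" occurs in document 1.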


EXERCISE 2  –  GET RELATED DOCUMENTS WITH REFERENCE DOCUMENT

Objective

In this exercise, discover the top-ranked related documents based on an input document already found in the reference collection.

Exercise Description

  View the initial input document from the reference document set about enzymes

  Execute the text mining function TM_GET_RELATED_DOCUMENTS with the input document about enzymes

  Show top-ranked documents from the reference documentation set related to the input document about enzymes


EXERCISE 2  –  SOLUTION

In SAP HANA Studio

Steps

1) In the SQL console, highlight the following SQL syntax:

    SELECT * FROM "student00.data::AWARDS"
    WHERE FEDERAL_AWARD_ID_NUMBER = 1330760;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) Notice the document from the reference document set is about enzymes.

3) In the SQL console, highlight the following SQL syntax:

    SELECT T.FEDERAL_AWARD_ID_NUMBER,
           T.AWARD_TITLE,
           T.TOTAL_TERM_COUNT,
           T.SCORE
    FROM TM_GET_RELATED_DOCUMENTS (
        DOCUMENT IN FULLTEXT INDEX WHERE FEDERAL_AWARD_ID_NUMBER = 1330760
        SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.


4) Notice this text mining function shows the top-ranked documents related to the initial input document already found in the reference documentation set. The initial input document is also returned with a score of 1.0, since it’s a perfect match for itself.
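To see why a document scores 1.0 against itself, here is a minimal Python sketch that assumes the score is a cosine similarity between term-count vectors. That assumption, the function names, and the sample documents are illustrative only and are not taken from the exercise.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_documents(query_id, docs, top=16):
    """Rank every document, including the query itself, against the query."""
    vectors = {d: Counter(text.lower().split()) for d, text in docs.items()}
    query = vectors[query_id]
    scored = [(d, cosine(query, v)) for d, v in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical reference set; "A" plays the role of the input document.
docs = {
    "A": "enzyme activity in yeast",
    "B": "enzyme structure and activity",
    "C": "galaxy formation models",
}
ranked = related_documents("A", docs)
```

Because the input document's vector points in exactly the same direction as itself, it ranks first with a similarity of 1.0, while documents that share fewer terms score lower.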


EXERCISE 3  –  GET RELATED DOCUMENTS WITH NEW DOCUMENT

Objective

In this exercise, discover the top-ranked related documents based on a new (previously unseen) input document.

Exercise Description

  Execute the text mining function TM_GET_RELATED_DOCUMENTS with a new input document

  Show top-ranked documents from the reference documentation set related to the new input document


EXERCISE 3  –  SOLUTION

In SAP HANA Studio

Steps

1) In the SQL console, highlight the following SQL syntax:

    SELECT T.FEDERAL_AWARD_ID_NUMBER,
           T.AWARD_TITLE,
           T.TOTAL_TERM_COUNT,
           T.SCORE
    FROM TM_GET_RELATED_DOCUMENTS (
        DOCUMENT ' The molecule known as coenzyme A plays a key role in cell
        metabolism by regulating the actions of nitric oxide. Coenzyme A sets
        into motion a process known as protein nitrosylation, which unleashes
        nitric oxide to alter the shape and function of proteins within cells
        to modify cell behavior. The purpose of manipulating the behavior of
        cells is to tailor their actions to accommodate the ever-changing
        needs of the body’s metabolism. '
        SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.


2) Notice this shows the top-ranked documents related to a new input document not found in the reference document set.


EXERCISE 4  –  GET RELEVANT TERMS

Objective

In this exercise, discover the top-ranked relevant terms (key phrases) that describe a document.

Exercise Description

  Execute the text mining function TM_GET_RELEVANT_TERMS with an input document already found in the reference document set

  Show top-ranked relevant terms from the reference documentation set that describe the input document


EXERCISE 4  –  SOLUTION

In SAP HANA Studio

Steps

1) In the SQL console, highlight the following SQL syntax:

    SELECT T.TERM,
           T.NORMALIZED_TERM,
           T.TERM_TYPE,
           T.TERM_FREQUENCY,
           T.DOCUMENT_FREQUENCY,
           T.SCORE
    FROM TM_GET_RELEVANT_TERMS (
        DOCUMENT IN FULLTEXT INDEX WHERE FEDERAL_AWARD_ID_NUMBER = 1330760
        SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) Notice this shows the top-ranked relevant terms (key phrases) that describe the input document already found in the reference collection.
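The query above returns TERM_FREQUENCY and DOCUMENT_FREQUENCY next to SCORE. One common way to combine those two quantities is TF-IDF weighting, sketched below in Python. Treat this as an assumption for illustration: the exercise does not state the exact formula SAP HANA uses, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def relevant_terms(doc_id, docs, top=16):
    """Rank the terms of one document by term frequency * inverse document frequency."""
    tokens = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokens.values() for t in set(toks))
    tf = Counter(tokens[doc_id])
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical reference set.
docs = {
    1: "enzyme assay of the enzyme",
    2: "structure of the protein",
    3: "history of the telescope",
}
terms = relevant_terms(1, docs, top=3)
```

Terms that occur often in the input document but rarely elsewhere ("enzyme") rank highest, while terms found in every document ("of", "the") receive a weight of zero.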


EXERCISE 5  –  GET RELATED TERMS

Objective

In this exercise, discover the top-ranked related terms based on co-occurrence with an input term.

Exercise Description

  Execute the text mining function TM_GET_RELATED_TERMS with the input term “enzyme” 

  Show top-ranked terms from the reference documentation set related to the input term “enzyme” 


EXERCISE 5  –  SOLUTION

In SAP HANA Studio

Steps

1) In the SQL console, highlight the following SQL syntax:

    SELECT T.TERM,
           T.NORMALIZED_TERM,
           T.TERM_TYPE,
           T.TERM_FREQUENCY,
           T.DOCUMENT_FREQUENCY,
           T.SCORE
    FROM TM_GET_RELATED_TERMS (
        TERM 'enzyme'
        SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) Notice this text mining function shows the top-ranked terms related to the input term "enzyme". The input term itself is already found in the reference documentation set, so it is returned with a perfect score of 1.
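The idea of relating terms by co-occurrence can be sketched in Python as follows. The sketch assumes each term is represented by the set of documents it occurs in and that terms are compared with cosine similarity over those sets; this is an illustration only, not SAP HANA's implementation, and the function name and sample documents are hypothetical.

```python
import math


def related_terms(term, docs, top=16):
    """Rank terms by cosine similarity of their document-occurrence sets."""
    occurs = {}  # term -> set of ids of the documents that contain it
    for doc_id, text in docs.items():
        for t in set(text.lower().split()):
            occurs.setdefault(t, set()).add(doc_id)
    base = occurs.get(term, set())
    scores = {}
    for t, ids in occurs.items():
        shared = len(base & ids)
        if shared:  # keep only terms that co-occur with the input term
            scores[t] = shared / math.sqrt(len(base) * len(ids))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical reference set.
docs = {
    1: "enzyme catalysis rate",
    2: "enzyme catalysis mechanism",
    3: "enzyme inhibitor",
    4: "galaxy rotation rate",
}
ranked = related_terms("enzyme", docs)
```

The input term always co-occurs perfectly with itself, which is why it comes back first with a score of 1, followed by the terms that most often share its documents.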


EXERCISE 6  –  GET SUGGESTED TERMS

Objective

In this exercise, discover the top-ranked terms matching an initial substring.

Exercise Description

  Execute the text mining function TM_GET_SUGGESTED_TERMS with the input substring “enz” 

  Show top-ranked suggested terms from the reference documentation set matching the input substring “enz”


EXERCISE 6  –  SOLUTION

In SAP HANA Studio

Steps

1) In the SQL console, highlight the following SQL syntax:

    SELECT T.TERM,
           T.NORMALIZED_TERM,
           T.TERM_TYPE,
           T.TERM_FREQUENCY,
           T.DOCUMENT_FREQUENCY,
           T.SCORE
    FROM TM_GET_SUGGESTED_TERMS (
        TERM 'enz'
        SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.

2) Notice this text mining function shows the top-ranked suggested terms for the input substring "enz".
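Matching terms against an initial substring can be pictured with a short Python sketch. Ranking the matches by document frequency is an assumption made here for illustration; the exercise does not say how SAP HANA orders its suggestions, and the function name and sample documents are hypothetical.

```python
from collections import Counter


def suggested_terms(prefix, docs, top=16):
    """Return (term, document_frequency) pairs for terms that start with prefix."""
    df = Counter()
    for text in docs:
        df.update(set(text.lower().split()))  # count each term once per document
    prefix = prefix.lower()
    matches = [(t, n) for t, n in df.items() if t.startswith(prefix)]
    # Most widespread terms first; ties are broken alphabetically.
    return sorted(matches, key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical reference set.
docs = [
    "enzyme kinetics",
    "enzyme and enzymatic activity",
    "enzymes in yeast",
    "energy transfer",
]
suggestions = suggested_terms("enz", docs)
```

Only terms that begin with "enz" are returned, so "energy" is excluded even though it shares the first two letters.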


EXERCISE 7  –  CATEGORIZE

Objective

In this exercise, provide an input document in order to determine the document categories from the reference collection that are most similar to the input document based on the terms used.

Exercise Description

  Execute the text mining function TM_CATEGORIZE_KNN with a new input document

  Show top most-similar categories from the reference documentation set matched to the new input document


EXERCISE 7  –  SOLUTION

In SAP HANA Studio

Steps

1) In the SQL console, highlight the following SQL syntax:

    SELECT T.RANK,
           T.CATEGORY_VALUE,
           NEIGHBOR_COUNT,
           SCORE
    FROM TM_CATEGORIZE_KNN (
        DOCUMENT ' The molecule known as coenzyme A plays a key role in cell
        metabolism by regulating the actions of nitric oxide. Coenzyme A sets
        into motion a process known as protein nitrosylation, which unleashes
        nitric oxide to alter the shape and function of proteins within cells
        to modify cell behavior. The purpose of manipulating the behavior of
        cells is to tailor their actions to accommodate the ever-changing
        needs of the body’s metabolism. '
        SEARCH NEAREST NEIGHBORS 15 "AWARD_ABSTRACT"
            FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
        RETURN TOP 16 PROGRAM
            FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
    ) AS T;

Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.


2) Notice the categorization function determines the top categories from the most-similar reference documents and does a weighted comparison by adding and normalizing the similarities for each category value.
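The weighted comparison described in step 2 can be sketched in Python as a small k-nearest-neighbors classifier. The sketch assumes cosine similarity between term-count vectors and normalizes each category's summed similarity by the total over all neighbors; both choices are assumptions for illustration, and the function names, category labels, and sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize_knn(text, reference, k=15, top=16):
    """reference is a list of (document text, category) pairs.

    Returns [(category, neighbor_count, normalized_score)], best category first.
    """
    query = Counter(text.lower().split())
    sims = [(cosine(query, Counter(doc.lower().split())), cat) for doc, cat in reference]
    neighbors = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    totals, counts = defaultdict(float), Counter()
    for sim, cat in neighbors:
        totals[cat] += sim   # add up the similarities per category
        counts[cat] += 1     # and count the neighbors per category
    grand_total = sum(totals.values())
    ranked = [
        (cat, counts[cat], totals[cat] / grand_total if grand_total else 0.0)
        for cat in totals
    ]
    return sorted(ranked, key=lambda r: (-r[2], r[0]))[:top]


# Hypothetical labeled reference set.
reference = [
    ("enzyme catalysis in cells", "BIOLOGY"),
    ("enzyme structure study", "BIOLOGY"),
    ("enzyme reaction chemistry", "CHEMISTRY"),
    ("galaxy cluster survey", "ASTRONOMY"),
]
result = categorize_knn("enzyme catalysis study", reference, k=3)
```

With k=3, the two biology documents and the chemistry document are the nearest neighbors, so BIOLOGY wins with two neighbors and the larger share of the summed similarity.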


www.sap.com

© 2015 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned