OpenSAP Hsta1 Week 4 Exercise
7/24/2019 OpenSAP Hsta1 Week 4 Exercise
http://slidepdf.com/reader/full/opensap-hsta1-week-4-exercise 1/24
openSAP
TEXT ANALYTICS WITH SAP HANA
PLATFORM – WEEK 4
Version: January 22, 2016
Exercises / Solutions
Anthony Waite / SAP Labs, LLC.
Bill Miller / SAP Labs, LLC.
Contents
Exercise 1 – Solution
  In SAP HANA Studio
Exercise 2 – Solution
  In SAP HANA Studio
Exercise 3 – Solution
  In SAP HANA Studio
Exercise 4 – Solution
  In SAP HANA Studio
Exercise 5 – Solution
  In SAP HANA Studio
Exercise 6 – Solution
  In SAP HANA Studio
Exercise 7 – Solution
  In SAP HANA Studio
REMINDER BEFORE YOU START
System Host: HANA IP address
System Instance Number: 00
System User ID: SYSTEM
Password: Master Password you entered for the solution when creating
the instance in the SAP Cloud Appliance Library
EXERCISE 1 – CREATE FULLTEXT INDEX WITH TEXT MINING ON AND
GET RELEVANT DOCUMENTS
Objective
In this exercise, you will execute a SQL statement to create a fulltext index for your copy of the workshop data and specify the TEXT MINING ON parameter. Discover the top-ranked relevant documents based on an input term.
Exercise Description
Create fulltext index and text mining index from reference document set
Monitor the progress and status of text processing
Execute the text mining function TM_GET_RELEVANT_DOCUMENTS with the input term “enzyme”
Show top-ranked documents from the reference documentation set relevant to “enzyme”
EXERCISE 1 – SOLUTION
In SAP HANA Studio
Steps
1) Under the “Repositories” tab, navigate to “(Default) / student00 / solutions / week-4”. Double-click on “exercises.sql”.
2) If there is “No connection to database” displayed in the SQL console, click on the “Choose Connection” icon, which is found to the right of the green circle with an arrow (Execute) icon.
3) In the “Choose Connection” dialog, select the appropriate database.
Click the “OK” button.
4) In the SQL console, highlight the following SQL syntax:
SET SCHEMA OPENSAP_TA_WORKSHOP;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
Note: If you close this session at any point while working on Week 4 exercises, you will need to re-execute this command at the start.
5) In the SQL console, highlight the following SQL syntax:
CREATE FULLTEXT INDEX AWARDS_IDX ON "student00.data::AWARDS"(AWARD_ABSTRACT)
FAST PREPROCESS OFF
TEXT MINING ON;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
6) In the SQL console, highlight the following SQL syntax:
SELECT * FROM SYS.M_FULLTEXT_QUEUES
WHERE SCHEMA_NAME = 'OPENSAP_TA_WORKSHOP' AND TABLE_NAME = 'student00.data::AWARDS';
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
7) You can monitor the progress and status of the text analysis processing (tokenization, stemming and part-of-speech tagging), which improves the quality of text mining. After the job finishes, the text mining index (a.k.a. term-document matrix) is created.
Note: Wait until all of the reference documents have been indexed before executing the following text mining functions.
8) In the SQL console, highlight the following SQL syntax:
SELECT T.FEDERAL_AWARD_ID_NUMBER, T.AWARD_TITLE, T.TOTAL_TERM_COUNT, T.SCORE
FROM TM_GET_RELEVANT_DOCUMENTS (
  TERM 'enzyme'
  SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
9) Notice this text mining function shows the top-ranked documents relevant to the input term "enzyme".
Note: You can find concepts and the usage of the text mining capabilities in the SAP HANA Text Mining Developer Guide posted on the SAP Help Portal.
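To make the idea of a term-document matrix and term relevance more concrete, the following Python sketch builds a tiny matrix and ranks documents for a single input term. It is only an illustration: the three sample documents, the whitespace tokenizer, and the TF-IDF weighting are assumptions made for this example and are not taken from SAP HANA, whose actual weighting and scoring are described in the Text Mining Developer Guide.

```python
import math
from collections import Counter

# Toy reference set (hypothetical; stands in for the AWARD_ABSTRACT column).
DOCS = {
    101: "enzyme kinetics and enzyme structure in yeast",
    102: "protein folding and enzyme activity",
    103: "galaxy formation in the early universe",
}


def tokenize(text):
    """Lowercase whitespace tokenization; real text analysis also stems and tags."""
    return text.lower().split()


def term_document_matrix(docs):
    """Map each document ID to a Counter of raw term counts."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def relevant_documents(term, docs, top=16):
    """Rank the documents that contain `term` by a simple TF-IDF weight."""
    matrix = term_document_matrix(docs)
    doc_freq = sum(1 for counts in matrix.values() if term in counts)
    if doc_freq == 0:
        return []
    idf = math.log(len(matrix) / doc_freq) + 1.0
    scored = []
    for doc_id, counts in matrix.items():
        if term in counts:
            tf = counts[term] / sum(counts.values())
            scored.append((doc_id, tf * idf))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


ranking = relevant_documents("enzyme", DOCS)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Document 101 ranks first because "enzyme" makes up a larger share of its terms; document 103 never mentions the term and is not returned.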
EXERCISE 2 – GET RELATED DOCUMENTS WITH REFERENCE
DOCUMENT
Objective
In this exercise, discover the top-ranked related documents based on an input document found already in the reference collection.
Exercise Description
View the initial input document from the reference document set about enzymes
Execute the text mining function TM_GET_RELATED_DOCUMENTS with the input document about enzymes
Show top-ranked documents from the reference documentation set related to the input document about enzymes
EXERCISE 2 – SOLUTION
In SAP HANA Studio
Steps
1) In the SQL console, highlight the following SQL syntax:
SELECT * FROM "student00.data::AWARDS"
WHERE FEDERAL_AWARD_ID_NUMBER = 1330760;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice the document from the reference document set is about enzymes.
3) In the SQL console, highlight the following SQL syntax:
SELECT T.FEDERAL_AWARD_ID_NUMBER, T.AWARD_TITLE, T.TOTAL_TERM_COUNT, T.SCORE
FROM TM_GET_RELATED_DOCUMENTS (
  DOCUMENT IN FULLTEXT INDEX WHERE FEDERAL_AWARD_ID_NUMBER = 1330760
  SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
4) Notice this text mining function shows the top-ranked documents related to the initial input document already found in the reference documentation set. The initial input document is also returned with a score of 1.0, since it’s a perfect match for itself.
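The self-match score of 1.0 is what you would expect from a cosine-style similarity measure. The Python sketch below shows this on a toy document set. It assumes raw term counts and plain cosine similarity, which is a common choice for document similarity but is not confirmed here as the measure SAP HANA uses. The sample documents and function names are hypothetical.

```python
import math
from collections import Counter

# Toy reference set (hypothetical document IDs and texts).
DOCS = {
    1: "enzyme activity regulates cell metabolism",
    2: "cell metabolism depends on enzyme activity",
    3: "telescope survey of distant galaxies",
}


def vectorize(text):
    """Bag-of-words vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_documents(input_id, docs, top=16):
    """Rank every reference document against the one identified by input_id."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    query = vectors[input_id]
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


related = related_documents(1, DOCS)
for doc_id, score in related:
    print(doc_id, round(score, 3))
```

Because the input document is itself part of the reference set, it is compared with its own vector and comes back at the top of the list.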
EXERCISE 3 – GET RELATED DOCUMENTS WITH NEW DOCUMENT
Objective
In this exercise, discover the top-ranked related documents based on a new (previously unseen) input document.
Exercise Description
Execute the text mining function TM_GET_RELATED_DOCUMENTS with a new input document
Show top-ranked documents from the reference documentation set related to the new input document
EXERCISE 3 – SOLUTION
In SAP HANA Studio
Steps
1) In the SQL console, highlight the following SQL syntax:
SELECT T.FEDERAL_AWARD_ID_NUMBER, T.AWARD_TITLE, T.TOTAL_TERM_COUNT, T.SCORE
FROM TM_GET_RELATED_DOCUMENTS (
  DOCUMENT '
The molecule known as coenzyme A plays a key role in cell metabolism by regulating the actions of nitric oxide. Coenzyme A sets into motion a process known as protein nitrosylation, which unleashes nitric oxide to alter the shape and function of proteins within cells to modify cell behavior. The purpose of manipulating the behavior of cells is to tailor their actions to accommodate the ever-changing needs of the body’s metabolism.
'
  SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16 FEDERAL_AWARD_ID_NUMBER, AWARD_TITLE
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice this shows the top-ranked documents related to a new input document not found in the reference document set.
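A new document can be compared with the reference set by weighting its terms with statistics gathered from the reference documents only. The Python sketch below illustrates that idea with TF-IDF weights and cosine similarity. The sample texts, the weighting formula, and the decision to drop terms that never occur in the reference set are all assumptions for this example, not documented SAP HANA behavior.

```python
import math
from collections import Counter

# Toy reference set (hypothetical).
REFERENCE = {
    "A": "coenzyme regulates nitric oxide in cell metabolism",
    "B": "nitric oxide alters protein shape in cells",
    "C": "bridge design under seismic load",
}


def tokens(text):
    return text.lower().split()


def idf_table(docs):
    """Inverse document frequency for every term seen in the reference set."""
    n = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokens(text)))
    return {term: math.log(n / count) + 1.0 for term, count in doc_freq.items()}


def weigh(text, idf):
    """TF-IDF vector; terms absent from the reference vocabulary are dropped."""
    counts = Counter(t for t in tokens(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values()) * sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def related_to_new_document(text, docs, top=16):
    """Rank reference documents against an ad hoc input text."""
    idf = idf_table(docs)
    query = weigh(text, idf)
    scored = [(doc_id, cosine(query, weigh(body, idf))) for doc_id, body in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


new_doc = "coenzyme A unleashes nitric oxide to modify cell behavior"
results = related_to_new_document(new_doc, REFERENCE)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Unlike the case where the input is a reference document, no result reaches a score of 1.0 here, because the new text is not identical to any stored document.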
EXERCISE 4 – GET RELEVANT TERMS
Objective
In this exercise, discover the top-ranked relevant terms (key phrases) that describe a document.
Exercise Description
Execute the text mining function TM_GET_RELEVANT_TERMS with an input document already found in
the reference document set
Show top-ranked relevant terms from the reference documentation set that describe the input document
EXERCISE 4 – SOLUTION
In SAP HANA Studio
Steps
1) In the SQL console, highlight the following SQL syntax:
SELECT T.TERM, T.NORMALIZED_TERM, T.TERM_TYPE, T.TERM_FREQUENCY, T.DOCUMENT_FREQUENCY, T.SCORE
FROM TM_GET_RELEVANT_TERMS (
  DOCUMENT IN FULLTEXT INDEX WHERE FEDERAL_AWARD_ID_NUMBER = 1330760
  SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice this shows the top-ranked relevant terms (key phrases) that describe the input document already found in the reference collection.
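The result columns TERM_FREQUENCY, DOCUMENT_FREQUENCY and SCORE suggest how a term can end up describing a document: it occurs often in that document but rarely elsewhere. The Python sketch below reproduces that intuition with a plain TF-IDF score. The sample documents and the exact formula are assumptions for this example, and the real function may score terms differently.

```python
import math
from collections import Counter

# Toy reference set (hypothetical).
DOCS = {
    1: "enzyme enzyme catalysis in the cell",
    2: "the cell membrane in the yeast",
    3: "the telescope in the observatory",
}


def relevant_terms(doc_id, docs, top=16):
    """Return (term, term_frequency, document_frequency, score) rows for one document."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    doc_freq = Counter()
    for toks in tokenized.values():
        doc_freq.update(set(toks))  # count each term once per document
    term_freq = Counter(tokenized[doc_id])
    n = len(docs)
    rows = []
    for term, tf in term_freq.items():
        df = doc_freq[term]
        rows.append((term, tf, df, tf * math.log(n / df)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


rows = relevant_terms(1, DOCS)
for term, tf, df, score in rows:
    print(term, tf, df, round(score, 3))
```

Terms such as "the" and "in" appear in every document, so their score drops to zero and they fall to the bottom of the list.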
EXERCISE 5 – GET RELATED TERMS
Objective
In this exercise, discover the top-ranked related terms based on co-occurrence with an input term.
Exercise Description
Execute the text mining function TM_GET_RELATED_TERMS with the input term “enzyme”
Show top-ranked terms from the reference documentation set related to the input term “enzyme”
EXERCISE 5 – SOLUTION
In SAP HANA Studio
Steps
1) In the SQL console, highlight the following SQL syntax:
SELECT T.TERM, T.NORMALIZED_TERM, T.TERM_TYPE, T.TERM_FREQUENCY, T.DOCUMENT_FREQUENCY, T.SCORE
FROM TM_GET_RELATED_TERMS (
  TERM 'enzyme'
  SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice this text mining function shows the top-ranked terms related to the input term "enzyme". The input term itself, which is already found in the reference documentation set, is returned with a perfect score of "1".
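Co-occurrence can be pictured as comparing the sets of documents in which two terms appear. The Python sketch below scores each term against the input term with the cosine of their binary occurrence vectors, so the input term scores exactly 1 against itself. The sample documents and this particular measure are assumptions for this example. The source states only that relatedness is based on co-occurrence.

```python
import math
from collections import defaultdict

# Toy reference set (hypothetical).
DOCS = {
    1: "enzyme substrate reaction",
    2: "enzyme substrate binding",
    3: "enzyme inhibitor",
    4: "galaxy cluster",
}


def term_postings(docs):
    """Map each term to the set of document IDs in which it occurs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings[term].add(doc_id)
    return postings


def related_terms(term, docs, top=16):
    """Rank terms by how strongly their documents overlap with those of `term`."""
    postings = term_postings(docs)
    if term not in postings:
        return []
    base = postings[term]
    scored = []
    for other, ids in postings.items():
        shared = len(base & ids)
        if shared:
            # Cosine between binary occurrence vectors.
            scored.append((other, shared / math.sqrt(len(base) * len(ids))))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


related = related_terms("enzyme", DOCS)
for term, score in related:
    print(term, round(score, 3))
```

Terms that never share a document with the input term, such as "galaxy" here, are left out of the result.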
EXERCISE 6 – GET SUGGESTED TERMS
Objective
In this exercise, discover the top-ranked terms matching an initial substring.
Exercise Description
Execute the text mining function TM_GET_SUGGESTED_TERMS with the input substring “enz”
Show top-ranked suggested terms from the reference documentation set matching the input substring
“enz”
EXERCISE 6 – SOLUTION
In SAP HANA Studio
Steps
1) In the SQL console, highlight the following SQL syntax:
SELECT T.TERM, T.NORMALIZED_TERM, T.TERM_TYPE, T.TERM_FREQUENCY, T.DOCUMENT_FREQUENCY, T.SCORE
FROM TM_GET_SUGGESTED_TERMS (
  TERM 'enz'
  SEARCH "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice this text mining function shows the top-ranked suggested terms for the input substring "enz".
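Term suggestion amounts to looking up indexed terms that start with the typed characters and ordering them by some measure of importance. The Python sketch below uses total term frequency as that measure. The sample documents and the ranking rule are assumptions for this example, not the documented scoring of TM_GET_SUGGESTED_TERMS.

```python
from collections import Counter

# Toy reference set (hypothetical).
DOCS = [
    "enzyme enzymatic reaction",
    "enzyme kinetics",
    "energy enzymes",
]


def suggested_terms(prefix, docs, top=16):
    """Return (term, frequency) pairs for terms that start with `prefix`."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    wanted = prefix.lower()
    matches = [(term, freq) for term, freq in counts.items() if term.startswith(wanted)]
    # Most frequent first; ties are broken alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:top]


suggestions = suggested_terms("enz", DOCS)
print(suggestions)
```

"energy" shares only the first two letters with the input and is therefore not suggested.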
EXERCISE 7 – CATEGORIZE
Objective
In this exercise, provide an input document in order to determine the document categories from the reference collection that are most similar to the input document based on the terms used.
Exercise Description
Execute the text mining function TM_CATEGORIZE_KNN with a new input document
Show top most-similar categories from the reference documentation set matched to the new input
document
EXERCISE 7 – SOLUTION
In SAP HANA Studio
Steps
1) In the SQL console, highlight the following SQL syntax:
SELECT T.RANK, T.CATEGORY_VALUE, NEIGHBOR_COUNT, SCORE
FROM TM_CATEGORIZE_KNN (
  DOCUMENT ' The molecule known as coenzyme A plays a key role in cell metabolism by regulating the actions of nitric oxide. Coenzyme A sets into motion a process known as protein nitrosylation, which unleashes nitric oxide to alter the shape and function of proteins within cells to modify cell behavior. The purpose of manipulating the behavior of cells is to tailor their actions to accommodate the ever-changing needs of the body’s metabolism.
'
  SEARCH NEAREST NEIGHBORS 15 "AWARD_ABSTRACT" FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
  RETURN TOP 16 PROGRAM FROM "OPENSAP_TA_WORKSHOP"."student00.data::AWARDS"
) AS T;
Click on the “Execute” (green circle with an arrow) icon or hit the F8 key.
2) Notice the categorization function determines the top categories from the most-similar reference documents and does a weighted comparison by adding and normalizing the similarities for each category value.
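The weighted comparison described in step 2 can be sketched in a few lines of Python: find the k most similar reference documents, add up their similarities per category, and normalize. The sample documents, the category labels, the cosine measure, and the choice to normalize by the total similarity of all neighbors are assumptions for this example. The source does not spell out the exact formula.

```python
import math
from collections import Counter, defaultdict

# Toy labeled reference set (hypothetical texts and categories).
REFERENCE = [
    ("enzyme activity in cell metabolism", "Biology"),
    ("protein shape and enzyme function", "Biology"),
    ("nitric oxide reaction rates", "Chemistry"),
    ("bridge load steel fatigue", "Engineering"),
]


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize_knn(text, reference, k=15, top=16):
    """Return (rank, category, neighbor_count, score) rows for an input text."""
    query = Counter(text.lower().split())
    sims = [(cosine(query, Counter(body.lower().split())), cat) for body, cat in reference]
    neighbors = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    totals = defaultdict(float)
    counts = Counter()
    for sim, cat in neighbors:
        if sim > 0:
            totals[cat] += sim  # add the similarities per category
            counts[cat] += 1
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:top]
    # Normalize so that the scores of all categories add up to 1.
    return [
        (rank, cat, counts[cat], total / grand_total)
        for rank, (cat, total) in enumerate(ranked, start=1)
    ]


categories = categorize_knn(
    "enzyme function and nitric oxide in cell metabolism", REFERENCE, k=3
)
for row in categories:
    print(row)
```

With k=3, two of the three nearest neighbors are labeled "Biology", so that category receives both the higher neighbor count and the higher normalized score.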
www.sap.com
© 2015 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned