Hadoop integration with SAP HANA
SAP HANA Smart Data Access using Hadoop/Hive
Prepared by Debajit Banerjee Page 1
SAP HANA Smart Data Access using Hadoop/Hive
=================================================
By
Debajit Banerjee
Table of Contents
Introduction about SAP HANA Smart Data Access
I. HDP 1.3 for Windows Installation Pre-requisite
II. HDP 1.3 for Windows (Hortonworks Data Platform) Standalone Installation
III. Validation of HDP 1.3 for Windows - Standalone Installation
IV. Data Load in Hadoop System : eBook Upload
V. Unstructured Data Transformation into Table/View in Hadoop System
VI. ODBC Driver Installation & Configuration on SAP HANA Server
VII. Smart Data Access (Hadoop Data) in SAP HANA
SAP HANA Smart Data Access
With SAP HANA Smart Data Access, remote data can be accessed without first replicating it to the
SAP HANA database. The following sources are supported (as of 2013):
Teradata database,
SAP Sybase ASE,
SAP Sybase IQ,
Intel Distribution for Apache Hadoop,
SAP HANA.
SAP HANA handles the data like local tables on the database. Automatic data type conversion makes it possible to map
data types from databases connected via SAP HANA Smart Data Access to SAP HANA data types.
Steps/Procedure :
Hadoop Installation
Data Load in Hadoop system
Activities on Unstructured Data in Hadoop system
ODBC Driver installation & configuration on HANA Server for Hadoop system data access
Smart Data Access in SAP HANA (through SAP HANA Studio), using HADOOP as a remote data source
Assumption – SAP HANA System is already up & running.
Scenario / Lab Setup Details :
1) Hadoop Installation Pre-requisite : HDP 1.3 for Windows (Hortonworks Data Platform) - Standalone
2) Hadoop Installation : HDP 1.3 for Windows (Hortonworks Data Platform) - Standalone – on Dell Laptop (OS Win7 64bit – 8GB)
3) SAP HANA Server Installation (Lab Server running on VM – 24GB Standalone HANA 1.0 SPS 70) – SLES 11 SP1
4) Validation of Hadoop Installation
5) Data Load in Hadoop system : eBook Upload
6) Unstructured Data transformation into table/views, so that HANA Server can understand Hadoop data.
7) ODBC Driver installation & configuration on HANA Server
8) Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source
I. HDP 1.3 for Windows Installation Pre-requisite
- On HANA Server - Simba Apache Hive ODBC Driver – Linux 64bit
- On Hadoop System - Microsoft Visual C++ 2010 Redistributable Package (64bit)
- On Hadoop System - Microsoft .NET Framework 4.0
- On Hadoop System - JAVA JDK 1.6/1.7 and PATH, JAVA_HOME environment variables setup
- On Hadoop System - Python 2.7 and PATH environment variable setup
In Linux
In Windows
MS Visual C++ 2010
MS .NET Framework 4
Cancel the installer if it only offers the Repair option; that means .NET Framework 4 is already installed.
Oracle JDK
i. Open the Control Panel -> System pane and click on Advanced system
settings.
ii. Click on the Advanced tab.
iii. Click the Environment Variables button.
iv. Under System variables, click New.
v. Enter the Variable Name as JAVA_HOME.
vi. Enter the Variable Value as the installation path of the Java Development Kit.
For example, if your JDK is installed at C:\Java\jdk1.6.0_31, enter this path
as the Variable Value.
vii. Click OK.
viii. Click OK to close the Environment Variables dialog box.
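The result of these steps can be checked with a short script. The following is a minimal sketch, not part of the original guide: the function name check_java_home is hypothetical, and it assumes that the JDK entry expected in PATH is the bin folder below JAVA_HOME.

```python
import os


def check_java_home(env):
    """Return a list of problems found in an environment mapping.

    Assumption (not stated in the guide): the PATH entry for Java is
    the "bin" folder directly below JAVA_HOME.
    """
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return ["JAVA_HOME is not set"]

    problems = []
    expected = java_home.rstrip("\\/") + "\\bin"
    # Windows PATH entries are separated by ';' and are case-insensitive.
    entries = [p.strip().rstrip("\\/").lower()
               for p in env.get("PATH", "").split(";")]
    if expected.lower() not in entries:
        problems.append("PATH does not contain " + expected)
    return problems


if __name__ == "__main__":
    # Check the environment of the current process.
    for problem in check_java_home(os.environ):
        print(problem)
```

Run it from a newly opened command window, because windows opened before the change do not see the new variables.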
Python
As with the Oracle JDK above, C:\Python27 must also be added to the PATH variable.
II. HDP 1.3 for Windows (Hortonworks Data Platform) Standalone Installation
Now update C:\hdp-1.3.0.0-GA\clusterproperties.txt to match this setup.
In a Command Prompt window (with administrator privileges):
msiexec /i "C:\hdp-1.3.0.0-GA\hdp-1.3.0.0.winpkg.msi" /lv "C:\DEBAJIT\HD\hdp13\hdp.log" HDP_LAYOUT="C:\hdp-1.3.0.0-GA\clusterproperties.txt" HDP_DIR="C:\hdp\hadoop" DESTROY_DATA="Yes"
The installation creates 3 shortcuts on the desktop.
III. Validation of HDP 1.3 for Windows - Standalone Installation
Now we have to start Hadoop.
The services did not start because the master.xml and regionserver.xml files were 0 bytes.
The rest.xml, thrift.xml and thrift2.xml files were also 0 bytes.
1) Navigate to the hbase install directory: C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin
2) Open the hbase.cmd in a text editor
3) Look for the line that says: set PATH=%PATH%;%HADOOP_HOME%\bin
4) Delete it or comment it out with a @rem
Now open a command prompt and navigate to the hbase install directory: C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin
Rebuild the .xml files:
hbase.cmd --service master start > master.xml
hbase.cmd --service regionserver start > regionserver.xml
hbase.cmd --service rest > rest.xml
hbase.cmd --service thrift > thrift.xml
hbase.cmd --service thrift2 > thrift2.xml
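Whether any service .xml file is still empty can be checked before restarting Hadoop. The following is a minimal sketch, not part of the original guide; the function name find_empty_xml is hypothetical, and the directory passed in is whichever folder holds the files (here assumed to be the hbase bin directory used above).

```python
from pathlib import Path


def find_empty_xml(directory):
    """Return the sorted names of .xml files in `directory` that are 0 bytes."""
    return sorted(
        p.name
        for p in Path(directory).glob("*.xml")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumed location, taken from the path used in the steps above.
    hbase_bin = r"C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin"
    for name in find_empty_xml(hbase_bin):
        print("still empty:", name)
```

An empty result means that every .xml file in the folder has content.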
Now all of the above .xml files have content.
Stop and start Hadoop; all services now start and none fail.
Hadoop Smoketest
IV. Data Load in Hadoop System : eBook Upload
Now check whether Hadoop can read the uploaded file.
It can.
After refresh
From the Namenode server, click on “Browse the filesystem”
Click on “user”
Click on the .txt file to see the book.
Click on the .out entry to see the part file.
V. Unstructured Data Transformation into Table/View in Hadoop System
Now we have to convert those files into a table format that HANA can read. For that we will use Hive.
Create a table called “debajit_wc” for the wordcount part file. At this point it is empty.
Now load the data.
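To illustrate what the table holds once the data is loaded, the following minimal sketch parses lines of a wordcount part file into (word, count) rows. It is not part of the original guide: the function name parse_wordcount is hypothetical, and it assumes the usual output of the Hadoop wordcount example, one word and its count per line separated by a tab. The actual column definitions of “debajit_wc” are not given in the text.

```python
def parse_wordcount(lines):
    """Turn 'word<TAB>count' lines into a list of (word, count) tuples.

    Assumption: tab-separated output as written by the standard
    Hadoop wordcount example. Lines without a tab are skipped.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue
        # Split on the last tab so the count is always the final field.
        word, _, count = line.rpartition("\t")
        rows.append((word, int(count)))
    return rows


if __name__ == "__main__":
    sample = ["the\t42\n", "hana\t3\n"]
    for word, count in parse_wordcount(sample):
        print(word, count)
```

Each tuple corresponds to one row that a two-column table over the part file would expose.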
A configuration change is required in the hive-site.xml file.
The server mode was changed from http to thrift.
Then restart Hadoop.
Now we can test whether SAP HANA can connect to Hadoop.
Downloaded the license file received by email and deployed it. Problem solved.
VI. ODBC Driver Installation & Configuration on SAP HANA Server
The renaming was done in WinSCP.
Stopping HANA System
SIMBA Driver
Changed items are as follows:
UNIXODBC
unixODBC has to be upgraded because of a compatibility issue with the Simba driver.
ODBC.INI - DSN purpose
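An odbc.ini file uses INI syntax, with one section per DSN. The following minimal sketch lists the DSNs defined in such a file. It is not part of the original guide: the function name list_dsns is hypothetical, and the DSN name, driver path, host and port in the sample are placeholders, not the values used in this setup.

```python
import configparser

# Placeholder content; every value here is hypothetical.
SAMPLE_ODBC_INI = """
[hive_dsn]
Driver = /path/to/hive/odbc/driver.so
HOST = hadoop-host.example
PORT = 10000
"""


def list_dsns(ini_text):
    """Return {dsn_name: {key: value}} for each section of an odbc.ini text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(ini_text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    for dsn, settings in list_dsns(SAMPLE_ODBC_INI).items():
        print(dsn, settings)
```

The section name is the DSN that is later referenced when the remote source is created.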
Next, the ODBC information was added to customer.sh.
The connection between the HANA server and the Hadoop system now works at OS level.
VII. Smart Data Access (Hadoop Data) in SAP HANA
SAP HANA Studio
The connection between the HANA server and the Hadoop system now also works from SAP HANA Studio.
Creating a schema in HP7
Query and connection monitoring is available by clicking “Smart Data Access” under “Provisioning”.
That’s all.
**** END OF DOCUMENT ****