
Informatica Big Data Trial Sandbox for Hortonworks Quick Start

© 2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Abstract

This document describes how to use Informatica Big Data Edition Sandbox for Hortonworks to run sample mappings based on common big data use cases. After you understand the sample big data use cases, you can create and run your own big data mappings.

Supported Versions

• Informatica 9.6.1 HotFix 1

Table of Contents

Installation and Configuration Overview
Step 1. Download the Software
    Download and Install VMWare Player
    Register at Informatica Marketplace
    Download the Big Data Trial Sandbox for Hortonworks Files
Step 2. Start the Big Data Trial Sandbox for Hortonworks Virtual Machine
Step 3. Configure and Install the Big Data Trial Sandbox for Hortonworks Client
    Configure the Domain Properties on the Windows Machine
    Configure a Static IP Address on the Windows Machine
    Install the Big Data Trial Sandbox for Hortonworks Client
Step 4. Access the Big Data Trial Sandbox for Hortonworks Sandbox
    Apache Ambari
    Informatica Administrator
    Informatica Developer
Big Data Trial Sandbox for Hortonworks Samples
Running Common Tutorial Mappings on Hadoop
Performing Data Discovery on Hadoop
Performing Data Warehouse Optimization
Processing Complex Files
    Reading and Parsing Complex Files
    Writing to Complex Files
Working with NoSQL Databases
    HBase
Troubleshooting

Installation and Configuration Overview

Big Data Trial Sandbox for Hortonworks consists of a virtual machine component and a client component. Use Big Data Trial Sandbox for Hortonworks to run Informatica mappings on a Hortonworks virtual machine configured for the Hadoop environment.

The Big Data Trial Sandbox for Hortonworks virtual machine has the following components:
• 9.6.1 Informatica services
• Hortonworks 2.1.3
• Sample data
• Sample mappings for common big data use cases

Note: The Informatica Big Data Trial Sandbox for Hortonworks installation and configuration document is available on the desktop of the virtual machine.

The Big Data Trial Sandbox for Hortonworks client installs the libraries and binaries required for the Informatica Developer (Developer tool) client.

Step 1. Download the Software

Before you download the Big Data Trial Sandbox for Hortonworks software, you must download and install VMware Player. Then, register at Informatica Marketplace and download the Big Data Trial Sandbox for Hortonworks virtual machine and client.

Download and Install VMWare Player

To play the Big Data Trial Sandbox for Hortonworks virtual machine, download and install VMware Player.

Download VMware Player from the following VMware website: https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0

The software available for download at the referenced links belongs to a third party or third parties, not Informatica Corporation. The download links are subject to the possibility of errors, omissions or change. Informatica assumes no responsibility for such links and/or such software, disclaims all warranties, either express or implied, including but not limited to, implied warranties of merchantability, fitness for a particular purpose, title and non-infringement, and disclaims all liability relating thereto.

You must have at least 10 GB of RAM and 30 GB of disk space available on the machine on which you download and install VMWare Player.

Register at Informatica Marketplace

Register at Informatica Marketplace. Then, create an account to log in to Informatica Marketplace to download the Big Data Trial Sandbox for Hortonworks client and server software.

You can access Informatica Marketplace here: https://marketplace.informatica.com/bdehortonworks

When you register with Informatica Marketplace, you get a free 60-day trial to use Big Data Trial Sandbox for Hortonworks.


Download the Big Data Trial Sandbox for Hortonworks Files

After you log in to Informatica Marketplace, download the Big Data Trial Sandbox for Hortonworks virtual machine and client.

Download the following files:

BigDataTrialSandboxForHortonworks.ova
    Includes the Big Data Trial Sandbox for Hortonworks virtual machine. Download the file to the machine on which VMware Player is installed.

961_BigDataTrial_Client_Installer_win32_x86.zip
    Includes the compressed Big Data Trial Sandbox for Hortonworks client. Download the file to an Informatica client installation directory on a 32-bit Microsoft Windows machine. Extract the files in the client zip file to a directory on your local machine. For example, extract the files to the C:\ drive on your machine.

Step 2. Start the Big Data Trial Sandbox for Hortonworks Virtual Machine

Open the Big Data Trial Sandbox for Hortonworks virtual machine in VMware Player.

1. Go to the directory where you downloaded BigDataTrialSandboxForHortonworks.ova and double-click the file. VMware Player opens and starts the BigDataTrialSandboxForHortonworks virtual machine.
2. Optionally, in VMware Player click Browse > Import to extract the contents of the virtual machine to the selected location and start the virtual machine. Then, click Play virtual machine.

You are logged in to the virtual machine. The Informatica services and Hadoop services start automatically.

Step 3. Configure and Install the Big Data Trial Sandbox for Hortonworks Client

Before you run the client installer, you must configure the domain properties for the Big Data Trial Sandbox for Hortonworks client installation so that the client can communicate with the virtual machine. Optionally, to avoid updating the IP address of the virtual machine each time it changes, you can configure a static IP address for the virtual machine. Then, you can run the silent installer to install the Big Data Trial Sandbox for Hortonworks client.

Configure the Domain Properties on the Windows Machine

Configure the IP address and host name of the virtual machine for the Developer tool.

1. Click Applications > System Tools > Terminal to open the terminal to run commands.
2. Run the ifconfig command to find the IP address of the virtual machine. The ifconfig command returns all interfaces on the virtual machine. Select the eth interface to get values for IP address.


In the ifconfig output, the inet addr value of the eth interface is the IP address of the virtual machine.
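For reference, the eth0 entry in the ifconfig output looks similar to the following. The addresses shown are the sample values used elsewhere in this guide; your values will differ:

eth0      Link encap:Ethernet  HWaddr 00:0C:29:10:F9:4C
          inet addr:192.168.159.159  Bcast:192.168.159.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1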

3. Add the IP address and the default hostname hdp-bde-demo to the hosts file on the Windows machine on which you install the Developer tool. The hosts file can be located in the following location: C:\Windows\System32\drivers\etc\hosts. Add the following line to the hosts file: <IP address> <host name>. For example, add the following line:

192.168.159.159 hdp-bde-demo

Configure a Static IP Address on the Windows Machine

Optionally, to avoid updating the IP address in the hosts file each time the IP address of the virtual machine changes, configure a static IP address for the virtual machine.

1. Click Applications > System Tools > Terminal to open the terminal to run commands.
2. Run the ifconfig command to find the IP address and hardware ethernet address of the virtual machine. The ifconfig command returns all interfaces on the virtual machine. Select the eth interface to get the values for the IP address (inet addr) and the hardware ethernet address (HWaddr).

3. Edit vmnetdhcp.conf to add the values for host name, IP address, and hardware ethernet address. vmnetdhcp.conf is located in the following directory: C:\ProgramData\VMware

Add the following entry before the #END tag at the end of the file:

host <host name> {
    hardware ethernet <hardware ethernet address>;
    fixed-address <IP address>;
}

The following sample code shows how to set a static IP address:

host hdp-bde-demo {
    hardware ethernet 00:0C:29:10:F9:4C;
    fixed-address 192.168.159.159;
}

4. Add the IP address and the default hostname hdp-bde-demo to the hosts file on the Windows machine on which you install the Developer tool. The hosts file can be located in the following location: C:\Windows\System32\drivers\etc\hosts. Add the following line to the hosts file: <IP address> <host name>. For example, add the following line:

192.168.159.159 hdp-bde-demo

5. Shut down the virtual machine.
6. Restart the host machine and virtual machine.

Install the Big Data Trial Sandbox for Hortonworks Client

To install the client libraries and binaries, perform the following steps:

1. Go to the directory that contains the client installation files.
2. Click silentInstall.bat to run the silent installer.

The silent installer runs in the background. The process can take several minutes. The command window displays a message that indicates that the installation is complete. You can find the Informatica_Version_Client_InstallLog.log file in the following directory: C:\Informatica\9.6.1_BDE_Trial\

After the installation process is complete, you can launch the Big Data Trial Sandbox for Hortonworks Client.
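If you prefer the command line, you can also run the installer from a command prompt and review the install log afterward. The directory name below is an assumption based on the name of the client zip file; adjust it to match the location where you extracted the files:

cd C:\961_BigDataTrial_Client_Installer_win32_x86
silentInstall.bat
type C:\Informatica\9.6.1_BDE_Trial\Informatica_Version_Client_InstallLog.log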

Step 4. Access the Big Data Trial Sandbox for Hortonworks Sandbox

You can log in to Apache Ambari to install, configure, and manage Hadoop clusters. You can log in to Informatica Administrator (the Administrator tool) to monitor Informatica services and the status of mapping jobs. You can log in to the Developer tool to run the sample mappings based on common big data use cases. You can create your own mappings and run the mappings from the Developer tool.

For more information on how to run mappings in the Developer tool, see the Informatica Big Data Trial Sandbox for Hortonworks User Guide.

Apache Ambari

You can log in to Ambari from the following URL: http://hdp-bde-demo:8080/#/login

Enter the following credentials to log in to Ambari:

User name: admin
Password: admin
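If the Ambari login page does not load, you can confirm that the Ambari server is reachable from the Windows machine with a quick HTTP request. This sketch assumes curl is available and uses the Ambari REST API with the default credentials:

curl -u admin:admin http://hdp-bde-demo:8080/api/v1/clusters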

Informatica Administrator

You can access the Administrator tool from the following URL: http://hdp-bde-demo:6005

Enter the following credentials to log in to the Administrator tool:

User name: Administrator
Password: Administrator


Informatica Developer

You can start the Developer tool client from the Windows Start menu.

Enter the following credentials to connect to the Model repository Infa_mrs:

User name: Administrator
Password: Administrator

Big Data Trial Sandbox for Hortonworks Samples

The Big Data Trial Sandbox for Hortonworks provides samples based on common Hadoop use cases. The Big Data Trial Sandbox for Hortonworks includes samples for the following use cases:
• Running common tutorial mappings on Hadoop
• Performing data discovery on Hadoop
• Performing data warehouse optimization
• Processing complex files
• Working with NoSQL databases

After you run the mappings in the Developer tool, you can monitor the mapping jobs in the Administrator tool.

Running Common Tutorial Mappings on Hadoop

Big Data Trial Sandbox for Hortonworks provides sample tutorial mappings that read text files and count how often words occur. The word count mappings appear in the Hadoop_tutorial project in the Developer tool. After you open a mapping, you can right-click the mapping to run the mapping on Hadoop.

The Hadoop_tutorial project contains the following sample mappings:

m_DataLoad_1
    m_DataLoad_1 loads data from the READ_WordFile1 flat file from your machine to the WRITE_HDFSWordFile1 flat file on HDFS. The following image shows the mapping m_DataLoad_1:

m_DataLoad_2
    m_DataLoad_2 loads data from the READ_WordFile2 flat file from your machine to the WRITE_HDFSWordFile2 file on HDFS.


The following image shows the mapping m_DataLoad_2:

m_WordCount
    m_WordCount reads two source files from HDFS, parses the data, and writes the output to a flat file on HDFS. The following image shows the mapping m_WordCount:

The mapping contains the following objects:
• Sources. HDFS files.
• Expression transformations. Remove the carriage return and new line characters from a word; a sketch of such an expression follows this list.
• Union transformation. Forms a collective data set.
• Aggregator transformation. Counts the occurrence of each word in the mapping.
• Target. Flat file on HDFS.
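As a minimal sketch of what such an Expression transformation port might contain, the following Informatica expression strips carriage return and line feed characters from an input port. The port name word is hypothetical, and the actual sample mapping may use a different expression:

-- out_Word: remove CR (CHR(13)) and LF (CHR(10)) characters from the input port
REPLACESTR(0, word, CHR(13), CHR(10), '')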

Performing Data Discovery on Hadoop

Big Data Trial Sandbox for Hortonworks provides samples that you can use to discover data on Hadoop and run and create profiles on the data. After you open the profile, you can right-click the profile to run the profile. Running a profile on any data source in the enterprise gives you a good understanding of the strengths and weaknesses of its data and metadata.

The DataDiscovery project in the Developer tool includes the following samples that you can use to perform data discovery on Hadoop:
• CustomerData. Flat file data source that includes customer information.
• Profile_CustomerData. Profiles the customer data to determine the characteristics of the customer data.

Use the samples to understand how to perform data discovery on Hadoop. You want to discover the quality of the source customer data in the CustomerData flat file before you use the customer data as a source in a mapping. You should verify the quality of the customer data to determine whether the data is ready for processing. You can run the Profile_CustomerData profile based on the source data to determine the characteristics of the customer data.

The profile determines the characteristics of columns in a data source, such as value frequencies, unique values, null values, patterns, and statistics. The profile determines the following characteristics of source data:
• The number of unique and null values in each column, expressed as a number and percentage.
• The patterns of data in each column and the frequencies with which these values occur.
• Statistics about the column values, such as the maximum value length, minimum value length, first value, and last value in each column.
• The data types of the values in each column.

The following figure shows the profile results that you can analyze to determine the characteristics of the customer data:

Performing Data Warehouse Optimization

You can optimize an enterprise data warehouse with the Hadoop system to store more terabytes of data cheaply in the warehouse. Big Data Trial Sandbox for Hortonworks provides samples that demonstrate how to perform data warehouse optimization on Hadoop. The DataWarehouseOptimization project in the Developer tool includes samples that you can use to perform data warehouse optimization on Hadoop.

Use the samples to analyze customer portfolios by processing the records that have changed in a 24-hour time period. You can offload the data on Hadoop, find the customer records that have been inserted, deleted, and updated in the last 24 hours, and then update those records in your data warehouse. You can capture these changes even if the number of columns changes or if the keys change in the source files.

To capture the changes, use the Data Warehouse Optimization workflow. The workflow contains mappings that move the data from local flat files to HDFS, identify the changes, and then load the final output to flat files.

The following image shows the sample Data Warehouse Optimization workflow:


To run the workflow, enter the following command from the command line:

./infacmd.sh wfs startWorkflow -dn infa_domain -sn infa_dis -un Administrator -pd Administrator -Application App_DataWarehouseOptimization -wf wf_DataWarehouseOptimization

To run the mappings in the workflow, open a mapping and right-click the mapping to run the mapping.

The workflow contains the following mappings and transformations:

Mapping_Day1
    The workflow object Mapping_Day1 reads customer data from flat files in a local file system and writes to an HDFS target for the first 24-hour period.

Mapping_Day2
    The workflow object Mapping_Day2 reads customer data from flat files in a local file system and writes to an HDFS target for the next 24-hour period.

m_CDC_DWHOptimization
    The workflow object m_CDC_DWHOptimization captures the changed data. It reads data from HDFS and identifies the data that has changed. To increase performance, you can configure the mapping to run on Hadoop cluster nodes in a Hive environment. The following image shows the mapping m_CDC_DWHOptimization:

The mapping contains the following objects:
• Sources. HDFS files that were the targets of the previous two mappings. The Data Integration Service reads all of the data as a single column.
• Expression transformations. Extract a key from the non-key values in the data. The expressions use the INSTR function and SUBSTR function to perform the extraction of key values; a sketch follows this list.
• Joiner transformation. Performs a full outer join on the two sources based on the keys generated by the Expression transformations.
• Filter transformations. Use the output of the Joiner transformation to filter rows based on whether the rows should be updated, deleted, or inserted.
• Targets. HDFS files. The Data Integration Service writes the data to three HDFS files based on whether the data is inserted, deleted, or updated.
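For illustration, an expression of the following shape extracts a key from a delimited single-column row. It assumes the key is the text before the first comma; the port name data and the delimiter are hypothetical, and the sample mappings may extract keys differently:

-- key: take the substring of data that precedes the first comma
SUBSTR(data, 1, INSTR(data, ',') - 1)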


Consolidated_Mapping
    The workflow object Consolidated_Mapping consolidates the data in the HDFS files and loads the data to the data warehouse. The following figure shows the mapping Consolidated_Mapping:

The mapping contains the following objects:
• Sources. The HDFS files that were the target of the previous mapping are the sources of this mapping.
• Expression transformations. Add the deleted, updated, or inserted tags to the data rows.
• Union transformation. Combines the records.
• Target. Flat file that acts as a staging location on the local file system.

Processing Complex Files

Big Data Trial Sandbox for Hortonworks provides samples to process large volumes of data from complex files that contain unstructured data. The data might be on the Hadoop Distributed File System (HDFS) or on your local file system.

Big Data Trial Sandbox includes samples that demonstrate the following use cases to process complex files:
• Reading and parsing complex files
• Writing to complex files

Reading and Parsing Complex Files

Capturing and analyzing unstructured or semi-structured data such as web traffic records is a challenge because of the volume of data involved. Big Data Trial Sandbox for Hortonworks provides samples to read and process semi-structured or unstructured data in complex files. The LogProcessing project in the Developer tool includes samples that you can use to read and parse complex files.

Use the samples to process daily web logs from an online trading site and write the parsed data to a flat file. The web logs contain details about visitors who log in to the website and look up the value of stocks using stock symbols. To process the web logs, use the web log processing workflow.


The following image shows the sample web log processing workflow:

To run the workflow, enter the following command from the command line:

./infacmd.sh wfs startWorkflow -dn infa_domain -sn infa_dis -un Administrator -pd Administrator -Application app_logProcessing -wf wf_LogProcessing

To run the mappings in the workflow, open a mapping and right-click the mapping to run the mapping.

You can run the following mappings and transformations in the workflow:

m_LoadData
    The workflow object m_LoadData reads the parsed web log data and writes to a flat file target. The source and target are flat files. The following image shows the mapping m_LoadData:

m_sample_weblog_parsing
    The workflow object m_sample_weblog_parsing is a logical data object read mapping that reads data from an HDFS source, parses the data using a Data Processor transformation, and writes to a logical data object.


The following image shows the mapping m_sample_weblog_parsing:

The following image shows the expanded logical data object read mapping m_sample_weblog_parsing:

The mapping contains the following objects:
• Source. HDFS file that was the target of the previous mapping.
• Data Processor transformation. Processes the input binary stream of data, parses the data, and writes to XML format.
• Joiner transformation. Combines the activity of visitors who return to the website on the same day with stock queries.
• Expression transformation. Adds the current date to each transformed record; a sketch of such an expression follows this list.
• Target. Flat file.
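A minimal sketch of such a date expression follows. SYSDATE returns the current date on the node that runs the mapping; the output format string is an assumption, and the sample mapping may differ:

-- current_date: add today's date to each record as a string
TO_CHAR(SYSDATE, 'MM/DD/YYYY')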

Writing to Complex Files

Big Data Trial Sandbox for Hortonworks provides samples to read, parse, and write large volumes of unstructured data to complex files. The Complex_File_Writer project in the Developer tool includes samples that you can use to write unstructured data to complex files.

Use the samples to generate a report in XML format of the sales by country for each customer. You know the customer purchase order details such as customer ID, product names, and item quantity sold. The purchase order details are stored in semi-structured compressed XML files in HDFS. Create a mapping that reads all the customer purchase records from the files in HDFS and use a Data Processor transformation to process the sales by country for each customer. The mapping converts the semi-structured data to relational data and writes it to a relational target.

The following figure shows the Complex File Writer sample mapping:

The mapping contains the following objects:

HDFS inputs
    The inputs Read_customers_flatfile, Read_products_flatfile, Read_sales_flatfile, Read_promotions_flatfile, and Read_countries_flatfile are flat files stored in HDFS.

Transformations
    The Joiner transformation Joiner_products joins product and sales data. The Joiner transformation Joiner_promotions joins sales and promotion data. The Data Processor transformation customer_sales_xml_generator provides a binary, hierarchical output for sales by country for each customer.

HDFS output
    The output, Write_binary_single_file, is a complex file stored in HDFS.

Working with NoSQL Databases

Big Data Trial Sandbox for Hortonworks provides samples that demonstrate how to read from and write to NoSQL databases. You can run the sample mappings to understand the simple extract, transform, and load scenarios when you use a NoSQL database.

Big Data Trial Sandbox for Hortonworks provides samples for the following NoSQL database:
• HBase


HBase

Use HBase when you need random real-time reads and writes from a database. HBase is a non-relational distributed database that runs on top of the Hadoop Distributed File System (HDFS) and can store sparse data. Big Data Trial Sandbox for Hortonworks provides samples that demonstrate how to read and process binary data from HBase. The HBase_Binary_Data project in the Developer tool includes samples that you can use to read binary data in HBase tables and process it to string data in a flat file target.

The sample HBase table contains the details of people and the cars that they purchased over a period of time. The table contains the Details and Cars column families. The column names of the Cars column family are of String data type. You can get all columns in the Cars column family as a single binary column. You can use the sample Java transformation to convert the binary data to string data. You can join the data from both the column families and write it to a flat file.

To process the HBase binary data, use the wf_HBase_Binary_Data workflow. The following figure shows the wf_HBase_Binary_Data workflow:

To run the workflow, enter the wfs startWorkflow command from the command line, as shown below.
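A sketch of the command, following the same pattern as the other workflows in this guide; the application name is not listed here, so it appears as a placeholder:

./infacmd.sh wfs startWorkflow -dn infa_domain -sn infa_dis -un Administrator -pd Administrator -Application <application name> -wf wf_HBase_Binary_Data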

To run the mappings in the workflow, open a mapping and right-click the mapping to run the mapping.

The workflow contains the following mappings and transformations:

m_person_Cars_Write_Static
    The workflow object references the m_person_Cars_Write_Static HBase write data object mapping that writes data to the columns in the Cars and Details column families.

m_preson_Cars_Write_Static1
    The workflow object references the m_pers_cars_static_reader mapping that transforms the binary data in an HBase data object to columns of the String data type and writes the details to a flat file data object.

    The HBase mapping contains the following objects:


Person_Car_Static_Read
    The first source for the mapping is an HBase data object named Person_Car_Static that contains the columns in the Details column family. The HBase read data object operation is named Person_Car_Static_Read.

pers_cars_Static_bin_read
    The second source for the mapping is an HBase data object named Person_cars_Static_bin that contains the data in the Cars column family. The HBase read data object operation is named pers_cars_Static_bin_read.

Transformations
    • The HBase_ProtoBuf_Read_String.xml Java transformation transforms the single column of binary data in the Person_Car_Static data object to column values of the String data type.
    • The Sorter transformation sorts the data in ascending order based on the row ID.
    • The Expression and Aggregator transformations convert the row data to columnar data.
    • The Joiner transformation combines the data from both the HBase input sources before you load the data to the flat file data object.
    • The Filter transformation filters out any person with age less than or equal to 43.

Write_Person_Cars_FF
    The target for the mapping is a flat file data object named Person_Cars_FF. The flat file data object write operation is named Write_Person_Cars_FF to write data from the Cars and Details column families.

The Data Integration Service converts the binary column in Person_cars_Static_bin, joins the data in Person_Car_Static, and writes the data to the flat file data object Write_Person_Cars_FF.

Troubleshooting

This section describes troubleshooting information.

Informatica Services shut down

The Informatica services might shut down when the machine on which you run the virtual machine goes into hibernation or when you resume the virtual machine. Run the following command to restart the services on the operating system of the virtual machine:

sh /home/infauser/BDETRIAL/.cmdInfaServiceUtil.sh start

Debug mapping failures

To debug mapping failures, check the error messages in the mapping log file. The mapping log file appears in the following location: /home/infauser/bdetrial_repo/informatica/informatica/tomcat/bin/disTemp
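For example, from a terminal on the virtual machine you can list the most recent log files in that directory and inspect the newest one; the log file name below is a placeholder:

ls -lt /home/infauser/bdetrial_repo/informatica/informatica/tomcat/bin/disTemp | head
tail -n 100 /home/infauser/bdetrial_repo/informatica/informatica/tomcat/bin/disTemp/<mapping log file>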

Virtual machine does not start because of a 64-bit error

VMWare Player displays a message that states it cannot power on a 64-bit virtual machine. Or, VMware Player might display the following error when you play the virtual machine: "The host supports Intel VT-x, but Intel VT-x is disabled. Intel VT-x might be disabled if it has been disabled in the BIOS/firmware settings or the host has not been power-cycled since changing this setting."

You must enable Intel Virtualization Technology in the BIOS of the machine on which VMware Player runs. For more information, refer to the VMware Knowledge Base.


Virtual machine is in a suspended state

If the virtual machine is in a suspended state, you need to resume the virtual machine and log in to it. After you log in, the Informatica services and Hadoop services start automatically. In VMware Player, select the virtual machine and click Play virtual machine. Enter a user name and password for the virtual machine. The default user name and password is: infa / infa

The Developer tool takes a long time to connect to the Model repository

The Developer tool might take a long time to connect to the Model repository because the virtual machine cannot find the IP address and host name of the client machine. You must add the IP address and host name of the client machine to the hosts file of the virtual machine. Use the ipconfig and hostname commands from the command line of the Windows machine to find the IP address and host name of the Windows machine. Then add the IP address and the host name to the hosts file on the virtual machine, located at /etc/hosts. Add the following line to the hosts file: <client IP address> <client host name>
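For example, if ipconfig and hostname report 192.168.159.1 and WIN-CLIENT (illustrative values; yours will differ), you can append the entry from a terminal on the virtual machine as the root user:

echo "192.168.159.1 WIN-CLIENT" >> /etc/hosts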

Mapping fails and job execution failed errors appear in the mapping log

If the mapping fails and you cannot determine the cause of the job execution failed errors that appear in the mapping log, you can clear the contents of the following directory on the machine that hosts the virtual machine: /tmp/infa. Then, run the mapping again.
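A minimal sketch of clearing the directory from a terminal, assuming you have permission to delete its contents:

rm -rf /tmp/infa/*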

Author

Big Data Edition Team

