User Guide · 2019-07-18 · Cloud Data Migration (CDM) enables data migration among various data...
Cloud Data Migration
User Guide
Issue 10
Date 2018-08-03
HUAWEI TECHNOLOGIES CO., LTD.
Copyright © Huawei Technologies Co., Ltd. 2018. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.
Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://e.huawei.com
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. i
Contents
1 Introduction
1.1 CDM
1.2 Data Sources Supported by CDM
1.3 Application Scenarios
1.4 Related Services
1.5 Basic Concepts
1.6 Accessing and Using CDM
1.6.1 How to Access CDM
1.6.2 How to Use CDM
1.6.3 CDM Billing
1.6.4 User Permissions
1.7 Constraints

2 Getting Started
2.1 Overview
2.2 Purchasing CDM
2.3 Creating Links
2.4 Creating and Executing a Job
2.5 Querying Job Execution Results

3 Cluster Management
3.1 Creating a Cluster
3.2 Binding or Unbinding an EIP
3.3 Restarting a Cluster
3.4 Stopping, Starting, or Deleting a Cluster
3.5 Viewing Cluster Configurations, Logs, and Monitoring Data
3.6 Monitoring
3.6.1 CDM Metrics
3.6.2 Configuring Alarm Rules
3.6.3 Querying Metrics
3.7 CTS
3.7.1 Key CDM Operations Recorded by CTS
3.7.2 Viewing Traces

4 Link Management
4.1 Creating a Link
4.2 Link Parameter Description
4.2.1 Link to Relational Databases
4.2.2 Link to OBS
4.2.3 Link to OSS on Alibaba Cloud
4.2.4 Link to Qiniu Cloud Object Storage
4.2.5 Link to HDFS
4.2.6 Link to HBase
4.2.7 Link to Hive
4.2.8 Link to CloudTable
4.2.9 Link to an FTP or SFTP Server
4.2.10 Link to a NAS Server
4.2.11 Link to MongoDB/DDS
4.2.12 Link to Redis/DCS
4.2.13 Link to Kafka
4.2.14 Link to DIS
4.2.15 Link to Elasticsearch
4.2.16 Link to DLI
4.3 Editing/Deleting a Link

5 Job Management
5.1 Creating a Job
5.1.1 Table/File Migration
5.1.2 Entire DB Migration
5.2 Source Job Parameters
5.2.1 From OBS/OSS
5.2.2 From HDFS
5.2.3 From HBase/CloudTable
5.2.4 From Hive
5.2.5 From FTP/SFTP/NAS
5.2.6 From HTTP/HTTPS
5.2.7 From a Relational Database
5.2.8 From MongoDB/DDS
5.2.9 From Redis
5.2.10 From DIS
5.2.11 From Apache Kafka
5.2.12 From Elasticsearch/Cloud Search Service
5.3 Destination Job Parameters
5.3.1 To OBS
5.3.2 To HDFS
5.3.3 To HBase/CloudTable
5.3.4 To Hive
5.3.5 To FTP/SFTP/NAS
5.3.6 To a Relational Database
5.3.7 To DDS
5.3.8 To DCS
5.3.9 To Elasticsearch/Cloud Search Service
5.3.10 To DLI
5.4 Scheduling Job Execution
5.5 Managing a Single Job
5.6 Batch Managing Jobs

6 Typical Scenarios
6.1 Migrating Data from DDS to DWS
6.2 Periodically Backing Up FTP/SFTP Files to HUAWEI CLOUD OBS
6.3 Migrating Data from OSS to OBS
6.4 Migrating Data from On-premises Redis to DCS
6.5 Migrating Data from Oracle to Cloud Search Service
6.6 Migrating Data from OBS to Cloud Search Service
6.7 Migrating Data from OBS to DLI
6.8 Migrating Data from the MySQL Database to the MRS Hive Partition Table
6.9 Migrating Data from the MySQL Database to DDM
6.10 Migrating the Entire MySQL Database to RDS
6.11 Migrating the Entire Elasticsearch Database to Cloud Search Service

7 Advanced Operations
7.1 Incremental File Migration
7.2 Incremental Migration of Relational Databases
7.3 HBase/CloudTable Incremental Migration
7.4 Incremental Synchronization Using the Macro Variables of Date and Time
7.5 Migration in Transaction Mode
7.6 Data Encryption During the Migration to OBS
7.7 MD5 Verification for Files in Migration
7.8 Field Conversion During Migration
7.9 Migration of a List of Files
7.10 Using Regular Expressions to Separate Semi-structured Text
7.11 GDS Import Mode
7.12 File Formats

8 FAQs
8.1 What Are the Advantages of CDM?
8.2 What Service Data Can Be Migrated by CDM?
8.3 What Security Protection Measures Are Used in CDM?
8.4 What Is the Performance of Using CDM to Migrate Data?
8.5 What Is the Most Economical Way to Migrate Data from the Public Network Using CDM?
8.6 Does CDM Support Incremental Data Migration?
8.7 Can Fields Be Converted During Data Migration?
8.8 What Data Formats Are Supported When the Data Source Is Hive?
8.9 Does CDM Support Job Synchronization Between Different Clusters?
8.10 Can I Create Jobs in Batches on CDM?
8.11 Can I Back Up Jobs When the CDM Cluster Is Not Used for a Long Time?
8.12 How Do I Use Java to Invoke CDM RESTful APIs to Create Data Migration Jobs?
8.13 How Do I Connect an On-premises Intranet or Third-Party Private Network to CDM?
8.14 What Do I Do If the System Displays a Message Indicating that the Date Format Fails to Be Parsed When Data Is Imported to Cloud Search Service?
8.15 What Do I Do If the Map Field Tab Page Cannot Display All Columns When Data Is Exported from HBase/CloudTable?
8.16 How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
8.17 What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?

A Version Updates

B Change History
1 Introduction
1.1 CDM

Cloud Data Migration (CDM) enables data migration among various data sources. It allows you to migrate data among public cloud services or between the public cloud and on-premises service systems.
Based on a distributed computing framework and concurrent processing technology, CDM helps you migrate massive sets of data stably and efficiently. You can migrate data online and construct a desired data structure.
CDM provides the following features:
- Ease of use: You can migrate data by configuring data sources and migration jobs on the graphical user interface (GUI), and CDM will manage and maintain the data sources and migration tasks. In other words, you only need to focus on the data migration logic without worrying about the environment, which greatly reduces development and maintenance costs.
- High efficiency: Based on the distributed computing framework, CDM jobs are split into independent sub-jobs and executed concurrently, which drastically improves data migration efficiency. In addition, efficient data import application programming interfaces (APIs) are used to import data into Hive, HBase, Data Warehouse Service (DWS), and MySQL databases.
- Support for various data sources: Various data sources such as databases, Hadoop, NoSQL, data warehouses, and files are supported.
- Support for multiple network environments: CDM helps you easily cope with various data migration scenarios, including data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems, regardless of whether the data is stored in an on-premises Internet Data Center (IDC), on third-party clouds (public or private), in HUAWEI CLOUD services, or in self-built databases or file systems running on Elastic Cloud Servers (ECSs) on HUAWEI CLOUD.
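The "High efficiency" point above can be illustrated with a small sketch. This is not CDM's actual implementation; the row-range partitioning scheme and worker count are assumptions used only to show the idea of splitting one job into independent, concurrently executed sub-jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubJobSplitDemo {
    // Split a table of totalRows rows into numSubJobs contiguous row ranges,
    // then process the ranges concurrently. Each sub-job here merely reports
    // its range size, standing in for copying rows [start, end).
    static long migrate(long totalRows, int numSubJobs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numSubJobs);
        List<Future<Long>> results = new ArrayList<>();
        long chunk = (totalRows + numSubJobs - 1) / numSubJobs; // ceiling division
        for (int i = 0; i < numSubJobs; i++) {
            long start = i * chunk;
            long end = Math.min(start + chunk, totalRows);
            results.add(pool.submit(() -> end - start));
        }
        long migrated = 0;
        for (Future<Long> f : results) migrated += f.get(); // wait for all sub-jobs
        pool.shutdown();
        return migrated;
    }

    public static void main(String[] args) throws Exception {
        // Every row is covered by exactly one sub-job, so the total matches.
        System.out.println(migrate(1_000_000, 4)); // prints 1000000
    }
}
```

The key property the sketch demonstrates is that the sub-ranges partition the input exactly, so concurrent execution changes throughput but not the result.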
1.2 Data Sources Supported by CDM

CDM supports table/file migration and entire DB migration:
- Table/file migration: applicable to data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems.
- Entire DB migration: applicable to database migration to the cloud.
Table/File Migration
Table 1-1 describes the supported data sources.
Table 1-1 Supported data sources during table/file migration

| Data Source Type | Data Source | Used as a Source | Used as a Destination |
|---|---|---|---|
| Data warehouse | Data Warehouse Service (DWS) | Supported | Supported |
| Data warehouse | Data Lake Insight (DLI) | Not supported | Supported |
| Data warehouse | FusionInsight LibrA | Supported | Supported |
| Hadoop | MRS HDFS | Supported | Supported |
| Hadoop | MRS HBase | Supported | Supported |
| Hadoop | MRS Hive | Supported | Supported |
| Hadoop | FusionInsight HDFS | Supported | Supported |
| Hadoop | Apache HDFS | Supported | Supported |
| Hadoop | Hadoop HBase | Supported | Supported |
| Hadoop | FusionInsight HBase | Supported | Supported |
| Object storage | Object Storage Service (OBS) | Supported | Supported |
| Object storage | Alibaba Cloud Object Storage Service (OSS) | Supported | Not supported |
| Object storage | Qiniu Cloud Object Storage | Supported | Not supported |
| File system | FTP | Supported | Supported |
| File system | SFTP | Supported | Supported |
| File system | HTTP | Supported | Not supported |
| File system | Network Attached Storage (NAS) | Supported | Supported |
| Relational database | RDS for MySQL | Supported | Supported |
| Relational database | RDS for PostgreSQL | Supported | Supported |
| Relational database | RDS for SQL Server | Supported | Supported |
| Relational database | Distributed Database Middleware (DDM) | Supported | Supported |
| Relational database | MySQL | Supported | Supported |
| Relational database | PostgreSQL | Supported | Not supported |
| Relational database | Microsoft SQL Server | Supported | Not supported |
| Relational database | Oracle | Supported | Not supported |
| Relational database | IBM Db2 | Supported | Not supported |
| Relational database | Derecho (GaussDB) | Supported | Not supported |
| NoSQL | Distributed Cache Service (DCS) | Not supported | Supported |
| NoSQL | Document Database Service (DDS) | Supported | Supported |
| NoSQL | CloudTable Service (CloudTable) | Supported | Supported |
| NoSQL | Redis | Supported | Not supported |
| NoSQL | MongoDB | Supported | Not supported |
| Search | Cloud Search Service | Supported | Supported |
| Search | Elasticsearch | Supported | Supported |
| Message system | Data Ingestion Service (DIS) | Supported (migrated to Cloud Search Service only) | Not supported |
| Message system | Apache Kafka | Supported (migrated to Cloud Search Service only) | Not supported |
NOTE
In the preceding table, the non-HUAWEI CLOUD data sources, such as MySQL, can be a MySQL database built in a local data center, created by users on an Elastic Cloud Server (ECS), or hosted on a third-party cloud.
Entire DB Migration

Entire database migration is applicable to the scenario where an on-premises data center or a database created on a HUAWEI CLOUD ECS is synchronized to HUAWEI CLOUD database services or big data services. It is suitable for offline database migration but not online real-time migration. Figure 1-1 lists the data sources that support entire database migration using CDM.
Figure 1-1 Supported data sources in entire DB migration
Field Mapping in Automatic Table Creation

CDM automatically creates tables at the destination during database migration. Figure 1-2 describes the field mapping between DWS tables created by CDM and source tables. For example, if you use CDM to migrate an Oracle database to DWS, CDM automatically creates tables on DWS and maps the NUMBER(3,0) field of the Oracle database to the SMALLINT field of DWS.
Figure 1-2 Field mapping in automatic table creation on DWS
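The kind of source-to-destination type mapping described above can be sketched as a simple lookup table. Only the NUMBER(3,0) → SMALLINT pair is stated in this guide; the other pairs and the TEXT fallback below are assumptions added purely for illustration, not CDM's actual mapping rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMappingDemo {
    // Illustrative Oracle-to-DWS field type mapping of the kind CDM applies
    // when it auto-creates destination tables. Only NUMBER(3,0) -> SMALLINT
    // comes from the guide; the remaining entries are assumed examples.
    static final Map<String, String> ORACLE_TO_DWS = new LinkedHashMap<>();
    static {
        ORACLE_TO_DWS.put("NUMBER(3,0)", "SMALLINT");       // from the guide
        ORACLE_TO_DWS.put("VARCHAR2(255)", "VARCHAR(255)"); // assumed
        ORACLE_TO_DWS.put("DATE", "TIMESTAMP");             // assumed
    }

    // Return the destination type for a source type, with an assumed fallback.
    static String mapType(String oracleType) {
        return ORACLE_TO_DWS.getOrDefault(oracleType, "TEXT");
    }

    public static void main(String[] args) {
        System.out.println(mapType("NUMBER(3,0)")); // prints SMALLINT
    }
}
```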
1.3 Application Scenarios
Migrating Local Data to the Public Cloud
Local data is stored in an IDC that you have built or rented, or on a private cloud, including data stored in relational databases, NoSQL databases, OLAP databases, and file systems.

In this scenario, if you want to use the computing and storage resources of the public cloud, you must migrate local data to the public cloud in advance and ensure that the local network can communicate with the public cloud network.
Figure 1-3 Migrating local data to the public cloud
Migrating Data Between Public Cloud Services
In this scenario, you can exchange data between the following public cloud services:
- OBS
- Relational Database Service (RDS)
- MapReduce Service (MRS)
- DWS
- DDS
- DCS
- Cloud Search Service
- DIS
- CloudTable
- DLI
- DDM
- Databases or file systems deployed on ECSs
Migrating Public Cloud Data to On-Premises Environments
A local environment is a data storage system in an IDC that you have built or rented, or on a private cloud, including relational databases and file systems.

In this scenario, after data is processed using the computing and storage resources of the public cloud, the processed data can be returned to on-premises service systems, specifically relational databases and file systems. Additionally, ensure that the local network can communicate with the public cloud network.
Figure 1-4 Migrating public cloud data to on-premises environments
1.4 Related Services
IAM
CDM uses Identity and Access Management (IAM) for authentication and authorization.
VPC
CDM clusters are created in the subnets of a Virtual Private Cloud (VPC). VPCs provide a secure, isolated, and logical network environment for CDM clusters.
MRS
CDM supports data import and export using MRS.
OBS
CDM supports data import and export using OBS, which also stores backup files and logs of CDM clusters.
Cloud Eye
CDM uses Cloud Eye to monitor cluster performance metrics, delivering status information in a concise and efficient manner, as shown in Table 1-2. For more information about Cloud Eye, see the Cloud Eye User Guide.
Table 1-2 CDM performance metrics

| Metric | Description | Value Range | Monitored Object |
|---|---|---|---|
| Bytes In | Measures the network inbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration |
| Bytes Out | Measures the network outbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration |
| CPU Usage | Measures the CPU usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration |
| Memory Usage | Measures the memory usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration |
CTS
CDM uses Cloud Trace Service (CTS) to record operations for later query, audit, and backtracking. Table 1-3 lists the CDM operations recorded by CTS. For more information about CTS, see the Cloud Trace Service User Guide.
Table 1-3 CDM operations recorded by CTS

| Operation | Resource Type | Trace Name |
|---|---|---|
| Creating a cluster | cluster | createCluster |
| Deleting a cluster | cluster | deleteCluster |
| Modifying cluster configuration | cluster | modifyCluster |
| Starting a cluster | cluster | startCluster |
| Stopping a cluster | cluster | stopCluster |
| Restarting a cluster | cluster | restartCluster |
| Importing a job | cluster | clusterImportJob |
| Binding an EIP | cluster | bindEip |
| Unbinding an EIP | cluster | unbindEip |
| Creating a link | link | createLink |
| Modifying a link | link | modifyLink |
| Deleting a link | link | deleteLink |
| Creating a job | job | createJob |
| Modifying a job | job | modifyJob |
| Deleting a job | job | deleteJob |
| Starting a job | job | startJob |
| Stopping a job | job | stopJob |
DWS
CDM allows you to import data to and export data from DWS.
RDS
CDM allows you to import data to and export data from RDS, including RDS for MySQL, RDS for PostgreSQL, and RDS for SQL Server.
DDS
CDM allows you to export data from DDS, but it does not allow you to import data to DDS.
DCS
CDM allows you to import data to DCS, but it does not allow you to export data from DCS.
Cloud Search Service
CDM allows you to import data to and export data from Cloud Search Service.
DIS
CDM allows you to export data from DIS to Cloud Search Service, but it does not allow you to import data to DIS.
CloudTable
CDM allows you to import data to and export data from CloudTable.
DLI
CDM allows you to import data to DLI, but it does not allow you to export data from DLI.
DDM
CDM allows you to import data to and export data from DDM.
Data Lake Factory (DLF)
CDM can be orchestrated and scheduled as a node task of DLF.
1.5 Basic Concepts
CDM Cluster

A CDM cluster is a CDM instance that you have purchased. It consists of one or more VMs. You can purchase multiple CDM clusters for different purposes. For example, you can purchase one CDM cluster for the financial department and another for the procurement department to isolate data access permissions.
Local Environment

A local environment is a data storage system in an IDC that you have built or rented, or on a private cloud, including relational databases and file systems.
Local Data

Local data is data stored in an IDC that you have built or rented, or on a private cloud, including data stored in relational databases, NoSQL databases, OLAP databases, and file systems.
Connector

A connector is a built-in object template used for connecting to a data source. Currently, CDM uses connectors to connect to OBS, MRS, and databases. New connectors can be added to CDM as well.
Link

A link is an object set up based on a connector and used to connect to a specific data source. To create a link, you must specify the link name, connector, data source address, and authentication information. For example, to connect to a MySQL database, you must set the host IP address, port number, username, and password.

After a link is set up, it can be used by multiple jobs as either a source or a destination link.
Job

A job is a data migration task that you create to migrate data from one data source to another. To create a job, you must specify a source link, a destination link, and data mapping rules.
Source Job Configuration

During job creation, the source link specifies the data source from which data is extracted. The job parameters vary with the source link type. For example, the table or directory from which data is exported is specified in the job configuration at the source end.
Destination Job Configuration

During job creation, the destination link specifies the data source to which data is loaded. The job parameters vary with the destination link type. For example, the table or directory to which data is imported is specified in the job configuration at the destination end.
Field Mapping

During job creation, especially for jobs that migrate data between heterogeneous data sources, you must configure the mapping between the source and destination data sources, such as field mapping and field type mapping.
1.6 Accessing and Using CDM
1.6.1 How to Access CDM

CDM provides a web-based service management platform, that is, the management console. You can access CDM using HTTPS-compliant application programming interfaces (APIs) or the management console.

- Management console

  After registering with the public cloud, log in to the management console to access CDM.

- API

  If you want to integrate CDM with third-party systems for secondary development, access CDM using APIs. For details, see the Cloud Data Migration API Reference.
1.6.2 How to Use CDM

The procedure of applying for and using CDM is as follows:

1. Apply for CDM. Applying for CDM means building a CDM cluster. For details about how to create a CDM cluster, see Creating a Cluster.
2. Create links. A source link and a destination link are required for a data migration task. Select a proper connector according to the data source type. For details, see Creating a Link.
3. Create and execute jobs. Select the source and destination links and configure job and task parameters according to the types of the source and destination data sources. For details, see Creating a Job.
4. Query job execution results. After a job is executed, you can query its execution logs, data statistics, and historical execution status. For details about how to query historical job information, see Managing a Single Job.
1.6.3 CDM Billing

CDM adopts the pay-per-use billing mode on an hourly basis, which means that you are charged by the hour. This mode is flexible: you can start or stop the CDM cluster as you like. For details about the billing items, see the Cloud Data Migration Price Description.
1.6.4 User Permissions

CDM uses IAM to isolate links and jobs created by different accounts in a CDM cluster.

Currently, CDM does not support user group permissions. In other words, users cannot be assigned to the same user group to share information about their links and jobs.
1.7 Constraints

Due to various factors such as technology and cost, CDM has the following constraints on data migration.
CDM System Constraints

1. Currently, the CN North-Beijing1, CN East-Shanghai2, and CN South-Guangzhou regions are supported.
2. You cannot modify the flavor of an existing cluster. If you require a higher flavor, create a new cluster.
3. CDM does not support throttling of the data migration speed. Therefore, do not perform data migration during peak hours.
4. Currently, the network bandwidth of all CDM instances is 1 Gbit/s. Theoretically, the maximum volume of data transmitted per instance per day is 10 TB. If you have specific requirements on the transmission speed, use multiple CDM instances.
   The preceding data volume is a theoretical value. The actual volume is restricted by the data source type, the read and write performance of the source and destination data sources, and the available bandwidth; in practice it can reach about 8 TB per day (for large-file migration to OBS). It is recommended that you test the speed with a small amount of data before migration.
5. CDM supports incremental file migration (by skipping files that have already been migrated), but does not support resumable transfer.
   For example, if three files are to be migrated and the second file fails due to a network fault, the first file is skipped when the migration task is restarted. The second file, however, cannot resume from the point where the fault occurred; it can only be migrated again from the beginning.
6. During file migration, a single task supports a maximum of 100,000 files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
7. The number of tasks executed concurrently by a single CDM instance is 30 (cdm.large), 20 (cdm.medium), or 10 (cdm.small). The number of queued jobs (in the pending state) is 10,000, 5,000, and 2,000, respectively.
   In database migration, one job migrates one table. In file migration, multiple files can be migrated in one job.
8. When custom links and jobs are exported, CDM does not export the access password of the corresponding data source. Before importing the job configuration back into CDM, you need to manually edit the JSON file to supply the password.
9. A cluster cannot be automatically upgraded to a new version. You need to use the export and import functions to migrate your configuration to a cluster of the new version.
10. CDM does not automatically back up user job configurations. You need to export and back up the configuration data using the export function.
11. If a VPC peering connection is configured, the peer VPC subnet may overlap with the CDM management network. As a result, data sources in the peer VPC cannot be accessed. You are advised to use the public network for cross-VPC data migration, or contact customer service to add specific routes to the VPC peering connection in the CDM background.
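As a rough sanity check of the bandwidth figures in constraint 4 above, the following sketch converts a 1 Gbit/s link into a theoretical daily transfer volume (decimal SI units are assumed; the guide's "10 TB" is an approximation):

```python
# Theoretical daily transfer volume of a single CDM instance on a 1 Gbit/s link.
# Decimal (SI) units are assumed here.

BITS_PER_SECOND = 1_000_000_000        # 1 Gbit/s
SECONDS_PER_DAY = 24 * 60 * 60         # 86,400 s

bytes_per_day = BITS_PER_SECOND / 8 * SECONDS_PER_DAY
terabytes_per_day = bytes_per_day / 1_000_000_000_000

print(f"{terabytes_per_day:.1f} TB/day")  # about 10.8 TB, close to the documented 10 TB
```

In practice, the guide notes that about 8 TB per day is achievable, so the theoretical number is an upper bound.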
General Constraints on Database Migration

1. CDM is mainly used for batch migration. It supports only limited incremental migration and does not support real-time incremental migration. You are advised to use Data Replication Service (DRS) to migrate incremental database data to RDS.
2. Entire-DB migration in CDM migrates only data tables; it does not migrate database objects such as stored procedures, triggers, functions, and views. Views are migrated as tables. CDM applies only to scenarios where databases are migrated to HUAWEI CLOUD in one pass, including homogeneous and heterogeneous database migrations. CDM is not applicable to data synchronization scenarios such as disaster recovery and real-time synchronization.
3. If CDM fails to migrate an entire database or a data table, the data that has already been imported to the target table is not rolled back automatically. If you want to perform the migration in transaction mode, configure the Import to Staging Table parameter so that data is rolled back when the migration fails. In extreme cases, the created staging or temporary table cannot be deleted automatically, and you need to clear it manually (the name of the staging table ends with _cdm_stage, for example, cdmtet_cdm_stage).
4. If CDM needs to access data sources in a local data center (for example, an on-premises MySQL database), the data sources must support Internet access and the CDM instances must be bound to elastic IP addresses (EIPs). In this case, the best security practice is to configure the firewall or security policies to allow only the EIPs of the CDM instances to access the local data sources.
5. Only common data types are supported, including character strings, numbers, and dates. Support for object types is limited. If objects are too large, they cannot be migrated.
6. Only the GBK and UTF-8 character sets are supported.
Constraints on MRS Data Sources

Each CDM cluster supports data import and export for only one MRS data source. To import and export data of different MRS data sources, create multiple CDM clusters.
Constraints on FusionInsight HD and Apache Hadoop Data Sources

If a FusionInsight HD or Apache Hadoop data source is deployed in a local data center, CDM must access all nodes in the cluster to read and write Hadoop files. Therefore, network access must be enabled for each node.

You are advised to use Direct Connect to improve the migration speed while ensuring network access.
Constraints on DWS and FusionInsight LibrA Data Sources

1. If the DWS primary key or the table contains only one field, the field type must be a common character string, numeric, or date type. When data is migrated from another database to DWS with automatic table creation selected, the primary key must be one of the following types. If no primary key is set, at least one field of the following types must be set; otherwise, the table cannot be created and the CDM job fails.
   – INTEGER TYPES: TINYINT, SMALLINT, INT, BIGINT, NUMERIC/DECIMAL
   – CHARACTER TYPES: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, TEXT
   – DATE/TIME TYPES: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, SMALLDATETIME
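The type lists above can be captured in a small helper for pre-checking a schema before enabling automatic table creation. This is an illustrative sketch only: the type names come from the lists above, while the function name and the length-suffix handling are assumptions, not CDM behavior.

```python
# Hypothetical pre-check: can this DWS field type act as the primary key
# (or fallback field) when CDM auto-creates the destination table?
ALLOWED_KEY_TYPES = {
    # Integer types
    "TINYINT", "SMALLINT", "INT", "BIGINT", "NUMERIC", "DECIMAL",
    # Character types
    "CHAR", "BPCHAR", "VARCHAR", "VARCHAR2", "NVARCHAR2", "TEXT",
    # Date/time types
    "DATE", "TIME", "TIMETZ", "TIMESTAMP", "TIMESTAMPTZ",
    "INTERVAL", "SMALLDATETIME",
}

def usable_as_auto_table_key(field_type: str) -> bool:
    """Return True if the field type appears in the allowed lists above."""
    # Strip a length suffix such as VARCHAR(100) before checking.
    base = field_type.split("(", 1)[0].strip().upper()
    return base in ALLOWED_KEY_TYPES

print(usable_as_auto_table_key("varchar(100)"))  # True
print(usable_as_auto_table_key("bytea"))         # False
```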
2. In DWS, the character string '' is null, and a null character string cannot be inserted into a field with a non-null constraint. This is inconsistent with MySQL behavior: MySQL does not treat '' as null. Migration from MySQL to DWS may therefore fail for this reason.
3. When the Gauss Data Service (GDS) mode is used to quickly import data to DWS, you need to configure a security group or firewall policy to allow the DataNodes of DWS or FusionInsight LibrA to access port 25000 of the CDM IP address.
4. When data is imported to DWS in GDS mode, CDM automatically creates a foreign table for the import. The table name ends with a universally unique identifier (UUID), for example, cdmtest_aecf3f8n0z73dsl72d0d1dk4lcir8cd. If a job fails, the table is automatically deleted; in extreme cases, you may need to delete it manually.
Constraints on OBS Data Sources

1. During file migration, the system automatically transfers the files concurrently. In this case, the Concurrent Extractors setting in the task configuration is invalid.
2. Resumable transfer is not supported. If CDM fails to transfer files, OBS fragments are generated. Clear the fragments on the OBS console to prevent them from occupying space.
3. CDM does not support the versioning control function of OBS.
4. During incremental migration, the number of files or objects in the source directory of a single job depends on the CDM cluster flavor: a cdm.large cluster supports a maximum of 300,000 files, a cdm.medium cluster 200,000 files, and a cdm.small cluster 100,000 files.
   If the number of files or objects in a single directory exceeds the upper limit, split the files or objects into multiple migration jobs based on subdirectories.
5. The key used to encrypt data migrated to OBS is created in Key Management Service (KMS). This function is available only in CN North-Beijing1.
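The subdirectory-based splitting suggested in item 4 above can be sketched as follows. The flavor limits come from item 4; the grouping logic itself is illustrative, not a CDM feature:

```python
from collections import defaultdict

# Maximum number of source files per incremental-migration job, by flavor (item 4).
FLAVOR_LIMITS = {"cdm.small": 100_000, "cdm.medium": 200_000, "cdm.large": 300_000}

def plan_jobs(paths, flavor):
    """Group object paths by top-level subdirectory, then chunk each group
    so that no planned job exceeds the flavor's file limit."""
    limit = FLAVOR_LIMITS[flavor]
    groups = defaultdict(list)
    for p in paths:
        top = p.split("/", 1)[0]          # first path component
        groups[top].append(p)
    jobs = []
    for top, files in groups.items():
        for i in range(0, len(files), limit):
            jobs.append(files[i:i + limit])
    return jobs

# Tiny demonstration: two top-level subdirectories become two jobs.
paths = [f"dir{d}/file{i}" for d in range(2) for i in range(3)]
jobs = plan_jobs(paths, "cdm.large")
print(len(jobs))  # 2
```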
Constraints on Oracle Data Sources
Real-time incremental data synchronization is not supported for Oracle databases.
Constraints on DCS and Redis Data Sources

1. Because DCS restricts the commands for obtaining keys, it cannot serve as a migration source, but it can be a migration destination. Likewise, the Redis service of a third-party cloud cannot serve as a migration source. However, a Redis instance set up in an on-premises data center or on an ECS can be both a migration source and a destination.
2. Only the hash and string data formats are supported.
Constraints on DDS and MongoDB Data Sources

When you migrate data from MongoDB to a relational database, CDM reads the first row of the collection as a sample of the field list. If the first row does not contain all fields of the collection, you need to add the missing fields manually.
Constraints on Cloud Search Service and Elasticsearch Data Sources

1. CDM supports automatic creation of indexes and field types. The index and field type names can contain only lowercase letters.
2. You cannot modify the type of a field under an index after the index is created; you can only create another field. If you need to modify a field type, create a new index, or run an Elasticsearch command on Kibana to delete the existing index and create another one (the data is deleted as well).
3. When the type of a field in an index created by CDM is date, the data format must be yyyy-MM-dd HH:mm:ss.SSS Z, for example, 2018-08-08 08:08:08.888 +08:00. During data migration to Cloud Search Service, if the original data of the date field does not meet this format requirement, you can use the expression conversion function of CDM to convert the data to the preceding format.
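The required date format above (yyyy-MM-dd HH:mm:ss.SSS Z, with a colon inside the UTC offset) can be produced as follows. This is a generic Python sketch for preparing data outside CDM, not CDM's built-in expression syntax:

```python
from datetime import datetime, timezone, timedelta

def to_cdm_es_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as 'yyyy-MM-dd HH:mm:ss.SSS Z',
    e.g. '2018-08-08 08:08:08.888 +08:00'."""
    millis = dt.microsecond // 1000
    offset = dt.strftime("%z")              # e.g. '+0800'
    offset = offset[:3] + ":" + offset[3:]  # insert the colon -> '+08:00'
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d} " + offset

tz = timezone(timedelta(hours=8))
print(to_cdm_es_date(datetime(2018, 8, 8, 8, 8, 8, 888000, tzinfo=tz)))
# 2018-08-08 08:08:08.888 +08:00
```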
Constraints on DIS and Kafka Data Sources

1. The data in the message body is a record in CSV format that supports multiple delimiters. Messages in binary or other formats cannot be parsed.
2. If a job is set to run for a long time, the job will fail if the DIS system is interrupted.
Constraints on CloudTable and HBase Data Sources

1. When you migrate data from CloudTable or HBase, CDM reads the first row of the table as a sample of the field list. If the first row does not contain all fields of the table, you need to add the missing fields manually.
2. Because HBase is schema-less, CDM cannot obtain the data types. If the data is stored in binary format, CDM cannot parse it.
Constraints on Hive Data Sources

When Hive serves as the migration destination and the storage format is TEXTFILE, delimiters must be explicitly specified in the statement for creating Hive tables. The following is an example:
CREATE TABLE csv_tbl(
  smallint_value smallint,
  tinyint_value tinyint,
  int_value int,
  bigint_value bigint,
  float_value float,
  double_value double,
  decimal_value decimal(9, 7),
  timestamp_value timestamp,
  date_value date,
  varchar_value varchar(100),
  string_value string,
  char_value char(20),
  boolean_value boolean,
  binary_value binary,
  varchar_null varchar(100),
  string_null string,
  char_null char(20),
  int_null int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar" = "'",
  "escapeChar" = "\\"
)
STORED AS TEXTFILE;
Constraints on Incremental Data Migration in MySQL Binlog Mode

- Currently, this mode can be used only to migrate data from MySQL to DWS.
- In the migration from MySQL to DWS, the constraints on the incremental data migration function in MySQL Binlog mode are as follows:

  a. In the current version, a single cluster supports only one incremental migration job in MySQL Binlog mode.
  b. In the current version, you are not allowed to delete or update 10,000 data records at a time.
  c. Entire database migration is not supported.
  d. Data Definition Language (DDL) operations are not supported.
  e. Event migration is not supported.
  f. If you set Migrate Incremental Data to Yes, binlog_format in the source MySQL database must be set to ROW.
  g. If you set Migrate Incremental Data to Yes and binlog file ID disorder occurs on the source MySQL instance due to cross-machine migration or rebuilding during incremental data migration, incremental data may be lost.
  h. If a primary key exists in the destination table and incremental data is generated during a restart of the CDM cluster or during full migration, duplicate primary key values may occur. As a result, the migration fails.
  i. If the destination DWS database is restarted, the migration will fail. In this case, restart the CDM cluster and then the migration job.

The recommended MySQL configuration is as follows:

# Enable the bin-log function.
log-bin=mysql-bin
# ROW mode
binlog-format=ROW
# GTID mode. The recommended MySQL version is 5.6.10 or later.
gtid-mode=ON
enforce_gtid_consistency=ON
2 Getting Started
2.1 Overview

This section describes how to use CDM to migrate tables from an on-premises MySQL database to DWS, helping you get familiar with CDM. Figure 2-1 shows the scenario.
Figure 2-1 Migrating data from a local MySQL database to DWS
The procedure of using CDM is as follows:
1. Purchasing CDM
2. Creating Links
3. Creating and Executing a Job
4. Querying Job Execution Results
2.2 Purchasing CDM
Scenario
This section describes how to purchase CDM, that is, create a CDM cluster, to perform data migration between an on-premises MySQL database and DWS.
Prerequisites

- Your on-premises MySQL database can be accessed using a public IP address.
- You have created a VPC.
Procedure
Step 1 Log in to the CDM management console.
Step 2 Click Buy CDM. The page for creating a CDM cluster is displayed. The following is a cluster configuration example:
- Current Region: Actual working region of the cluster. Currently, CN North-Beijing1, CN East-Shanghai2, and CN South-Guangzhou are supported.
- AZ: Different AZs are physically isolated but interconnected through the internal network. In this example, select AZ2.
- Cluster Name: The cluster name must start with a letter and contain 4 to 64 characters consisting of letters, digits, hyphens (-), and underscores (_). It cannot contain other special characters. For example, cdm-aff1.
- Version: Retain the default value.
- Instance Type: Select an instance flavor as required; for example, select cdm.medium, which can be used in most migration scenarios.
  – cdm.small: 2 vCPUs with 4 GB memory, applicable to Proof of Concept (PoC) verification and development tests
  – cdm.medium: 4 vCPUs with 8 GB memory, applicable to migration of a single database table with fewer than 10 million records
  – cdm.large: 8 vCPUs with 16 GB memory, applicable to migration of a single database table with 10 million records or more
  – cdm.xlarge: 16 vCPUs with 32 GB memory, applicable to TB-level data migration requiring 10GE high-speed bandwidth
- VPC: Select the VPC where DWS resides.
- Subnet: You are advised to use the same subnet as that of DWS.
- Security Group: You are advised to use the same security group as that of DWS.
  You can select a subnet and security group that are different from those of DWS. In this case, configure the security group rules to allow the CDM cluster to access DWS properly.
- Retain the default values of the other parameters.
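The cluster-name rule above (starts with a letter; 4 to 64 characters; letters, digits, hyphens, and underscores) can be expressed as a regular expression. This sketch is for illustration only; the console performs the authoritative validation:

```python
import re

# 1 leading letter + 3 to 63 further letters/digits/hyphens/underscores = 4-64 chars total.
CLUSTER_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_-]{3,63}$")

print(bool(CLUSTER_NAME.match("cdm-aff1")))   # True: starts with a letter, 8 characters
print(bool(CLUSTER_NAME.match("1cluster")))   # False: starts with a digit
print(bool(CLUSTER_NAME.match("ab")))         # False: shorter than 4 characters
```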
Step 3 Check the current configuration and click Buy Now to go to the page for confirming the order.
NOTE
You cannot modify the flavor of an existing cluster. If you require a higher flavor, create a new cluster.
Step 4 Click Submit. The system starts to create the CDM cluster. You can view the creation progress on the Cluster Management page.
----End
2.3 Creating Links
Description
Before migrating the local MySQL database to DWS, create two links:

1. MySQL link: used to connect to the on-premises MySQL database.
2. DWS link: used to connect to the DWS database.

CDM needs to access the on-premises data source. Therefore, before creating a link, bind an EIP to the CDM cluster.
Prerequisites

- You have sufficient EIP quota. If the quota is insufficient, apply for a higher quota. For details about how to apply for EIPs, see the Virtual Private Cloud User Guide.
- You have obtained the IP address, port number, database name, username, and password for connecting to the MySQL database. In addition, the user must have the read, write, and delete permissions on the MySQL database.
- You have purchased a DWS instance and obtained the IP address, port number, database name, username, and password for connecting to DWS. Additionally, the user must have the read, write, and delete permissions on the DWS database.
Creating a MySQL Link
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the cdm-aff1 cluster created in Purchasing CDM.
Step 3 In the Operation column, click Bind Elastic IP, and select and bind an EIP to the cluster.
Step 4 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 2-2.
Figure 2-2 Selecting a connector
Cloud Data MigrationUser Guide 2 Getting Started
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 18
Step 5 Select MySQL and click Next. On the page that is displayed, configure the MySQL link parameters, as shown in Figure 2-3.
Figure 2-3 Creating a MySQL link
Click Show Advanced Attributes to display the optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 2-1.
Table 2-1 MySQL link parameters

Parameter        Description                                                                  Example Value
Name             Unique link name                                                             mysqllink
Database Server  IP address or domain name of the MySQL database server                       192.168.0.1
Port             MySQL database port                                                          3306
Database Name    Name of the MySQL database                                                   sqoop
Username         User who has the read, write, and delete permissions on the MySQL database   admin
Password         Password of the user                                                         -
Step 6 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during saving, the security settings of the MySQL database may be incorrect. In this case, allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating a DWS Link
Step 1 On the Link Management tab page, click Create Link and select Data Warehouse Service to create a DWS link.
Step 2 Click Next. The page for configuring the DWS link parameters is displayed. Configure the mandatory parameters according to Table 2-2 and retain the default values of the optional parameters.
Table 2-2 DWS link parameters

Parameter        Description                                                                Example Value
Name             Unique link name                                                           dwslink
Database Server  IP address or domain name of the DWS database server                       192.168.0.3
Port             DWS database port                                                          8000
Database Name    Name of the DWS database                                                   db_demo
Username         User who has the read, write, and delete permissions on the DWS database   dbadmin
Password         Password of the user                                                       -
Import Mode      Data import mode; see the description below this table                     Copy

Import Mode specifies how data is imported when a DWS link is created:

- Copy: CDM migrates the source data to the DWS management node, which then copies the data to the DataNodes. To access DWS through the Internet, select Copy.
- GDS: The DataNodes of DWS concurrently request data from the GDS component of CDM and then write the data to DWS. The GDS mode cannot be used for data export from DWS.

Theoretically, the GDS mode is more efficient than the Copy mode. However, the GDS mode requires the following configurations:

1. Configure DWS to allow users of the DWS link to create and delete foreign tables.
2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster.
Step 3 Click Save. The link is successfully created.
----End
2.4 Creating and Executing a Job
Scenario
This section describes how to create a table migration job to migrate data tables from an on-premises MySQL database to DWS.
Procedure
Step 1 On the Cluster Management page, locate the cdm-aff1 cluster created in Purchasing CDM.
Step 2 Click Job Management in the Operation column of the CDM cluster.
Step 3 Choose Table/File Migration > Create Job, and configure the required job information. See Figure 2-4.
Figure 2-4 Creating a job
- Job Name: Enter a unique job name, for example, mysql2dws.
- Source Job Configuration
  – Source Link Name: Select the mysqllink link created in Creating Links.
  – Schema/Tablespace: Select the MySQL database from which data is to be exported.
  – Table Name: Select the table from which data is to be exported.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From a Relational Database.
- Destination Job Configuration
  – Destination Link Name: Select the dwslink link created in Creating Links.
  – Schema/Tablespace: Select the database to which data is to be imported.
  – Auto Table Creation: Select Auto creation. If the table specified by Table Name does not exist, CDM automatically creates it in the DWS database.
  – Table Name: Select the table to which data is to be imported.
  – Retain the default values of the other optional parameters. For details, see To a Relational Database.
Step 4 Click Next. The Map Field page is displayed. See Figure 2-5. CDM automatically maps the table fields at the migration source and destination. Check whether the field mapping is correct.

- If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.
- You need to manually select the distribution columns of DWS. You are advised to select the distribution columns according to the following principles:
  a. Use the primary key as the distribution column.
  b. If multiple data segments are combined as the primary key, specify all of them as the distribution column.
  c. If no primary key is available and no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
- If you need to convert the content of the source fields, perform the operations described in Field Conversion During Migration. In this example, field conversion is not required.
Figure 2-5 Field mapping
Step 5 Click Next to set the task parameters. Generally, retain the default values of all parameters.

In this step, you can configure the following optional functions:

- Retry upon Failure: Determines whether the job automatically retries if it fails to be executed. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Number of extractors to be executed concurrently. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed, or is filtered out, during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 6 Click Save and Run. CDM starts to execute the job immediately.
NOTE
If the job fails to be executed, the following error message may be displayed: SQL statements cannot be executed. ERROR: value too long for type character varying (7) Where: COPY dws_city, line 1, column name: 'Chinese characters'

Cause: The length of the character field in the DWS table is insufficient. Chinese characters are stored with different encodings in MySQL and DWS, so the required lengths differ; a Chinese character may occupy three bytes in UTF-8 encoding.

Solution: When creating the job in Step 3, enable automatic table creation and set the Extend Field Length advanced attribute to Yes, and then execute the job again. In this way, when CDM automatically creates the table in DWS, the length of the character fields is set to three times that of the original table.
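The cause described above, namely that a Chinese character may occupy three bytes in UTF-8 while the destination column was sized by character count, is easy to verify:

```python
# A CJK character takes 3 bytes in UTF-8, so a varchar sized in bytes
# needs roughly three times the character count of the source column.
city = "北京市"                      # 3 Chinese characters
print(len(city))                     # 3 characters
print(len(city.encode("utf-8")))     # 9 bytes in UTF-8
```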
----End
2.5 Querying Job Execution Results
Scenario

This section describes how to view a job's execution results and its historical information over the last 90 days, including the number of written rows, read rows, written bytes, and written files, as well as log information.
Procedure
Step 1 On the Cluster Management page, locate the cdm-aff1 cluster created in Purchasing CDM.
Step 2 Click Job Management in the Operation column of the CDM cluster.
Step 3 Locate the mysql2dws job created in Creating and Executing a Job and view the running status of the job.
Step 4 Click Historical Record in the Operation column of the job. See Figure 2-6.
On the page that is displayed, you can view the number of written rows, read rows, written bytes, and written files.
Figure 2-6 Viewing historical records
Step 5 Click Log to view the job execution logs. See Figure 2-7.
Figure 2-7 Viewing job logs
----End
3 Cluster Management
3.1 Creating a Cluster
Scenario
Currently, CDM uses independent clusters to provide secure and reliable data migration services. Clusters are isolated from each other and cannot access each other. A CDM cluster is created when you purchase CDM.
Currently, one cluster supports only one server; automatic capacity expansion is in planning.
The network bandwidth for CDM clusters of all flavors is 1 Gbit/s. Currently, a server can migrate 1 TB to 8 TB of data every day. If a larger amount of data needs to be migrated or the migration needs to be accelerated, you can create multiple CDM clusters and multiple migration jobs.
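As a rough cross-check of these figures, 1 Gbit/s of sustained bandwidth corresponds to about 10.8 TB per day in theory; real jobs land lower because of source and destination bottlenecks, which is consistent with the 1-8 TB/day range above. Back-of-the-envelope arithmetic (illustrative only; the 5 TB/day effective rate is an assumption, not a CDM specification):

```python
import math

# Theoretical upper bound for a 1 Gbit/s link, ignoring protocol and
# source/destination overhead (decimal units: 1 TB = 10**12 bytes).
bandwidth_bps = 1_000_000_000            # 1 Gbit/s
bytes_per_day = bandwidth_bps / 8 * 86_400
tb_per_day = bytes_per_day / 10**12
print(round(tb_per_day, 1))              # 10.8

# Days to move 50 TB at an assumed effective rate of 5 TB/day per
# cluster, and the cluster count needed to finish in about 2 days.
effective_rate = 5                       # TB/day, assumption
print(50 / effective_rate)               # 10.0 days with one cluster
print(math.ceil(50 / effective_rate / 2))  # 5 clusters for ~2 days
```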
Prerequisites
- You have sufficient EIP quota if the data source is a local one. For details about how to apply for EIPs, see the Virtual Private Cloud User Guide. The CDM cluster uses the public IP address to access the local data source.
- You have applied for a VPC, subnet, and security group. If the CDM cluster needs to connect to another cloud service, ensure that the cluster and the cloud service are in the same VPC. Otherwise, EIPs are required.
NOTE
If a VPC peering connection is configured, the peer VPC subnet may overlap with the CDM management network. As a result, data sources in the peer VPC cannot be accessed. You are advised to use the public network for cross-VPC data migration, or contact customer service to add specific routes to the VPC peering connection in the CDM background.
Procedure
Step 1 Log in to the CDM management console.
Step 2 Click Buy CDM. The page for creating a CDM cluster is displayed. See Figure 3-1.
Figure 3-1 Creating a cluster
Step 3 Create a CDM cluster. Table 3-1 describes the required parameters.
Table 3-1 Parameter description
Parameter | Example Value | Description
Current Region | CN North-Beijing1 | Actual working region of the cluster. Currently, CN North-Beijing1, CN East-Shanghai2, and CN South-Guangzhou are supported.
AZ | AZ1 | Physical region where resources use independent power supplies and networks. Different AZs are physically isolated but interconnected through an internal network.
Cluster Name | cdm-aff1 | Custom CDM cluster name
Version | 1.5.0 | CDM version. Retain the default value.
Instance Type | cdm.medium | Currently, the following flavors are available:
- cdm.small: 2 vCPUs and 4 GB memory, suitable for proof-of-concept (PoC) verification and development tests
- cdm.medium: 4 vCPUs and 8 GB memory, suitable for migrating a single database table with fewer than 10 million rows
- cdm.large: 8 vCPUs and 16 GB memory, suitable for migrating a single database table with 10 million rows or more
- cdm.xlarge: 16 vCPUs and 32 GB memory, suitable for TB-scale data migration requiring 10GE high-speed bandwidth
VPC | vpc1 | VPC, subnet, and security group where the CDM cluster resides, used to communicate with the desired data source. Select them according to the networks of the migration source and destination.
- If the CDM cluster and the data source to be connected belong to different VPCs, or the data source is an on-premises one, the CDM cluster must be bound to an elastic IP address (EIP).
- If the data source is a cloud service, you are advised to configure the network of the CDM cluster to be the same as that of the cloud service; the CDM cluster then does not need to be bound to an EIP.
- If the data source is a cloud service, and CDM and the cloud service are in the same VPC but in different subnets, configure security group rules to interconnect the CDM cluster with the cloud service.
For more information, see the Virtual Private Cloud User Guide.
Subnet | subnet-1 | See the VPC description above.
Security Group | sg-1 | See the VPC description above.
Auto Shutdown | No | After Auto Shutdown is enabled, if no job is running in the cluster and no scheduled job has been created, the cluster automatically shuts down 15 minutes later to reduce costs. After a cluster is created, to modify automatic shutdown or scheduled startup and shutdown, click the cluster name in the cluster list and then click the Cluster Configuration tab.
Scheduled Startup | No | The CDM cluster supports scheduled startup. If this parameter is enabled, set the daily scheduled startup time.
Scheduled Shutdown | No | During a scheduled shutdown, the system does not wait for unfinished jobs to complete.
Step 4 Check the current configuration and click Buy Now to go to the order confirmation page.
NOTE
You cannot modify the flavor of an existing cluster. If you require a higher flavor, create a new cluster.
Step 5 Click Submit. The system starts to create the CDM cluster. You can view the creation progress on the Cluster Management page.
----End
3.2 Binding or Unbinding an EIP
Scenario
Bind an EIP to or unbind an EIP from a CDM cluster. If CDM needs to access a local or Internet data source, bind an EIP to the CDM cluster, or use a NAT gateway so that the CDM cluster can share an EIP with ECSs to access the Internet. For details, see What Is the Most Economical Way to Migrate Data from the Public Network Using CDM.
The EIPs you use are billed by the VPC service. The default EIP bandwidth is 5 Mbit/s. To adjust the EIP bandwidth, log in to the VPC console and select Elastic IP; in the Operation column, choose More > Modify Bandwidth.
Prerequisites
- You have created a CDM cluster.
- You have sufficient EIP quota. If the quota is insufficient, apply for a higher quota. For details about how to apply for EIPs, see the Virtual Private Cloud User Guide.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. The Cluster Management page is displayed.
- Binding an EIP: In the Operation column, click Bind Elastic IP, as shown in Figure 3-2. The Bind Elastic IP dialog box is displayed.
Figure 3-2 Binding an EIP
- Unbinding an EIP: In the Operation column, choose More > Unbind Elastic IP.
Step 3 Click OK.
----End
3.3 Restarting a Cluster
Scenario
If a service exception occurs, restart the service process or the VMs in the cluster.
Prerequisites
The target cluster is running properly and no services will be interrupted if the cluster is restarted.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. The Cluster Management page is displayed.
Step 3 In the row of the target cluster, click Restart.
Step 4 Select the restart method, as shown in Figure 3-3.
- Graceful: Only the CDM service process is restarted. The cluster VMs will not be restarted.
- Restart cluster VM: The service process will be interrupted and the VMs in the cluster will be restarted.
Figure 3-3 Restarting a cluster
Step 5 Click OK.
----End
3.4 Stopping, Starting, or Deleting a Cluster
Scenario
When creating a CDM cluster, you can enable automatic shutdown or scheduled startup and shutdown for the cluster. After the cluster is created, click the name of the cluster on the Cluster Management page and then click the Cluster Configuration tab to modify these settings.
You can also manually shut down or delete clusters to reduce costs.
NOTE
Before the deletion, you can use the batch export function by referring to Batch Managing Jobs to save all job JSON files to a local PC. Then, you can create a new cluster and import the jobs again when necessary.
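Because the exported job definitions are plain JSON files, the local backup step can be scripted. The sketch below is hypothetical: it assumes the batch export produces a single JSON file whose top-level "jobs" key holds a list of job objects, which is an assumption made for illustration rather than a documented export schema:

```python
import json
from pathlib import Path

def backup_jobs(export_file: str, backup_dir: str) -> int:
    """Copy an exported CDM job file into a backup directory and return
    the number of job entries it contains. The 'jobs' key is an assumed
    export schema, used only for illustration."""
    data = json.loads(Path(export_file).read_text(encoding="utf-8"))
    jobs = data.get("jobs", [])
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    (dest / Path(export_file).name).write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return len(jobs)
```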
Prerequisites
The target cluster is running properly and no services will be interrupted if the cluster is deleted.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. The Cluster Management page is displayed.
Step 3 In the Operation column, click More and select Start, Delete, or Stop to start, delete, or stop a cluster.
Step 4 Click the name of a cluster and click the Cluster Configuration tab to modify automatic shutdown or scheduled startup and shutdown.
Figure 3-4 Modifying cluster configuration
Step 5 Click Save.
----End
3.5 Viewing Cluster Configurations, Logs, and Monitoring Data
Scenario
View cluster configurations, obtain cluster logs, and view monitoring data on Cloud Eye.
Prerequisites
You have created a CDM cluster.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management to display the cluster list. See Figure 3-5.
Figure 3-5 Cluster list
Step 3 Click the expand icon in front of the cluster name to view the configurations of the cluster, including the cluster flavor, creation time, node quantity, node configurations, network configurations, project ID, cluster ID, and instance ID.
Figure 3-6 Viewing cluster configurations
Step 4 In the row of the cluster, choose More > Download Log to obtain cluster logs.
Step 5 In the row of the cluster, choose More > View Monitoring Data. The Cloud Eye management console is displayed, on which you can view the inbound and outbound rates as well as CPU and memory usage. For details about the monitoring metrics, see Monitoring.
----End
3.6 Monitoring
Monitoring is key to ensuring CDM cluster performance, reliability, and availability. Using the monitored data, you can determine CDM cluster resource usage. Cloud Eye on HUAWEI CLOUD helps you better understand the running status of your CDM clusters. You can use Cloud Eye to automatically monitor CDM clusters in real time and manage alarms and notifications, so that you can keep track of CDM cluster performance metrics.
This section describes the following:
- CDM Metrics
- Configuring Alarm Rules
- Viewing CDM Metrics
3.6.1 CDM Metrics
Table 3-2 lists the CDM metrics.
Table 3-2 CDM performance metrics
Metric | Description | Value Range | Monitored Object
Bytes In | Measures the network inbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration
Bytes Out | Measures the network outbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration
CPU Usage | Measures the CPU usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration
Memory Usage | Measures the memory usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration
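As an illustration of how a threshold alarm over these metrics behaves, the sketch below checks whether the average of the most recent samples meets a threshold. Cloud Eye evaluates alarm rules server-side; this code is purely illustrative of the "average over N periods exceeds threshold" idea:

```python
def alarm_triggered(samples, threshold, periods):
    """Return True if the average of the last `periods` samples meets
    or exceeds `threshold` (values in percent, 0-100)."""
    if len(samples) < periods:
        return False                 # not enough data yet
    window = samples[-periods:]
    return sum(window) / periods >= threshold

# CPU Usage samples (%); an alarm on avg >= 90% over 3 periods fires,
# while the same threshold over 5 periods does not.
cpu = [42.0, 55.5, 91.2, 95.0, 93.4]
print(alarm_triggered(cpu, threshold=90.0, periods=3))  # True
print(alarm_triggered(cpu, threshold=90.0, periods=5))  # False
```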
3.6.2 Configuring Alarm Rules
Scenario
Set alarm rules to customize the monitored objects and notification policies, so that you can learn the CDM running status in a timely manner.
A CDM alarm rule includes the alarm rule name, monitored object, metric, threshold, monitoring interval, and whether to send a notification. This section describes how to set CDM alarm rules.
Procedure
1. Log in to the CDM management console.
2. Choose Cluster Management. Choose More > View Monitoring Data. The Cloud Eye management console is displayed.
3. In the left navigation pane, choose Alarm Management > Alarm Rules.
4. On the Alarm Rules page, click Create Alarm Rule to create an alarm rule, or modify an existing alarm rule. The following operations use the modification of an existing alarm rule as an example.
   a. Click the name of the target alarm rule.
   b. Click Modify in the upper right corner of the page.
   c. On the Modify Alarm Rule page shown in Figure 3-7, set parameters as prompted.
Figure 3-7 Modifying an alarm rule
   d. Click OK. After the alarm rule is set, the system automatically notifies you when an alarm is triggered.
NOTE
For more information about CDM alarm rules, see the Cloud Eye User Guide.
3.6.3 Querying Metrics
Scenario
Cloud Eye on HUAWEI CLOUD monitors the running status of CDM clusters. You can obtain the monitoring metrics of CDM on the Cloud Eye management console.
Monitored data requires a period of time for transmission and display. The status of CDM displayed on the Cloud Eye page is the status obtained 5 to 10 minutes earlier. You can view the monitored data of a newly created CDM cluster 5 to 10 minutes after its creation.
Prerequisites
- The CDM cluster is running properly. If a cluster fails to shut down or restart, or is unavailable, its monitoring metrics cannot be viewed on Cloud Eye. You can view the monitored data only after the cluster is restarted or recovered.
- Alarm rules have been configured on the Cloud Eye page. For details, see Configuring Alarm Rules.
- The cluster has been running properly for about 10 minutes. The monitored data and graphs are available for a newly created cluster only after the cluster has run for at least 10 minutes.
Procedure
1. Log in to the CDM management console.
2. Choose Cluster Management. Choose More > View Monitoring Data. The Cloud Eye management console is displayed.
3. On the CDM monitoring page, you can view the graphs of all monitoring metrics, as shown in Figure 3-8.
Figure 3-8 Viewing metrics
4. Click the zoom icon in the upper right corner of a graph to enlarge it, as shown in Figure 3-9. The system allows you to select a fixed time range or customize the time range.
   a. Fixed time ranges include Last 1 hour, Last 3 hours, Last 12 hours, Last 24 hours, Last 7 days, and Last 30 days.
   b. A customized time range can be specified within the latest seven days.
Figure 3-9 Zoomed out monitoring graph
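Because a customized range must fall within the latest seven days, any client-side tooling that requests monitoring data would typically clamp the requested window first. An illustrative sketch (generic date arithmetic, not a Cloud Eye API):

```python
from datetime import datetime, timedelta

def clamp_range(start: datetime, end: datetime, now: datetime):
    """Clamp [start, end] to the last 7 days relative to `now`."""
    earliest = now - timedelta(days=7)
    start = max(start, earliest)
    end = min(end, now)
    if start >= end:
        raise ValueError("empty range after clamping to the last 7 days")
    return start, end

now = datetime(2018, 8, 3, 12, 0)
# A request starting on July 20 is pulled forward to now - 7 days.
s, e = clamp_range(datetime(2018, 7, 20), datetime(2018, 8, 2), now)
print(s)  # 2018-07-27 12:00:00
```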
3.7 CTS
3.7.1 Key CDM Operations Recorded by CTS
Scenario
CTS provides records of operations on cloud service resources. With CTS, you can query,audit, and backtrack these operations.
Prerequisites
CTS has been enabled.
Key CDM Operations Recorded by CTS
Table 3-3 CDM operations recorded by CTS
Operation | Resource Type | Trace Name
Creating a cluster | cluster | createCluster
Deleting a cluster | cluster | deleteCluster
Modifying cluster configuration | cluster | modifyCluster
Starting a cluster | cluster | startCluster
Stopping a cluster | cluster | stopCluster
Restarting a cluster | cluster | restartCluster
Importing a job | cluster | clusterImportJob
Binding an EIP | cluster | bindEip
Unbinding an EIP | cluster | unbindEip
Creating a link | link | createLink
Modifying a link | link | modifyLink
Deleting a link | link | deleteLink
Creating a job | job | createJob
Modifying a job | job | modifyJob
Deleting a job | job | deleteJob
Starting a job | job | startJob
Stopping a job | job | stopJob
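When auditing exported trace records against the operations above, filtering by resource type and trace name is straightforward. The sketch below assumes each trace is a dict with `resource_type` and `trace_name` keys; the actual CTS export format may differ, so treat this as a pattern rather than a CTS API:

```python
def cluster_lifecycle_traces(traces):
    """Keep only cluster start/stop/restart/delete operations."""
    wanted = {"startCluster", "stopCluster", "restartCluster", "deleteCluster"}
    return [t for t in traces
            if t.get("resource_type") == "cluster"
            and t.get("trace_name") in wanted]

traces = [
    {"resource_type": "cluster", "trace_name": "createCluster"},
    {"resource_type": "cluster", "trace_name": "restartCluster"},
    {"resource_type": "job", "trace_name": "startJob"},
]
print([t["trace_name"] for t in cluster_lifecycle_traces(traces)])
# ['restartCluster']
```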
3.7.2 Viewing Traces
Scenario
After you enable CTS, the system starts to record CDM operations. The CTS management console stores the traces of the latest seven days.
This section describes how to query these traces.
Procedure
1. Log in to the management console.
2. Click the region selector in the upper left corner and select the desired region and project.
3. Click Service List and choose Management & Deployment > Cloud Trace Service.
4. In the left navigation pane, click Trace List.
5. Click Filter and specify filters as required. Figure 3-10 shows the recorded CDM traces.
Figure 3-10 CDM traces
6. Locate a trace and click the expand icon to unfold the trace details.
7. Locate a trace and click View Trace in the Operation column.
For more information about CTS, see the Cloud Trace Service User Guide.
4 Link Management
4.1 Creating a Link
Scenario
Before creating a data migration task, create a link so that the CDM cluster can read data from and write data to the data source. The same link can serve as the source link (for CDM to export data) or the destination link (for CDM to import data).
The link configurations vary with the data source type. This section describes how to create a link based on the data source type.
Prerequisites
- You have created a CDM cluster.
- The CDM cluster can communicate with the data source. To connect an internal network to the HUAWEI CLOUD network, see How Do I Connect On-Premises Intranet or Third-Party Private Network to CDM?.
- You have obtained the URL and the account for accessing the data source. The account must have read and write permissions on the data source.
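A quick way to verify the connectivity prerequisite from a host on the same network as the CDM cluster is a plain TCP connect to the data source's address and port. This is a generic network check (standard library only), not a CDM feature:

```python
import socket

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check a MySQL source before creating the link
# (address and port are placeholders).
# print(reachable("192.168.0.1", 3306))
```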
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster, choose Job Management > Link Management > Create Link, and select a connector. See Figure 4-1.
NOTE
The connectors are classified based on the type of the data source to be connected. All types of data sources that support data import or export using CDM are displayed.
Figure 4-1 Selecting a connector
Step 3 Select a data source and click Next.
On the page that is displayed, configure the required parameters based on Table 4-1.
Table 4-1 Link parameters
Connector | Description
Data Warehouse Service, RDS (MySQL), RDS (PostgreSQL), RDS (SQL Server), DDM, MySQL, PostgreSQL, Microsoft SQL Server, Oracle, IBM Db2, FusionInsight LibrA, Derecho (GaussDB) | Because the JDBC drivers used to connect to these relational databases are the same, the parameters to be configured are also the same; they are described in Link to Relational Databases.
- When importing data to DWS, specify the Copy or GDS import mode to improve import performance. You can specify the Import Mode parameter when creating a DWS link.
- When importing data to RDS for MySQL, enable the LOAD DATA function of MySQL to accelerate data import and improve import performance. You can set Use Local API to enable the function when you create a MySQL link.
HUAWEI CLOUD OBS | If the data source is OBS on HUAWEI CLOUD, see Link to OBS.
Alibaba Cloud OSS | If the data source is OSS on Alibaba Cloud, see Link to OSS on Alibaba Cloud. Currently, data can only be exported from OSS to OBS.
Qiniu Cloud Object Storage | If the data source is Qiniu Cloud Object Storage (KODO), see Link to Qiniu Cloud Object Storage. Currently, data can only be exported from Qiniu Cloud Object Storage to OBS.
MRS HDFS, FusionInsight HDFS, Apache HDFS | If the data source is HDFS of MRS, Apache Hadoop, or FusionInsight HD, see Link to HDFS. NOTE: If Running Mode is set to Standalone, CDM can migrate data between the HDFS services of multiple MRS clusters.
MRS HBase, FusionInsight HBase, Apache HBase | If the data source is HBase of MRS, Apache Hadoop, or FusionInsight HD, see Link to HBase.
MRS Hive | If the data source is Hive of MRS, see Link to Hive.
CloudTable Service | If the data source is CloudTable, see Link to CloudTable.
FTP, SFTP | If the data source is an FTP or SFTP server, see Link to an FTP or SFTP Server.
HTTP, HTTPS | These connectors are used to read files with an HTTP/HTTPS URL, such as public files on third-party object storage systems and web disks. When creating an HTTP link, you only need to configure the link name; the URL is configured during job creation.
Network Attached Storage | If the data source is a local NAS server, see Link to a NAS Server. CIFS and SMB are supported. CDM can connect to dedicated file servers, Windows file sharing servers, Linux Samba servers, and cloud services that provide CIFS/SMB file systems.
MongoDB, Document Database Service | If the data source is a local MongoDB or DDS, see Link to MongoDB/DDS. Currently, data can be exported from but cannot be imported to MongoDB or DDS.
Redis, Distributed Cache Service | If the data source is a local Redis database or DCS, see Link to Redis/DCS. Currently, data can be imported to but cannot be exported from DCS. Data can be imported to and exported from open-source Redis.
Apache Kafka | If the data source is open-source Kafka, see Link to Kafka. Currently, data can only be exported from Kafka to Cloud Search Service.
Data Ingestion Service | If the data source is DIS, see Link to DIS. Currently, data can only be exported from DIS to Cloud Search Service.
Cloud Search Service, Elasticsearch | If the data source is Cloud Search Service or Elasticsearch, see Link to Elasticsearch.
Data Lake Insight | If the data source is DLI, see Link to DLI. Currently, data can be imported to but cannot be exported from DLI.
Step 4 After configuring the link parameters, click Test to check whether the link is available. Alternatively, click Save; the system then automatically checks whether the link is available.
If the network is poor or the data source is too large, the link test may take 30 to 60 seconds.
----End
4.2 Link Parameter Description
4.2.1 Link to Relational Databases
Because the JDBC drivers used by CDM to connect to relational databases are the same, the parameters to be configured are also the same. Currently, CDM supports the following relational databases:
- Data Warehouse Service
- RDS (MySQL)
- RDS (PostgreSQL)
- RDS (SQL Server)
- DDM
- MySQL
- PostgreSQL
- Microsoft SQL Server
- Oracle
- IBM Db2
- FusionInsight LibrA
- Derecho (GaussDB)
Compatible Databases and Versions
Table 4-2 lists the relational databases that have been verified to be accessible to CDM.
- CDM performance has been optimized for migration to MySQL and DWS and exceeds that of the native JDBC API.
- The following table lists the database types and versions that have been verified to be accessible (for both read and write) to CDM. Other database versions not included in the table may still be accessible but have not been tested. If the database version you use is inaccessible, contact customer service.
Table 4-2 Compatible databases and versions
Database | Verified Version
Oracle | Oracle 11g 11.2.0.3.0
MySQL | MySQL 5.5.43-log
Microsoft SQL Server | SQL Server 2012
IBM Db2 | Db2 v9.7.0.0
PostgreSQL | PostgreSQL 9.1 (x86)
Derecho (GaussDB) | GaussDB V100R003C10SPC115
Link Parameters
Table 4-3 describes the required parameters of the link to DWS, RDS for MySQL, RDS for PostgreSQL, RDS for SQL Server, DDM, MySQL, PostgreSQL, Microsoft SQL Server, Oracle, IBM Db2, or Derecho (GaussDB).
Table 4-3 Parameter description
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | mysql_link
Database Server | IP address or domain name of the database to be connected. Click Select next to the text box to obtain the list of DWS and RDS instances. | 192.168.0.1
Port | Port number of the database to be connected | 3306
Database Name | Name of the database to be connected | dbname
Username | Username of the account for accessing the database. This account must be able to read and write data tables and read database metadata. | cdm
Password | Password of the account | -
Import Mode | When creating a DWS link, select the data import mode.
- Copy: Migrate the source data to the DWS management node and then copy the data to DataNodes. To access DWS through the Internet, select Copy.
- GDS: DataNodes of DWS concurrently request data from the GDS component of CDM and then write the data to DWS. The GDS mode cannot be used for data export from DWS.
Theoretically, the GDS mode is more efficient than the Copy mode. However, when the GDS mode is used, the following configurations are required:
1. Configure DWS to allow users of the DWS link to create and delete foreign tables.
2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster.
For details, see GDS Import Mode. | GDS
Fetch Size | (Optional) This parameter is displayed only after you click Show Advanced Attributes. Number of rows obtained by each request. Set this parameter based on the data source and the job's data size. If the value is either too large or too small, the job may run for a long time. | 1000
Use Local API | (Optional) Whether to use the local API of the database for acceleration. When you create a MySQL link, CDM automatically enables the local_infile system variable of the MySQL database to enable the LOAD DATA function, which accelerates data import to the MySQL database. If CDM fails to enable the function, contact the database administrator to enable the local_infile system variable. Alternatively, set Use Local API to No to disable API acceleration. If data is imported to RDS for MySQL, the LOAD DATA function is disabled by default; in this case, modify the parameter group of the MySQL instance and set local_infile to ON to enable the LOAD DATA function.
NOTE: If local_infile on RDS is uneditable, it belongs to the default parameter group. You need to create a parameter group, modify its values, and apply it to the RDS for MySQL instance. For details, see the Relational Database Service User Guide. | Yes
SSL Encryption | (Optional) If you set this parameter to Yes, CDM can connect to RDS (on-premises databases excluded) in SSL encryption mode. Security hardening has been performed on RDS for PostgreSQL; for this reason, when creating a link to RDS for PostgreSQL, set this parameter to Yes. | Yes
Link Properties | (Optional) Click Add to add the JDBC connector attributes of multiple specified data sources. For details, see the JDBC connector document of the corresponding database. | sslmode=require
Reference Sign | (Optional) Delimiter between the names of referenced tables or columns. For details, see the product documentation of the corresponding database. | '
Oracle Version | This parameter is displayed only for Oracle links. When the error message "java.sql.SQLException: Protocol violation" is displayed, select another version. | 12.1.0.1
Oracle SID | Oracle instance ID, which is used to differentiate databases by instance | dbname
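The Database Server, Port, and Database Name parameters map onto a standard JDBC URL, which CDM assembles internally. The sketch below shows the common URL shapes used by the stock JDBC drivers for a few of the databases above, with the table's example values; it illustrates the mapping and is not CDM code:

```python
def jdbc_url(db_type: str, host: str, port: int, database: str) -> str:
    """Build a standard JDBC URL from the link parameters.
    Formats follow the common JDBC drivers; exact options vary by
    driver version."""
    if db_type == "mysql":
        return f"jdbc:mysql://{host}:{port}/{database}"
    if db_type == "postgresql":
        return f"jdbc:postgresql://{host}:{port}/{database}"
    if db_type == "sqlserver":
        return f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    if db_type == "oracle":  # SID-style URL, matching the Oracle SID parameter
        return f"jdbc:oracle:thin:@{host}:{port}:{database}"
    raise ValueError(f"unsupported database type: {db_type}")

print(jdbc_url("mysql", "192.168.0.1", 3306, "dbname"))
# jdbc:mysql://192.168.0.1:3306/dbname
```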
4.2.2 Link to OBS
When connecting CDM to OBS, configure parameters according to Table 4-4.
Table 4-4 Parameter description
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | obs_link
OBS Server | IP address or domain name of the OBS server | 192.168.0.1
Port | Port number of the OBS server, which is 5443 by default | 5443
AK | AK used to log in to the OBS server | HCXUET8G37MWF
SK | SK used to log in to the OBS server | -
4.2.3 Link to OSS on Alibaba Cloud
When connecting CDM to OSS on Alibaba Cloud, configure parameters according to Table 4-5.
Table 4-5 Parameter description
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | oss_link
OSS Endpoint | Endpoint of OSS on Alibaba Cloud | oss-cn-hangzhou.aliyuncs.com
Authentication Method | Available identity authentication methods:
- Access key: Use the access key to access OSS.
- Temporary access credential: Use the temporary key and security token to access OSS. | Access key
AK | AK used to log in to the OSS server | 0DCPPWWA4VKXCKHIX
SK | SK used to log in to the OSS server | -
Security Token | If you set Authentication Method to Temporary access credential, enter the temporary token provided by Security Token Service (STS). | -
4.2.4 Link to Qiniu Cloud Object Storage
When connecting CDM to Qiniu Cloud Object Storage (KODO), configure parameters according to Table 4-6. Currently, data can only be exported from Qiniu Cloud Object Storage to OBS.
Table 4-6 KODO link parameters
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | kodo_link
Region | Region where the data center of KODO is located | z0
AK | AK used to log in to the KODO server | 0DCPPWWA4VKXCKHIX
SK | SK used to log in to the KODO server | -
Use Custom Domain Name to Download Objects | (Optional) Whether to preferentially use the custom domain name to download objects from the bucket if the object storage bucket has a CDN or other custom domain names | Yes
4.2.5 Link to HDFS
Currently, CDM supports the following HDFS data sources:
- MRS HDFS
- FusionInsight HDFS
- Apache HDFS
MRS HDFS
When connecting CDM to HDFS of MRS, configure parameters according to Table 4-7.
Table 4-7 MRS HDFS link parameters
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | mrs_hdfs_link
Manager IP | IP address of MRS Manager. Click Select next to the Manager IP text box to select a created MRS cluster. CDM automatically fills in the authentication information. | 127.0.0.1
Authentication Method | Authentication method used for accessing MRS:
- Simple: Select this if MRS is in non-security mode.
- Kerberos: Select this if MRS is in security mode. | Simple
Username | When Authentication Method is set to Kerberos, set the username and password for logging in to MRS Manager. | cdm
Password | Password for logging in to MRS Manager | -
Running Mode | Running mode of the HDFS link. The options are as follows:
- Embedded: The link instance runs with CDM. This mode delivers better performance.
- Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, use Standalone.
If Standalone is selected, CDM can migrate data between the HDFS services of multiple MRS clusters. | Standalone
FusionInsight HDFS
When connecting CDM to HDFS of FusionInsight HD, configure parameters according to Table 4-8.
Table 4-8 FusionInsight HDFS link parameters
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | FI_hdfs_link
Manager IP | IP address of FusionInsight Manager | 127.0.0.1
Manager Port | Port number of FusionInsight Manager | 28443
CAS Server Port | Port number of the CAS server used to connect to FusionInsight | 20009
Username | Username for logging in to FusionInsight Manager | cdm
Password | Password for logging in to FusionInsight Manager | -
Authentication Method | Authentication method used for accessing FusionInsight HD:
- Simple: Select this if FusionInsight HD is in non-security mode.
- Kerberos: Select this if FusionInsight HD is in security mode. | Kerberos
Running Mode | Running mode of the HDFS link. The options are as follows:
- Embedded: The link instance runs with CDM. This mode delivers better performance.
- Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, use Standalone. | Standalone
Apache HDFS

When connecting CDM to HDFS of Apache Hadoop, configure the parameters according to Table 4-9.

Table 4-9 Apache HDFS link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: hadoop_hdfs_link
- URI: NameNode URI. Example value: hdfs://nn1.example.com/
- Authentication Method: Authentication method used for accessing Hadoop. Example value: Kerberos
  - Simple: Select this if Hadoop is in non-security mode.
  - Kerberos: Select this if Hadoop is in security mode. Obtain the principal account and the keytab file from the client for authentication.
- Principal: When Authentication Method is set to Kerberos, the principal account is used for authentication. You can contact the Hadoop administrator to obtain the account.
- Keytab File: When Authentication Method is set to Kerberos, the keytab file is used for authentication. You can contact the Hadoop administrator to obtain the file. Example value: /opt/user.keytab
- Mapping Between IP and Host Name: If the HDFS configuration file uses host names, configure the mapping between IP addresses and host names. Separate an IP address from its host name with a space, and separate mappings with semicolons (;) or line breaks. Example value: 10.1.6.9 hostname01;10.2.7.9 hostname02
- Running Mode: Running mode of the HDFS link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
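To make the mapping format concrete, the following Python sketch parses a mapping string of the form described above (IP and host name separated by a space; mappings separated by semicolons or line breaks). It is a hypothetical illustration for understanding the format, not part of CDM:

```python
def parse_ip_host_mapping(raw: str) -> dict:
    """Parse an 'IP hostname' mapping string into a dict.

    Entries are separated by semicolons or line breaks; within an
    entry, the IP address and host name are separated by spaces.
    """
    mapping = {}
    # Normalize line breaks to semicolons, then split into entries.
    normalized = raw.replace("\r\n", ";").replace("\n", ";")
    for entry in normalized.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        ip, hostname = entry.split()
        mapping[ip] = hostname
    return mapping

print(parse_ip_host_mapping("10.1.6.9 hostname01;10.2.7.9 hostname02"))
# {'10.1.6.9': 'hostname01', '10.2.7.9': 'hostname02'}
```

Either separator style produces the same mapping, which is why the guide allows semicolons and line breaks interchangeably.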
Cloud Data MigrationUser Guide 4 Link Management
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 50
4.2.6 Link to HBase

Currently, CDM supports the following HBase data sources:
- MRS HBase
- FusionInsight HBase
- Apache HBase

MRS HBase

When connecting CDM to HBase of MRS, configure the parameters according to Table 4-10.

Table 4-10 MRS HBase link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: mrs_hbase_link
- Manager IP: IP address of MRS Manager. Click Select next to the Manager IP text box to select a created MRS cluster. CDM automatically fills in the authentication information. Example value: 127.0.0.1
- Authentication Method: Authentication method used for accessing MRS. Example value: Simple
  - Simple: Select this if MRS is in non-security mode.
  - Kerberos: Select this if MRS is in security mode.
- Username: When Authentication Method is set to Kerberos, set the username and password for logging in to MRS Manager. Example value: admin
- Password: Password for logging in to MRS Manager. Example value: -
- Running Mode: Running mode of the HBase link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
FusionInsight HBase

When connecting CDM to HBase of FusionInsight HD, configure the parameters according to Table 4-11.

Table 4-11 FusionInsight HBase link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: FI_hbase_link
- Manager IP: IP address of FusionInsight Manager. Example value: 127.0.0.1
- Manager Port: Port number of FusionInsight Manager. Example value: 28443
- CAS Server Port: Port number of the CAS server used to connect to FusionInsight. Example value: 20009
- Username: Username for logging in to FusionInsight Manager. Example value: cdm
- Password: Password for logging in to FusionInsight Manager. Example value: -
- Authentication Method: Authentication method used for accessing FusionInsight HD. Example value: Kerberos
  - Simple: Select this if FusionInsight HD is in non-security mode.
  - Kerberos: Select this if FusionInsight HD is in security mode.
- Running Mode: Running mode of the HBase link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
Apache HBase

When connecting CDM to HBase of Apache Hadoop, configure the parameters according to Table 4-12.

Table 4-12 Apache HBase link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: hadoop_hbase_link
- URI: NameNode URI. Example value: hdfs://nn1.example.com/
- Authentication Method: Authentication method used for accessing Hadoop. Example value: Kerberos
  - Simple: Select this if Hadoop is in non-security mode.
  - Kerberos: Select this if Hadoop is in security mode. Obtain the principal account and the keytab file from the client for authentication.
- Principal: When Authentication Method is set to Kerberos, the principal account is used for authentication. You can contact the Hadoop administrator to obtain the account.
- Keytab File: When Authentication Method is set to Kerberos, the keytab file is used for authentication. You can contact the Hadoop administrator to obtain the file. Example value: /opt/user.keytab
- Mapping Between IP and Host Name: If the configuration file uses host names, configure the mapping between IP addresses and host names. Separate an IP address from its host name with a space, and separate mappings with semicolons (;) or line breaks. Example value: 10.3.6.9 hostname01;10.4.7.9 hostname02
- Running Mode: Running mode of the HBase link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
4.2.7 Link to Hive

Currently, CDM supports Hive of MRS. Table 4-13 describes the related parameters.

Table 4-13 Hive link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: hivelink
- Manager IP: IP address of MRS Manager. Click Select next to the Manager IP text box to select a created MRS cluster. CDM automatically fills in the authentication information. Example value: 127.0.0.1
- Authentication Method: Authentication method used for accessing MRS. Example value: Simple
  - Simple: Select this if MRS is in non-security mode.
  - Kerberos: Select this if MRS is in security mode.
- Username: When Authentication Method is set to Kerberos, set the username and password for logging in to MRS Manager. Example value: cdm
- Password: Password for logging in to MRS Manager. Example value: -
4.2.8 Link to CloudTable

When connecting CDM to CloudTable, configure the parameters according to Table 4-14.

Table 4-14 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: cloudtable_link
- ZK Link: Obtain this parameter value from the cluster management page of CloudTable. Example value: cloudtable-cdm-zk1.cloudtable.com:2181,cloudtable-cdm-zk2.cloudtable.com:2181
4.2.9 Link to an FTP or SFTP Server

When connecting CDM to an FTP or SFTP server, configure the parameters according to Table 4-15.

Table 4-15 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: ftp_link
- Host Name/IP Address: Host name or IP address of the FTP or SFTP server. Example value: ftp.apache.org
- Port: Port number of the FTP or SFTP server. The default port is 21 for FTP and 22 for SFTP. Example value: 21
- Username: Username for logging in to the FTP or SFTP server. Example value: cdm
- Password: Password for logging in to the FTP or SFTP server. Example value: -
4.2.10 Link to a NAS Server

When connecting CDM to a NAS server, configure the parameters according to Table 4-16.

CIFS and SMB are supported. CDM can connect to dedicated file servers, Windows file sharing servers, Linux Samba servers, and cloud services that provide CIFS/SMB file systems.

Table 4-16 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: nas_link
- Protocol: NAS file protocol. Currently, only SMB and CIFS are supported. Example value: SMB
- Shared Path: Shared path of the NAS server. Example value: \\server\share
- Username: Username for logging in to the NAS server, in the domain name\username format. Example value: CHINA\user
- Password: Password for logging in to the NAS server. Example value: -
4.2.11 Link to MongoDB/DDS

When connecting CDM to an on-premises MongoDB database or DDS, configure the parameters according to Table 4-17.

Table 4-17 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: mongodb_link
- MongoDB Server List: Server address list. Enter each server address in the format of IP address (or domain name) of the database server:port number, and separate the entries with semicolons (;). Example value: 192.168.0.1:7300;192.168.0.2:7301
- Database Name: Name of the MongoDB database to be connected. Example value: DB_mongodb
- Username: Username for logging in to MongoDB. Example value: cdm
- Password: Password for logging in to MongoDB. Example value: -
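The server list format above (host:port entries separated by semicolons) can be illustrated with a small parser. This is a hypothetical sketch for understanding the format, not CDM code:

```python
def parse_server_list(raw: str) -> list:
    """Split a 'host:port' server list separated by semicolons
    into (host, port) tuples."""
    servers = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition tolerates host names that happen to contain colons.
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers

print(parse_server_list("192.168.0.1:7300;192.168.0.2:7301"))
# [('192.168.0.1', 7300), ('192.168.0.2', 7301)]
```

The same format is used by the Redis Server List parameter in the next section.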
4.2.12 Link to Redis/DCS

When connecting CDM to an on-premises Redis database or DCS, configure the parameters according to Table 4-18.

Table 4-18 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: redis_link
- Redis Deployment Method: Two deployment methods are available. Example value: Single
  - Single: installation on a single-node system
  - Cluster: installation on a cluster
- Redis Server List: Server address list. Enter each server address in the format of IP address (or domain name) of the database server:port number, and separate the entries with semicolons (;). Example value: 192.168.0.1:7300;192.168.0.2:7301
- Password: Password for logging in to Redis. Example value: -
- Redis Database Index: Redis database index, which is similar to the name of a relational database. Example value: 0
4.2.13 Link to Kafka

When connecting CDM to a local Apache Kafka, configure the parameters according to Table 4-19.

Table 4-19 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: kafka_link
- Kafka broker: IP address and port number of the Kafka broker. Example value: 192.168.1.1:9092
4.2.14 Link to DIS

When connecting CDM to DIS, configure the parameters according to Table 4-20. Currently, data can only be exported from DIS to Cloud Search Service.

Table 4-20 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: dis_link
- Region: Region where DIS resides. Example value: cn-north-1
- Endpoint: DIS endpoint to be linked. Example value: https://dis.cn-north-1.myhuaweicloud.com:20004
- AK: AK used to log in to the DIS server. Example value: 0DCPPWWA4VKXCWYWKHIX
- SK: SK used to log in to the DIS server. Example value: -
- Project ID: Project ID of DIS. Example value: c48475ce8e174a7a9f775706a3d5eb2
4.2.15 Link to Elasticsearch

When connecting CDM to Cloud Search Service or Elasticsearch, configure the parameters according to Table 4-21.

Table 4-21 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: css_link
- Elasticsearch Server: IP address or domain name of the Elasticsearch server. Example value: 192.168.0.1
- Port: Port number of the Elasticsearch server. Example value: 9200
- Username: (Optional) Username for logging in to the database to be connected. Example value: cdm
- Password: (Optional) Password for logging in to the database to be connected. Example value: -
4.2.16 Link to DLI

When connecting CDM to DLI, configure the parameters according to Table 4-22.

Table 4-22 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: dli_link
- AK: AK required for authentication during access to the DLI database. Example value: GRC2WR0IDC6NGROYLWU2
- SK: SK required for authentication during access to the DLI database. Example value: -
- Project ID: Project ID in the region where DLI resides. Example value: a46ed0f02bde42e7afe36777eb9d0f42
4.3 Editing/Deleting a Link
Scenario
CDM allows you to perform the following operations on created links:
- Edit: You can modify link parameters, but cannot re-select the connector. To modify a link, you need to re-enter the password for accessing the data source.
- Test Connectivity: You can directly test the connectivity of a created link.
- View Link JSON: View the link parameter settings in JSON format.
- Edit Link JSON: Edit the link parameter settings in JSON format.
- Delete: You can delete links that are not used by any jobs in batches.

Prerequisites
- You have obtained the username and password for accessing the desired data source.
- The links are not used by any jobs.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and choose Job Management > Link Management.

Step 3 On the Link Management page, locate the target link.
- Edit: Click the link name or click Edit in the Operation column to access the page for modifying the link. When modifying a link, you need to enter the password for logging in to the data source again. For details about the parameters, see Link Parameter Description.
- Test Connectivity: Click Test Connectivity in the Operation column to test the connectivity of the created link.
- View Link JSON: In the Operation column, choose More > View Link JSON to view link parameters in JSON format.
- Edit Link JSON: In the Operation column, choose More > Edit Link JSON to modify link parameters in JSON format.
- Delete: Select multiple links and click Delete Link next to Create Link to delete unused links in batches.
----End
5 Job Management
5.1 Creating a Job
5.1.1 Table/File Migration
Scenario

CDM can migrate tables or files between homogeneous and heterogeneous data sources. For details about the data sources that support table/file migration, see Data Sources Supported by CDM.

Table/file migration is applicable to migrating data to the cloud, exchanging data on the cloud, and migrating data back to on-premises service systems.

Prerequisites
- You have created a link by referring to Creating a Link.
- The CDM cluster can communicate with the data source.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.

Step 3 Choose Table/File Migration > Create Job. The page for configuring the job is displayed, as shown in Figure 5-1.
Figure 5-1 Creating a migration job
Step 4 Configure the job parameters as follows:
- Job Name: Enter a custom job name, which is a string of 1 to 256 characters consisting of letters, underscores (_), and digits, for example, oracle2obs_t.
- Source Link Name: Select the data source from which data is to be exported.
- Destination Link Name: Select the data source to which data is to be imported.

If no link is available, click + or go to the Link Management page to create one. For details about how to create a link, see Creating a Link.
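The stated job-name rule (1 to 256 characters consisting of letters, digits, and underscores) can be expressed as a simple check. The function below is a hypothetical illustration of the rule, not a CDM API:

```python
import re

# Pattern mirroring the documented rule: letters, digits, and
# underscores only, with a length of 1 to 256 characters.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,256}")

def is_valid_job_name(name: str) -> bool:
    """Return True if the name satisfies the documented job-name rule."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_job_name("oracle2obs_t"))  # True
print(is_valid_job_name("bad name!"))     # False
```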
Step 5 After selecting the source link, configure the source job parameters. The parameters vary with the data source type. For details, see Table 5-1.
Table 5-1 Source link parameter description

- OBS, Alibaba Cloud OSS, Qiniu Cloud Object Storage: Data can be extracted in CSV, JSON, CarbonData, or binary format. Data extracted in binary format is transferred without file parsing, which ensures high performance and is more suitable for file migration. Currently, data cannot be imported to Alibaba Cloud OSS or Qiniu Cloud Object Storage. For details, see From OBS/OSS.
- MRS HDFS, FusionInsight HDFS, Apache HDFS: HDFS data can be exported in CSV, Parquet, or binary format and can be compressed in multiple formats. For details, see From HDFS.
- MRS HBase, FusionInsight HBase, Apache HBase, CloudTable Service: Data can be exported from MRS, FusionInsight HD, open-source Apache Hadoop HBase, or CloudTable. You need to know all the column families and field names of the HBase tables. For details, see From HBase/CloudTable.
- MRS Hive: Data can be exported from Hive through the JDBC API. If the data source is Hive, CDM automatically partitions data using the Hive data partitioning file. For details, see From Hive.
- FTP, SFTP, Network Attached Storage: FTP, SFTP, or NAS data can be exported in CSV, JSON, or binary format. For details, see From FTP/SFTP/NAS.
- HTTP, HTTPS: These connectors are used to read files with an HTTP/HTTPS URL, such as public files on third-party object storage systems and web disks. Currently, data can only be exported from an HTTP/HTTPS URL to HUAWEI CLOUD. For details, see From HTTP/HTTPS.
- Data Warehouse Service, RDS for MySQL, RDS for SQL Server, RDS for PostgreSQL, DDM: Data can be exported from the database services of HUAWEI CLOUD. When data is exported from these data sources, CDM uses the JDBC API to extract data, and the job parameters for the migration source are the same. For details, see From a Relational Database.
- FusionInsight LibrA, Derecho (GaussDB): Data can be exported from FusionInsight LibrA and Derecho. CDM uses the JDBC API to extract data. For details, see From a Relational Database.
- MySQL, PostgreSQL, Oracle, IBM Db2, Microsoft SQL Server: Databases that are not provided by HUAWEI CLOUD, such as databases created in a local data center, deployed on ECSs, or database services on third-party clouds. CDM uses the JDBC API to extract data. For details, see From a Relational Database.
- MongoDB, Document Database Service: Data can be exported from MongoDB or DDS. For details, see From MongoDB/DDS.
- Redis: Data can be exported from open-source Redis. For details, see From Redis.
- Data Ingestion Service: Currently, data can only be exported from DIS to Cloud Search Service. For details, see From DIS.
- Apache Kafka: Currently, data can only be exported from Kafka to Cloud Search Service. For details, see From Apache Kafka.
- Cloud Search Service, Elasticsearch: Data can be exported from Cloud Search Service or Elasticsearch. For details, see From Elasticsearch/Cloud Search Service.
Step 6 Configure the job parameters for the migration destination based on Table 5-2.

Table 5-2 Parameter description

- OBS: Files (even in a large volume) can be migrated in batches to OBS in CSV, CarbonData, or binary format. For details, see To OBS.
- MRS HDFS, FusionInsight HDFS, Apache HDFS: You can select a compression format when importing data to HDFS. For details, see To HDFS.
- MRS HBase, FusionInsight HBase, Apache HBase, CloudTable Service: Data can be imported to HBase. The compression algorithm can be set when a new HBase table is created. For details, see To HBase/CloudTable.
- MRS Hive: Data can be rapidly imported to MRS Hive. For details, see To Hive.
- FTP, SFTP, Network Attached Storage: When FTP/SFTP/NAS servers function as the migration destination, CDM usually migrates cloud data analysis results back to local file systems. For details, see To FTP/SFTP/NAS.
- Data Warehouse Service, RDS for MySQL, RDS for SQL Server, RDS for PostgreSQL, DDM: Data can be imported to the database services of HUAWEI CLOUD. For details about how to use the JDBC API to import data, see To a Relational Database.
  - When importing data to DWS, specify the Copy or GDS import mode to improve the import performance. You can specify the Import Mode parameter when creating a DWS link.
  - When importing data to RDS for MySQL, enable the LOAD DATA function of MySQL to accelerate data import and improve the import performance. You can configure Use Local API to enable the function when you create a MySQL link.
- FusionInsight LibrA: Data can be imported to FusionInsight LibrA but cannot be imported to Derecho (GaussDB). For details, see To a Relational Database.
- MySQL: MySQL built in a local data center, created by users on an Elastic Cloud Server (ECS), or provided by a third-party cloud. For details, see To a Relational Database.
- Document Database Service: Data can be imported to DDS but cannot be imported to local MongoDB. For details, see To DDS.
- Distributed Cache Service: Data can be imported to DCS with the String or Hashmap value type. Data cannot be imported to local Redis. For details, see To DCS.
- Cloud Search Service, Elasticsearch: Data can be imported to Elasticsearch or Cloud Search Service. For details, see To Elasticsearch/Cloud Search Service.
- Data Lake Insight: Data can be imported to DLI. For details, see To DLI.
Step 7 After the parameters are configured, click Next. The Map Field tab page is displayed, as shown in Figure 5-2.

If files are migrated between FTP, SFTP, NAS, HDFS, and OBS, and File Format of the migration source is set to Binary, files will be transferred directly, free from field mapping.

In other scenarios, CDM automatically maps the fields of the source table to those of the destination table. You need to check whether the mapping and the time format are correct, for example, whether the source field type can be converted into the destination field type.

Figure 5-2 Field mapping

NOTE
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- On the Map Field tab page, if CDM fails to obtain all columns by obtaining sample values (for example, when data is exported from HBase, CloudTable, or MongoDB, there is a high probability that CDM fails to obtain all columns), you can click the icon and select Add a new field to add fields so that the data imported to the migration destination is complete.
- If data is imported to DWS, you need to select the distribution columns in the destination fields. You are advised to select the distribution columns according to the following principles:
  1. Use the primary key as the distribution column.
  2. If multiple data segments are combined as primary keys, specify all primary keys as the distribution columns.
  3. If no primary key is available and no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
Step 8 CDM supports field conversion. You can click the icon and then Create Converter. Figure 5-3 shows the Create Converter dialog box.

Figure 5-3 Creating a converter

CDM supports the following converters:
- Anonymization: hides key data in a character string. For example, to convert 12345678910 to 123****8910, configure the parameters as follows:
  - Set Reserve Start Length to 3.
  - Set Reserve End Length to 4.
  - Set Replace Character to *.
- Trim: automatically deletes the spaces before and after a character string.
- Reverse string: automatically reverses a character string, for example, reverses ABC into CBA.
- Replace string: replaces a specified character string.
- Expression conversion: uses the JSP Expression Language (EL) to convert the current field or a row of data. For details, see Field Conversion During Migration.
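As an illustration of the Anonymization rule above, the following Python sketch reproduces the described behavior: keep the first Reserve Start Length characters and the last Reserve End Length characters, and replace the middle with the replace character. It is a hypothetical illustration, not CDM code:

```python
def anonymize(value: str, reserve_start: int, reserve_end: int,
              replace_char: str) -> str:
    """Mask the middle of a string, keeping the first reserve_start
    and last reserve_end characters."""
    if reserve_start + reserve_end >= len(value):
        return value  # nothing left to mask
    middle_len = len(value) - reserve_start - reserve_end
    return (value[:reserve_start]
            + replace_char * middle_len
            + value[len(value) - reserve_end:])

print(anonymize("12345678910", 3, 4, "*"))  # 123****8910
```

With Reserve Start Length 3 and Reserve End Length 4, the 11-character input leaves 4 middle characters to mask, which matches the 123****8910 example in the text.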
Step 9 Click Next, set the job parameters, and click Show Advanced Attributes to display and configure the optional parameters, as shown in Figure 5-4.
Figure 5-4 Task parameters
Table 5-3 describes related parameters.
Table 5-3 Parameter description

- Retry upon Failure: You can select Retry 3 times or Never. You are advised to configure automatic retry only for file migration jobs or database migration jobs with Import to Staging Table enabled, to avoid data inconsistency caused by repeated data writes. Example value: Never
- Schedule Execution: If you select Yes, you can set the start time, cycle, and validity period of the job. For details, see Scheduling Job Execution. Example value: No
- Concurrent Extractors: Number of extractors to be concurrently executed. Generally, retain the default value. Example value: 1
- Concurrent Loaders: Number of loaders to be concurrently executed. This parameter is displayed only when HBase or Hive serves as the destination data source. Example value: 3
- Write Dirty Data: Whether to record dirty data. By default, this parameter is set to No. Example value: Yes
- Write Dirty Data Link: This parameter is displayed only when Write Dirty Data is set to Yes. Only links to OBS and HDFS support dirty data writes. Example value: obs_link
- OBS Bucket: This parameter is displayed only when Write Dirty Data Link is a link to OBS. Name of the OBS bucket to which the dirty data is to be written. Example value: dirtydata
- Dirty Data Directory: This parameter is displayed only when Write Dirty Data is set to Yes. Directory storing dirty data on HDFS or OBS. Dirty data will be saved only when this parameter is configured. You can go to this directory to query data that fails to be processed or is filtered out during job execution, and check which source data does not meet the transformation or cleaning rules. Example value: /user/dirtydir
- Max. Error Records in a Single Shard: This parameter is displayed only when Write Dirty Data is set to Yes. When the number of error records of a single map exceeds the upper limit, the job automatically terminates and the imported data cannot be rolled back. You are advised to use a temporary table as the destination table. After the data is imported, rename the table or merge it into the final data table. Example value: 0
- Delete Job After Completion: Specifies what to do with the job after it is executed. Example value: Do not delete
  - Do not delete: The job is not deleted after it is executed.
  - Delete after success: The job is deleted only when it is successfully executed. This option is applicable to massive one-time jobs.
  - Delete: The job is deleted regardless of whether it succeeds or fails.
Step 10 Click Save or Save and Run.
If you click Save, start the job manually by clicking Run on the Job Management page.
----End
5.1.2 Entire DB Migration
Scenario

CDM supports entire database migration between homogeneous and heterogeneous data sources. The migration principles are the same as those in Table/File Migration: each table (or each type of an Elasticsearch index) is migrated as a subtask that can be executed concurrently.

Entire database migration is applicable to scenarios where a database in an on-premises data center or built on a HUAWEI CLOUD ECS is synchronized to HUAWEI CLOUD database services or big data services. It is suitable for offline database migration but not online real-time migration. Figure 5-5 lists the data sources that support entire database migration using CDM.
Figure 5-5 Supported data sources in entire DB migration
The source databases can be databases in on-premises data centers, databases built on HUAWEI CLOUD ECSs, or third-party database services.

Prerequisites
- You have created a link by referring to Creating a Link.
- The CDM cluster can communicate with the data source.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.

Step 3 Choose Entire DB Migration > Create Job. The page for configuring the job is displayed, as shown in Figure 5-6.
Figure 5-6 Creating an entire database migration job
Step 4 Configure the parameters of the source database according to Table 5-4.

Table 5-4 Parameter description

- Oracle, MySQL, PostgreSQL, Microsoft SQL Server — Schema/Tablespace: Name of the database from which data is to be extracted. Click the icon next to the text box to go to the page for configuring the parameter, or directly enter a schema or tablespace. If the desired schema or tablespace is not displayed, check whether the login account has the permission to query metadata. Example value: schema
- Elasticsearch — Index: Index of the data to be extracted, which is similar to the schema or name of a relational database. Example value: index
Step 5 Configure the parameters of the destination cloud service according to Table 5-5.

Table 5-5 Parameter description

- Data Warehouse Service, MRS Hive, RDS for MySQL
  - Schema/Tablespace: Database name
  - Auto Table Creation: The options are as follows:
    - Non-auto creation: CDM will not automatically create a table.
    - Auto creation: If no corresponding table exists in the destination database, CDM will automatically create one.
    - Deletion before creation: If a table with the same name exists in the destination database, CDM will delete the table first and create another one with the same name.
  - Clear Data Before Import: This parameter is not displayed if Auto Table Creation is set to Deletion before creation.
    - Yes: CDM will delete the data in the destination tables that share the same names with the tables in the source database.
    - No: Table data will not be cleared before the import. If you set this parameter to No and the tables are not empty, the imported data will be appended to the existing tables.
- Cloud Search Service
  - Index: Index to which data is to be written, which is similar to the schema or name of a relational database
  - Clear Data Before Import: Whether to clear the data of the target type before data is written
- HUAWEI CLOUD OBS: For details about the destination job parameters required for entire database migration to OBS, see To OBS.
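The Auto Table Creation options above can be summarized with a small decision sketch. The function and its action strings are hypothetical illustrations of the described semantics, not CDM APIs; `existing` stands in for the set of table names already present in the destination database:

```python
def prepare_destination_table(mode: str, table: str, existing: set) -> list:
    """Return the table-preparation actions implied by an
    Auto Table Creation mode."""
    actions = []
    if mode == "Non-auto creation":
        pass  # CDM does not create the table.
    elif mode == "Auto creation":
        if table not in existing:
            actions.append(f"CREATE {table}")
    elif mode == "Deletion before creation":
        if table in existing:
            actions.append(f"DROP {table}")
        actions.append(f"CREATE {table}")
    return actions

print(prepare_destination_table("Deletion before creation", "t1", {"t1"}))
# ['DROP t1', 'CREATE t1']
```

Note that Clear Data Before Import only matters in the first two modes; Deletion before creation always starts from an empty, freshly created table, which is why the parameter is hidden in that case.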
Step 6 If the database to be migrated is a relational database, click Next after configuring the job parameters and specify whether to migrate some or all of the tables.

After selecting the desired tables, click the arrow buttons to move them to the right pane.

Step 7 Click Save or Save and Run.

When the job starts running, a sub-job is generated for each table. You can click the job name to view the sub-job list.
----End
5.2 Source Job Parameters
5.2.1 From OBS/OSS
When the source link of a job is the Link to OBS or Link to OSS on Alibaba Cloud, that is, when data is exported from HUAWEI CLOUD OBS or Alibaba Cloud OSS, configure the source job parameters based on Table 5-6.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-6 Parameter description
Category Parameter Description Example Value
Basic parameters
Bucket Name    Name of the bucket from which data is to be migrated
BUCKET_2
Source Directory/File
Path of the directory or file from which data is to be extracted. The file path can contain a maximum of 50 files, which are separated by vertical bars (|). For details, see Migration of a List of Files.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
FROM/example.csv
File Format    Format in which CDM parses data. The options are as follows:
- CSV: Source files will be migrated to tables after being parsed in CSV format.
- Binary: Files (even those not in binary format) will be directly transferred without being parsed. You are advised to select Binary when migrating files to files because the migration efficiency is higher.
- JSON: Source files will be migrated to tables after being parsed in JSON format.
- CarbonData: Source files will be migrated to tables after being parsed in CarbonData format. This option is displayed only when the migration source is OBS.
CSV
JSON Type    This parameter is displayed only when File Format is set to JSON. Type of a JSON object stored in a JSON file. The options are JSON object and JSON array.
JSON object
JSON Reference Node
This parameter is displayed only when File Format is set to JSON. CDM parses the data under the JSON node. If the node's corresponding data is a JSON array, the system will extract data from the array in the same pattern. Use periods (.) to separate multi-layer nested JSON nodes.
data.list
Advanced attributes
Line Separator    Line feed characters in a file. Supported values include \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV.
\n
Field Delimiter    Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV.
,
Use Quote Character
If you set this parameter to Yes, the field delimiters inside the enclosing symbol are regarded as part of the string value. Currently, the default enclosing symbol of CDM is ".
No
Use RE to Separate Fields
Whether to use regular expressions to separate fields. If you set this parameter to Yes, Field Delimiter becomes invalid. This parameter is displayed only when File Format is set to CSV.
Yes
Regular Expression
Regular expression used to separate fields. For details about regular expressions, see Using Regular Expressions to Separate Semi-structured Text.
^(\d.*\d)(\w*) \[(.*)\]([\w\.]*)(\w.*).*
Use First Row as Header
This parameter is displayed only when File Format is set to CSV. If you set this parameter to Yes, CDM will use the first row as the header when extracting data.
Yes
Encoding Type    Encoding type, for example, UTF-8 or GBK. You can set the encoding type for text files only. This parameter is invalid if File Format is set to Binary.
GBK
Compression Format
This parameter is displayed only when File Format is set to CSV or JSON. The options are as follows:
- None: Files in all formats can be transferred.
- gzip: Only files in gzip format can be transferred.
- Zip: Only files in Zip format can be transferred.
None
Source File Processing Method
Operation on source files after the job succeeds.
- Rename: After the job succeeds, rename the source files by adding usernames and timestamps as suffixes to the file names.
- Delete: After the job succeeds, delete the source files.
Rename
Start Job by Marker File
Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by Suspension Period.
No
Marker File    Name of the marker file for starting a job. If you specify a marker file, the migration job is executed only when the marker file exists in the source path. The marker file will not be migrated.
ok.txt
Suspension Period    Period of waiting for a marker file. If you set Start Job by Marker File to Yes but no marker file exists in the source path, the job fails upon suspension timeout. If you set this parameter to 0 and no marker file exists in the source path, the job fails immediately. Unit: second
10
Filter Type    Whether to filter the files by wildcard or regular expression. If you choose to filter files by regular expression, Java regular expressions are used. For details, see File/Path Filter.
- Wildcard: Wildcard characters are used.
- Regex: Java regular expressions are used.
Wildcard
Path Filter    Filters directories under the input path. Only directories meeting the filter conditions can be migrated. Multiple paths can be configured. Use commas (,) to separate multiple paths.
*input
File Filter    Filters files under the input path. Only files meeting the filtering rules can be migrated. Multiple files can be configured. Use commas (,) to separate multiple files.
*.csv,*.txt
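The JSON Reference Node behavior described in Table 5-6 (navigate a dotted node path such as data.list and emit each array element as one record) can be sketched in Python. This is an illustrative sketch only, not CDM's implementation; the helper name extract_by_node and the sample document are hypothetical.

```python
import json

def extract_by_node(text, node_path):
    """Return the records under a dotted JSON node path.

    If the node holds a JSON array, every element of the array is
    emitted as one record, mirroring the documented behavior.
    Periods (.) separate the nested levels of the path.
    """
    obj = json.loads(text)
    for key in node_path.split("."):
        obj = obj[key]
    return obj if isinstance(obj, list) else [obj]

raw = '{"data": {"list": [{"id": 1}, {"id": 2}]}}'
records = extract_by_node(raw, "data.list")
# records == [{"id": 1}, {"id": 2}]
```

With "data.list" (the documentation's example value), each element of the list array becomes a separate record, which is what lets CDM load a nested JSON file into rows of a table.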
NOTE
1. CDM supports incremental file migration (by skipping repeated files), but does not support resumable transfer.
For example, if three files are to be migrated and the second file fails due to a network fault, the first file is skipped when the migration task is started again. The second file, however, cannot resume from the point where the fault occurred; it can only be migrated again from the beginning.
2. During file migration, a single task supports a maximum of 100,000 files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
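The two filter modes in the table above (Wildcard and Regex) behave like the standard shell-glob and regular-expression matchers. The Python sketch below is illustrative only (CDM's Regex mode uses Java regular expressions; Python's re is close enough for this comparison), and the file list is made up.

```python
import fnmatch
import re

files = ["a.csv", "b.txt", "c.log", "d.csv"]

# Wildcard mode: comma-separated glob patterns, as in the File Filter field.
patterns = "*.csv,*.txt".split(",")
wildcard_hits = [f for f in files
                 if any(fnmatch.fnmatch(f, p) for p in patterns)]

# Regex mode: a single regular expression must match the whole file name.
regex_hits = [f for f in files if re.fullmatch(r".*\.(csv|txt)", f)]
```

Both modes select a.csv, b.txt, and d.csv here and skip c.log; the difference is only in how the pattern is written.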
5.2.2 From HDFS
When the source link of a job is the Link to HDFS, that is, when data is exported from MRS HDFS, FusionInsight HDFS, or Apache HDFS, configure the source job parameters based on Table 5-7.
Table 5-7 Parameter description
Category Parameter Description Example Value
Basic parameters
Source Directory/File
Path of the directory or file from which data is to be extracted.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/user/cdm/
File Format    File format used when transferring data. The options are as follows:
- CSV: Source files will be migrated to tables after being parsed in CSV format.
- Binary: Files (even those not in binary format) will be directly transferred without being parsed. You are advised to select Binary when migrating files to files because the migration efficiency is higher.
- Parquet: Source files will be migrated to tables after being parsed in Parquet format.
CSV
Advanced attributes
Line Separator    Line feed characters in a file. Supported values include \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV.
\n
Field Delimiter    Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV.
,
Use First Row as Header
This parameter is displayed only when File Format is set to CSV. If you set this parameter to Yes, CDM will use the first row as the header when extracting data.
No
File Split Method    Whether to split files by file quantity or by size. If HDFS files are split, each shard is regarded as one file.
- File: Split by file quantity. If there are 10 files and Concurrent Extractors is set to 5, each shard consists of two files.
- Size: Split by file size. Files will not be cut for balance. Suppose there are 10 files, among which nine are 10 MB each and one is 200 MB. If Concurrent Extractors is set to 2, two shards will be created: one processes the nine 10 MB files, and the other processes the 200 MB file.
File
Source File Processing Method
Operation on source files after the job succeeds.
- Rename: After the job succeeds, rename the source files by adding usernames and timestamps as suffixes to the file names.
- Delete: After the job succeeds, delete the source files.
Rename
Startup Job Marker File
Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by Suspension Period.
ok.txt
Filter Type    Whether to filter the files by wildcard or regular expression. If you choose to filter files by regular expression, Java regular expressions are used. For details, see File/Path Filter.
- Wildcard: Wildcard characters are used.
- Regex: Java regular expressions are used.
Wildcard
Path Filter    Filters directories under the input path. Only directories meeting the filter conditions can be migrated. Multiple paths can be configured. Use commas (,) to separate multiple paths.
*input
File Filter    Filters files under the input path. Only files meeting the filtering rules can be migrated. Multiple files can be configured. Use commas (,) to separate multiple files.
*.csv
NOTE
HDFS supports the UTF-8 encoding only. Retain the default value UTF-8.
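The Size option's balancing arithmetic from the File Split Method row above can be sketched as a greedy assignment: each whole file goes to the currently lightest shard, and files are never cut for balance. This Python sketch is illustrative only; split_by_size is a hypothetical helper, not CDM's actual scheduler.

```python
def split_by_size(sizes, extractors):
    """Assign whole files (by size, in MB) to shards, largest first,
    always placing the next file on the lightest shard so far."""
    shards = [[] for _ in range(extractors)]
    totals = [0] * extractors
    for size in sorted(sizes, reverse=True):
        i = totals.index(min(totals))  # lightest shard so far
        shards[i].append(size)
        totals[i] += size
    return shards

# The documented example: nine 10 MB files plus one 200 MB file,
# with Concurrent Extractors set to 2.
shards = split_by_size([10] * 9 + [200], 2)
```

With this policy one shard receives only the 200 MB file and the other receives the nine 10 MB files, matching the example given for the Size option.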
5.2.3 From HBase/CloudTable
When the source link of a job is the Link to HBase or Link to CloudTable, that is, when data is exported from MRS HBase, FusionInsight HBase, or Apache HBase, configure the source job parameters based on Table 5-8.
NOTE
1. When you migrate data from CloudTable or HBase, CDM reads the first row of the table as an example of the field list. If the first row of data does not contain all fields of the table, you need to manually add fields.
2. Because HBase is schema-less, CDM cannot obtain the data types. If the data is stored in binary format, CDM cannot parse it.
3. When data is exported from HBase or CloudTable, because HBase and CloudTable are schema-less storage systems, CDM requires that the source numeric fields be stored as character strings rather than in binary format. For example, the value 100 must be stored as the string 100 rather than the binary 01100100.
Table 5-8 Parameter description
Parameter Description Example Value
Table Name    Name of the HBase table from which data is to be exported.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_2
Column Families
(Optional) Column families to which the exported data belongs
CF1&CF2
Split Rowkey
(Optional) Whether to split a rowkey. The default value is No.
Yes
Rowkey Delimiter
(Optional) Delimiter used to split a rowkey. If this parameter is left empty, the rowkey will not be split.
|
Start Time    (Optional) Start time (inclusive) for extracting data, in the format yyyy-MM-dd HH:mm:ss. Only the data generated at or after the specified time is extracted.
This parameter can be set to a macro variable of date and time. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
2017-12-31 20:00:00
End Time    (Optional) End time (exclusive) for extracting data, in the format yyyy-MM-dd HH:mm:ss. Only the data generated before this time point is extracted.
This parameter can be set to a macro variable of date and time. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
2018-01-01 20:00:00
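Split Rowkey and Rowkey Delimiter from Table 5-8 amount to splitting the composite rowkey string into separate columns. A minimal Python sketch, assuming a hypothetical rowkey layout (the helper and the sample rowkey are illustrative, not CDM internals):

```python
def split_rowkey(rowkey, delimiter):
    """Split an HBase rowkey into columns by a delimiter.
    An empty delimiter leaves the rowkey unsplit, as documented."""
    return rowkey.split(delimiter) if delimiter else [rowkey]

# Example: a composite rowkey of date, user, and action, joined by "|"
parts = split_rowkey("20180101|user01|click", "|")
# parts == ["20180101", "user01", "click"]
```

Each resulting part can then be mapped to its own destination column instead of landing in a single rowkey field.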
5.2.4 From Hive
If the source link of a job is the Link to Hive, configure the source job parameters based on Table 5-9.
Table 5-9 Parameter description
Parameter Description Example Value
Database Name    Database name. Click the icon next to the text box. The dialog box for selecting the database is displayed.
default
Table Name    Hive table name. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_EXAMPLE
NOTE
If the data source is Hive, CDM will automatically partition data using the Hive data partitioning file.
5.2.5 From FTP/SFTP/NAS
If the source link of a job is the Link to an FTP or SFTP Server or Link to a NAS Server, configure the source job parameters based on Table 5-10.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-10 Parameter description
Category Parameter Description Example Value
Basic parameters
Source Directory/File
Path of the directory or file from which data is to be extracted. The file path can contain a maximum of 50 files, which are separated by vertical bars (|). For details, see Migration of a List of Files.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/ftp/a.csv|/ftp/b.txt
File Format    Format in which CDM parses data. The options are as follows:
- CSV: Source files will be migrated to tables after being parsed in CSV format.
- Binary: Files (even those not in binary format) will be directly transferred without being parsed. You are advised to select Binary when migrating files to files because the migration efficiency is higher.
- JSON: Source files will be migrated to tables after being parsed in JSON format.
CSV
JSON Type    This parameter is displayed only when File Format is set to JSON. Type of a JSON object stored in a JSON file. The options are JSON object and JSON array.
JSON object
JSON Reference Node
This parameter is displayed only when File Format is set to JSON. CDM parses the data under the JSON node. If the node's corresponding data is a JSON array, the system will extract data from the array in the same pattern. Use periods (.) to separate multi-layer nested JSON nodes.
data.list
Advanced attributes
Line Separator    Line feed characters in a file. Supported values include \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV.
\n
Field Delimiter    Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV.
,
Use Quote Character
If you set this parameter to Yes, the field delimiters inside the enclosing symbol are regarded as part of the string value. Currently, the default enclosing symbol of CDM is ".
No
Use RE to Separate Fields
Whether to use regular expressions to separate fields. If you set this parameter to Yes, Field Delimiter becomes invalid. This parameter is displayed only when File Format is set to CSV.
Yes
Regular Expression    Regular expression used to separate fields. For details about regular expressions, see Using Regular Expressions to Separate Semi-structured Text.
^(\d.*\d)(\w*) \[(.*)\]([\w\.]*)(\w.*).*
Use First Row as Header
This parameter is displayed only when File Format is set to CSV. If you set this parameter to Yes, CDM will use the first row as the header when extracting data.
Yes
Encoding Type    Encoding type, for example, UTF-8 or GBK. You can set the encoding type for text files only. This parameter is invalid if File Format is set to Binary.
UTF-8
Compression Format
This parameter is displayed only when File Format is set to CSV or JSON. The options are as follows:
- None: Files in all formats can be transferred.
- gzip: Only files in gzip format can be transferred.
- Zip: Only files in Zip format can be transferred.
None
Source File Processing Method
Operation on source files after the job succeeds.
- Rename: After the job succeeds, rename the source files by adding usernames and timestamps as suffixes to the file names.
- Delete: After the job succeeds, delete the source files.
Rename
Start Job by Marker File
Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by Suspension Period.
Yes
Marker File    Name of the marker file for starting a job. If you specify a marker file, the migration job is executed only when the marker file exists in the source path. The marker file will not be migrated.
ok.txt
Suspension Period    Period of waiting for a marker file. If you set Start Job by Marker File to Yes but no marker file exists in the source path, the job fails upon suspension timeout. If you set this parameter to 0 and no marker file exists in the source path, the job fails immediately. Unit: second
10
Filter Type    Whether to filter the files by wildcard or regular expression. If you choose to filter files by regular expression, Java regular expressions are used. For details, see File/Path Filter.
- Wildcard: Wildcard characters are used.
- Regex: Java regular expressions are used.
Wildcard
Path Filter    Filters directories under the input path. Only directories meeting the filter conditions can be migrated. Multiple paths can be configured. Use commas (,) to separate multiple paths.
*input,*out
File Filter    Filters files under the input path. Only files meeting the filtering rules can be migrated. Multiple files can be configured. Use commas (,) to separate multiple files.
*.csv
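Use RE to Separate Fields works by turning each capture group of the regular expression into one output field. The Python sketch below uses a simpler pattern of our own (not the example value from the table, which targets a different log layout) purely to show the group-to-field mapping; the log line is made up.

```python
import re

# Illustrative pattern: four capture groups become four output fields.
pattern = re.compile(r'^(\S+) (\S+) \[(.*)\] "(.*)"')
line = '127.0.0.1 admin [01/Jan/2018:00:00:01] "GET /index.html"'

m = pattern.match(line)
fields = list(m.groups())
# fields == ['127.0.0.1', 'admin', '01/Jan/2018:00:00:01', 'GET /index.html']
```

Note that CDM's Regex mode uses Java regular expressions; Python's re syntax is close enough for this illustration, but complex patterns should be verified against the Java flavor.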
5.2.6 From HTTP/HTTPS
When the source link of a job is the HTTP link, configure the source job parameters based on Table 5-11. Currently, data can only be exported from an HTTP/HTTPS URL to HUAWEI CLOUD.
Table 5-11 Parameter description
Parameter Description Example Value
File URL    Use the GET method to obtain data from the HTTP/HTTPS URL. These connectors are used to read files with an HTTP/HTTPS URL, such as public files on a third-party object storage system or web disk.
https://bucket.obs.myhwclouds.com/object-key
File Format    Currently, CDM supports Binary only, which means that files (even those not in binary format) will be directly transferred without being parsed.
Binary
5.2.7 From a Relational Database
When the source link of a job is the Link to Relational Databases, that is, when data is exported from one of the following databases, configure the source job parameters based on Table 5-12.
- Data Warehouse Service
- RDS for MySQL
- RDS for SQL Server
- RDS for PostgreSQL
- DDM
- FusionInsight LibrA
- Derecho (GaussDB)
- MySQL
- PostgreSQL
- Oracle
- IBM Db2
- Microsoft SQL Server
Table 5-12 Parameter description
Category Parameter Description Example Value
Basic parameters
Schema/Tablespace    Name of the database from which data is to be extracted. Click the icon next to the text box to go to the page for configuring the parameter, or directly enter a schema or tablespace.
If the desired schema or tablespace is not displayed, check whether the login account has the permission to query metadata.
SCHEMA_EXAMPLE
Table Name    Table from which data is to be extracted. Click the icon next to the text box to go to the page for selecting the table, or directly enter a table name.
If the desired table is not displayed, check whether the table exists or whether the login account has the permission to query metadata.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
all_type
Advanced attributes
Partition Column    Field used to partition data during data extraction. CDM splits a job into multiple tasks based on this field and executes the tasks concurrently. Fields with evenly distributed data, such as a sequential number field, work best.
Click the icon next to the text box to go to the page for selecting columns, or directly enter a partition column name.
id
Where Clause    SQL clause used to specify the data extraction range. If this parameter is not set, the entire table will be extracted.
This parameter can be configured as a macro variable of date and time to extract data generated on a specific date. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
time between '${timestamp(-1,DAY)}' and '${timestamp()}'
Regain Symbol    After Regain Symbol is set to a specified field, CDM queries the table imported to the destination database every time a scheduled task starts. If the table does not contain the specified field, CDM performs a full migration. If the table contains the specified field and the field has a value, CDM performs an incremental migration, migrating only the data whose value is greater than the value of this field.
For details about how to use this parameter, see Incremental Migration Using the Regain Symbol.
date
Migrate Incremental Data
Whether to migrate incremental data in MySQL Binlog mode. Currently, this mode can be used only for table/file migration from a MySQL database to a DWS database.
After this function is enabled, data in the source table and destination table can be synchronized in real time. One MySQL link supports only one incremental migration job, and one source table supports only one incremental migration job.
No
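The Where Clause row above relies on date/time macro variables such as ${timestamp(-1,DAY)}. A rough Python stand-in for the expansion step is shown below; the macro syntax comes from this guide, but the expand_timestamp_macro function and its formatting details are illustrative assumptions, not CDM's implementation.

```python
from datetime import datetime, timedelta

def expand_timestamp_macro(offset=0, unit="DAY", now=None):
    """Shift 'now' by the given offset and format it as a SQL
    timestamp, approximating the documented ${timestamp(...)} macro."""
    now = now or datetime.now()
    if unit == "DAY":
        now += timedelta(days=offset)
    elif unit == "HOUR":
        now += timedelta(hours=offset)
    return now.strftime("%Y-%m-%d %H:%M:%S")

# Fix "now" so the expansion is reproducible in this sketch.
base = datetime(2018, 1, 2, 20, 0, 0)
where = "time between '{}' and '{}'".format(
    expand_timestamp_macro(-1, "DAY", base),   # ${timestamp(-1,DAY)}
    expand_timestamp_macro(0, "DAY", base))    # ${timestamp()}
# where == "time between '2018-01-01 20:00:00' and '2018-01-02 20:00:00'"
```

Combined with a daily scheduled job, a clause expanded this way selects exactly the previous day's data on each run, which is how the periodic incremental synchronization described above works.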
NOTE
- If the migration source is Oracle, CDM will automatically partition data using the ROWID.
- In the migration from MySQL to DWS, the constraints on the incremental data migration function in MySQL Binlog mode are as follows:
  1. A single cluster supports only one incremental migration job in MySQL Binlog mode in the current version.
  2. In the current version, you are not allowed to delete or update 10,000 data records at a time.
  3. Entire database migration is not supported.
  4. Data Definition Language (DDL) operations are not supported.
  5. Event migration is not supported.
  6. If you set Migrate Incremental Data to Yes, binlog_format in the source MySQL database must be set to ROW.
  7. If you set Migrate Incremental Data to Yes and binlog file ID disorder occurs on the source MySQL instance due to cross-machine migration or rebuilding during incremental data migration, incremental data may be lost.
  8. If a primary key exists in the destination table and incremental data is generated during the restart of the CDM cluster or during full migration, duplicate data may exist in the primary key. As a result, the migration fails.
  9. If the destination DWS database is restarted, the migration will fail. In this case, restart the CDM cluster and then the migration job.

The recommended MySQL configuration is as follows:
# Enable the bin-log function.
log-bin=mysql-bin
# ROW mode
binlog-format=ROW
# GTID mode. The recommended version is 5.6.10 or later.
gtid-mode=ON
enforce_gtid_consistency=ON
5.2.8 From MongoDB/DDS
When you migrate data from MongoDB to a relational database, CDM reads the first row of the collection as an example of the field list. If the first row of data does not contain all fields of the collection, you need to manually add fields.
When the source link of a job is the Link to MongoDB/DDS, that is, when data is exported from an on-premises MongoDB or from DDS, configure the source job parameters based on Table 5-13.
Table 5-13 Parameter description
Parameter Description Example Value
Database Name    Name of the database from which data is to be migrated
mongodb
Collection Name    Collection name, similar to the table name of a relational database. Click the icon next to the text box to go to the page for selecting the collection, or directly enter a collection name.
If the desired collection is not displayed, check whether the collection exists or whether the login account has the permission to query metadata.
COLLECTION_NAME
5.2.9 From Redis
Because DCS restricts the commands for obtaining keys, it cannot serve as the migration source, but it can be the migration destination. The Redis service of a third-party cloud cannot serve as the migration source either. However, a Redis instance set up in an on-premises data center or on an ECS can be either the migration source or the destination.
When data is exported from an on-premises Redis, configure the source job parameters based on Table 5-14.
Table 5-14 Parameter description
Parameter Description Example Value
Redis Key Prefix
Key prefix, which is similar to the table name of a relational database
TABLENAME
Value Storage Type
The options are as follows:
- String: without column names, for example, value1,value2
- Hash: with column names, for example, column1=value1,column2=value2
String
Key Delimiter    Character used to separate the table name and column names of a relational database
_
Value Delimiter    Character used to separate columns when the storage type is String
;
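The key prefix, key delimiter, and value delimiter in Table 5-14 together define how one table row maps to a String-type Redis entry. A minimal Python sketch of that mapping, using the table's example values (the to_redis_string helper and the sample row are illustrative assumptions, not CDM internals):

```python
def to_redis_string(table, row, key_col, key_delim="_", value_delim=";"):
    """Compose a String-type Redis key/value from one table row:
    key = prefix + key delimiter + key column value,
    value = remaining column values joined by the value delimiter."""
    key = table + key_delim + str(row[key_col])
    value = value_delim.join(str(v) for k, v in row.items() if k != key_col)
    return key, value

key, value = to_redis_string("TABLENAME",
                             {"id": 1, "name": "a", "age": 20}, "id")
# key == "TABLENAME_1", value == "a;20"
```

With Hash storage the value would instead carry column names (column1=value1,...), so no value delimiter convention is needed to recover the columns.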
5.2.10 From DIS
The data in the message body is a record in CSV format that supports multiple delimiters. Messages cannot be parsed in binary or other formats.
If the source link of a job is the Link to DIS, configure the source job parameters based on Table 5-15. Currently, data can only be exported from DIS to Cloud Search Service.
Table 5-15 Parameter description
Parameter Description Example Value
DIS Stream    DIS stream name    dis
Offset    Initial offset when data is pulled from DIS
- Latest: Maximum offset, indicating that the latest data will be extracted.
- From last stop: Data reading starts from where the last read ended.
- Earliest: Minimum offset, indicating that the earliest data will be extracted.
Latest
Permanent Running    Whether the job runs permanently. If a job is set to run for a long time, the job will fail if the DIS system is interrupted.
Yes
DIS Partition ID    ID of the DIS partition. You can enter multiple partition IDs, separated by commas (,).
0,1,2
Field Delimiter    The default value is a space. To set the Tab key as the delimiter, set this parameter to \t.
,
Max. Poll Records    (Optional) Maximum number of records per poll    100
5.2.11 From Apache Kafka
If the source link of a job is a link to Kafka, configure the source job parameters according to Table 5-16.
Table 5-16 Parameter description
Parameter Description Example Value
Topics    One or more topics can be entered.    test1,test2
Offset    Initial offset parameter
- Latest: Maximum offset, indicating that the latest data will be extracted.
- Earliest: Minimum offset, indicating that the earliest data will be extracted.
Latest
Permanent Running    Whether the job runs permanently    Yes
Group ID    Group ID    -
Field Delimiter    The default value is a space. To set the Tab key as the delimiter, set this parameter to \t.
,
Max. Poll Records    (Optional) Maximum number of records per poll    100
Max. Poll Interval    (Optional) Maximum interval between polls, in seconds
100
5.2.12 From Elasticsearch/Cloud Search Service
If the source link of a job is the Link to Elasticsearch, configure the source job parameters based on Table 5-17.
Table 5-17 Parameter description
Parameter Description Example Value
Index    Elasticsearch index, which is similar to the name of a relational database. The index name can contain only lowercase letters.
index
Type    Elasticsearch type, which is similar to the table name of a relational database. The type name can contain only lowercase letters.
type
Split Nested Field    (Optional) Whether to split the JSON content of nested fields. For example, a:{ b:{ c:1, d:{ e:2, f:3 } } } can be split into a.b.c, a.b.d.e, and a.b.d.f.
No
Filter Conditions    (Optional) Whether to use a query string to filter the source data. CDM migrates only the data that meets the filter conditions.
last_name:Smith
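The Split Nested Field example from Table 5-17 is a recursive flattening of nested JSON into dotted field names. A short Python sketch of that transformation (illustrative only; the split_nested helper is hypothetical, not CDM's implementation):

```python
def split_nested(obj, prefix=""):
    """Flatten nested JSON into dotted field names, e.g.
    {"a": {"b": {"c": 1}}} becomes {"a.b.c": 1}."""
    flat = {}
    for key, value in obj.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(split_nested(value, name))  # recurse into sub-object
        else:
            flat[name] = value
    return flat

# The table's example document: a:{ b:{ c:1, d:{ e:2, f:3 } } }
doc = {"a": {"b": {"c": 1, "d": {"e": 2, "f": 3}}}}
# split_nested(doc) == {"a.b.c": 1, "a.b.d.e": 2, "a.b.d.f": 3}
```

Each dotted name then maps cleanly to a flat destination column, which is why the option matters when the destination is a relational table.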
5.3 Destination Job Parameters
This section describes how to configure destination job parameters when creating a table/file migration job.
5.3.1 To OBS
When the destination link of a job is the Link to OBS, that is, when data is imported to OBS, configure the destination job parameters based on Table 5-18.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-18 Parameter description
Category Parameter Description Example Value
Basic parameters
Bucket Name    Name of the OBS bucket to which data is to be written
BUCKET_2
Category Parameter Description ExampleValue
Write Directory OBS directory to which data is to be written. Do not add / in front of the directory name.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
DIRECTORY/
File Format Format in which data is written. The options are as follows:
- CSV: Data is written in CSV format, which is applicable to migrating data tables to files.
- Binary: Files will be directly transferred without parsing. CDM writes the files without changing the original file format, which is applicable to the migration of files to files.
- CarbonData: Data is written in CarbonData format, which is applicable to migrating data tables to files.
If data is migrated between file-related data sources, such as FTP, SFTP, NAS, HDFS, and OBS, the value of File Format must be the same as the source file format.
CSV
Duplicate File Processing Method
Files with the same name and size are identified as duplicate files. If there are duplicate files during data writing, the following methods are available:
- Replace
- Skip
- Stop job
For details, see Duplicate File Processing Method.
Skip
Advanced attributes
KMS Encryption
Whether to encrypt the uploaded data by using Key Management Service (KMS). If KMS encryption is enabled, MD5 verification cannot be performed for the data. For details, see Data Encryption During the Migration to OBS.
Yes
Key ID Key used for encryption during upload. You need to create a key in KMS in advance.
53440ccb-3e73-4700-98b5-71ff5476e621
Copy Content-Type
This parameter is displayed only when File Format is Binary and both the migration source and destination are object storage.
If you set this parameter to Yes, the Content-Type attribute of the source file is copied during object file migration. This function is mainly used for static website migration.
The Content-Type attribute cannot be written to Archive buckets. Therefore, if you set this parameter to Yes, the migration destination must be a non-Archive bucket.
No
Line Separator Line feed character in the file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is invalid when File Format is set to Binary.
\n
Field Delimiter Field delimiter in the file. This parameter is invalid when File Format is set to Binary.
,
File Size This parameter is displayed only when the migration source is a database. Data is partitioned into multiple files by size so that the files can be exported in a proper size. The unit is MB.
1024
Validate MD5 Value
The MD5 value can be verified only when files are transferred in Binary format. KMS encryption cannot be used when the MD5 value needs to be verified.
CDM calculates the MD5 value of the source files and verifies it against the MD5 value returned by OBS. If an MD5 file exists on the migration source, the system directly reads the MD5 file from the migration source and verifies it against the MD5 value returned by OBS. For details, see MD5 Verification for Files in Migration.
Yes
Record MD5 Verification Result
Whether to record the MD5 verification result when Validate MD5 Value is set to Yes
Yes
Record MD5 Link
OBS link to which the MD5 verification result is to be written
obslink
Record MD5 Bucket
OBS bucket to which the MD5 verification result is to be written
cdm05
Record MD5 Directory
Directory to which the MD5 verification result is to be written
/md5/
Encoding Type Encoding type, for example, UTF-8 or GBK. This parameter is invalid when File Format is set to Binary.
GBK
Use Quote Character
This parameter is displayed only when File Format is CSV. It is used when database tables are migrated to file systems.
If you set this parameter to Yes and a field in the source data table contains a field delimiter or line separator, CDM uses double quotation marks (") as the quote character to quote the field content as a whole to prevent a field delimiter from dividing a field into two fields, or a line separator from dividing a field into different lines. For example, if the hello,world field in the database is quoted, it will be exported to the CSV file as a whole.
No
Job Success Marker File
Whether to generate a marker file with a custom name in the destination directory after a job is executed successfully. If you do not specify a file name, this function is disabled by default.
finish.txt
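The MD5 check described under Validate MD5 Value boils down to hashing the source file and comparing the digest with the one reported by the destination. A minimal Python sketch of the local hashing step (the OBS API call is omitted; only the hashing shown here is standard):

```python
import hashlib

def file_md5(path, chunk_size=64 * 1024):
    """Compute the MD5 hex digest of a file, reading in chunks so
    large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration with a small local file; in a real migration the second
# digest would come from the destination store (the value OBS returns).
with open("sample.txt", "wb") as f:
    f.write(b"hello world")
assert file_md5("sample.txt") == hashlib.md5(b"hello world").hexdigest()
```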
5.3.2 To HDFS
When the destination link of a job is the Link to HDFS, that is, when data is imported to the following data sources, configure the destination job parameters based on Table 5-19.
- MRS HDFS
- FusionInsight HDFS
- Apache HDFS
Table 5-19 Parameter description
Parameter Description Example Value
Write Directory HDFS directory to which data is to be written. This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/user/output
File Format Format in which data is written. The options are as follows:
- CSV: Data is written in CSV format, which is applicable to migrating data tables to files.
- Binary: Files will be directly transferred without parsing. CDM writes the files without changing the original file format, which is applicable to the migration of files to files.
If data is migrated between file-related data sources, such as FTP, SFTP, NAS, HDFS, and OBS, the value of File Format must be the same as the source file format.
CSV
Duplicate File Processing Method
Files with the same name and size are identified as duplicate files. If there are duplicate files during data writing, the following methods are available:
- Replace
- Skip
- Stop job
Stop job
Compression Format File compression format after data writing. The following compression formats are supported:
- None: Do not compress the files.
- DEFLATE: Compress the files in DEFLATE format.
- gzip: Compress the files in gzip format.
- bzip2: Compress the files in bzip2 format.
- LZ4: Compress the files in LZ4 format.
- Snappy: Compress the files in Snappy format.
Snappy
Line Separator Line feed character in the file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is invalid when File Format is set to Binary.
\n
Field Delimiter Field delimiter in the file. This parameter is invalid when File Format is set to Binary.
,
NOTE
HDFS supports the UTF-8 encoding only. Retain the default value UTF-8.
5.3.3 To HBase/CloudTable
When the destination link of a job is the Link to HBase or Link to CloudTable, that is, when data is imported to the following data sources, configure the destination job parameters based on Table 5-20.
- MRS HBase
- FusionInsight HBase
- Apache HBase
- CloudTable Service
Table 5-20 Parameter description
Parameter Description Example Value
Table Name Name of the HBase table to which data is to be written. If you want to create an HBase table, you can copy the field names from the migration source. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_2
Clear Data Before Import
Operation on the tables with duplicate names before data import. The options are as follows:
- Yes: CDM will delete data in the tables that share the same names with the tables in the source database.
- No: Data is appended to the existing tables.
Yes
Rowkey Delimiter (Optional) Used to combine multiple columns into a rowkey. A space is used by default.
,
Rowkey Data Redundancy
(Optional) Whether to also write the rowkey data into HBase columns. The default value is No.
No
Compression Format (Optional) Compression format used when creating a new HBase table. The default value is None.
- None: Do not compress the files.
- Snappy: Compress the files in Snappy format.
- gzip: Compress the files in gzip format.
None
Write WAL Whether to enable the Write Ahead Log (WAL) of HBase. The options are as follows:
- Yes: If the HBase server breaks down after this function is enabled, you can replay the operations that have not been performed from the WAL.
- No: If you set this parameter to No, the write performance is improved. However, if the HBase server breaks down, data may be lost.
No
Match Data Type
- Yes: Data of the Short, Int, Long, Float, Double, and Decimal columns in the source database is converted into Byte[] arrays (binary) and written into HBase. Other types of data are written as character strings. If several types of data mentioned above are combined as rowkeys, they will be written as character strings.
This function saves storage space. In specific scenarios, the rowkey distribution is more even.
- No: All types of data in the source database are written into HBase as character strings.
No
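The storage saving behind Match Data Type comes from writing numeric columns as fixed-width byte arrays instead of decimal strings. A rough Python illustration; the big-endian struct packing here is an assumption for the sketch, not the exact encoding CDM or HBase uses:

```python
import struct

value = 1234567890123  # a Long column value

# As a character string: one byte per decimal digit.
as_string = str(value).encode("utf-8")

# As a fixed-width binary Byte[] array: 8 bytes for a 64-bit long
# (big-endian packing assumed purely for illustration).
as_binary = struct.pack(">q", value)

print(len(as_string))  # 13 bytes
print(len(as_binary))  # 8 bytes
```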
5.3.4 To Hive
When the destination link of a job is the Link to Hive, configure the destination job parameters based on Table 5-21.
Table 5-21 Parameter description
Parameter Description Example Value
Database Name Database name. Click the icon next to the text box. The dialog box for selecting the database is displayed.
default
Auto Table Creation This parameter is displayed only when both the migration source and destination are relational databases. The options are as follows:
- Non-auto creation: CDM will not automatically create a table.
- Auto creation: If the destination database does not contain the table specified by Table Name, CDM will automatically create the table. If the table specified by Table Name already exists, no table is created and data is written to the existing table.
- Deletion before creation: CDM deletes the table specified by Table Name, and then creates the table again.
Non-auto creation
Table Name Destination table name. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_EXAMPLE
Clear Data Before Import
This parameter is not displayed if Auto Table Creation is set to Deletion before creation.
- Yes: CDM will delete data in the tables that share the same names with the tables in the source database.
- No: Data is appended to the existing tables.
Yes
NOTE
1. When Hive serves as the migration destination, the storage format selected during table creation, such as ORC or Parquet, will be used automatically.
2. When Hive serves as the migration destination, if the storage format is TEXTFILE, delimiters must be explicitly specified in the statement for creating Hive tables. The following gives an example.
CREATE TABLE csv_tbl(
  smallint_value smallint,
  tinyint_value tinyint,
  int_value int,
  bigint_value bigint,
  float_value float,
  double_value double,
  decimal_value decimal(9, 7),
  timestamp_value timestamp,
  date_value date,
  varchar_value varchar(100),
  string_value string,
  char_value char(20),
  boolean_value boolean,
  binary_value binary,
  varchar_null varchar(100),
  string_null string,
  char_null char(20),
  int_null int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar" = "'",
  "escapeChar" = "\\")
STORED AS TEXTFILE;
5.3.5 To FTP/SFTP/NAS
If the destination link of a job is Link to an FTP or SFTP Server or Link to a NAS Server, configure the destination job parameters based on Table 5-22.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-22 Parameter description
Category Parameter Description Example Value
Basic parameters
Write Directory Directory to which data is to be written.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/opt/ftp/
File Format Format in which data is written. The options are as follows:
- CSV: Data is written in CSV format, which is applicable to migrating data tables to files.
- Binary: Files will be directly transferred without parsing. CDM writes the files without changing the original file format, which is applicable to the migration of files to files.
If data is migrated between file-related data sources, such as FTP, SFTP, NAS, HDFS, and OBS, the value of File Format must be the same as the source file format.
CSV
Duplicate File Processing Method
Files with the same name and size are identified as duplicate files. If there are duplicate files during data writing, the following methods are available:
- Replace
- Skip
- Stop job
Skip
Advanced attributes
Line Separator Line feed character in the file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is invalid when File Format is set to Binary.
\n
Field Delimiter Field delimiter in the file. This parameter is invalid when File Format is set to Binary.
,
File Size This parameter is displayed only when the migration source is a database. Data is partitioned into multiple files by size so that the files can be exported in a proper size. The unit is MB.
1024
Encoding Type Encoding type, for example, UTF-8 or GBK. This parameter is invalid when File Format is set to Binary.
GBK
Use Quote Character
This parameter is displayed only when File Format is CSV. It is used when database tables are migrated to file systems.
If you set this parameter to Yes and a field in the source data table contains a field delimiter or line separator, CDM uses double quotation marks (") as the quote character to quote the field content as a whole to prevent a field delimiter from dividing a field into two fields, or a line separator from dividing a field into different lines. For example, if the hello,world field in the database is quoted, it will be exported to the CSV file as a whole.
No
Write to Temporary File
This parameter is displayed only when the migration source is a file system (OBS/FTP/SFTP/NAS/HDFS), the migration destination is FTP/SFTP/NAS, and File Format is Binary.
The binary file is written to a .tmp file first. After the migration is successful, the rename or move command is run at the migration destination to restore the file.
No
Generate MD5 Hash Value
This parameter is displayed only when the migration source is a file system (OBS/FTP/SFTP/NAS/HDFS), the migration destination is FTP/SFTP/NAS, and File Format is Binary.
An MD5 hash value is generated for each transferred file, and the value is recorded in a new .md5 file. You can specify the directory where the MD5 value is generated.
No
Directory of MD5 Hash Value
Directory for storing MD5 values /md5
Job Success Marker File
Whether to generate a marker file with a custom name in the destination directory after a job is executed successfully. If you do not specify a file name, this function is disabled by default.
finish.txt
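The Use Quote Character behavior matches standard CSV quoting, which Python's csv module can demonstrate: a field containing the delimiter is wrapped in double quotation marks so it survives the round trip as a single field.

```python
import csv
import io

rows = [["1", "hello,world", "ok"]]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only fields that contain the delimiter, the
# quote character, or a line separator -- the behavior described above.
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerows(rows)
print(buf.getvalue())  # 1,"hello,world",ok

# Reading it back restores the field as a whole.
restored = next(csv.reader(io.StringIO(buf.getvalue())))
print(restored)  # ['1', 'hello,world', 'ok']
```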
5.3.6 To a Relational Database
When the destination link of a job is the Link to Relational Databases, that is, when data is imported to the following data sources, configure the destination job parameters based on Table 5-23.
- Data Warehouse Service
- RDS for MySQL
- RDS for SQL Server
- RDS for PostgreSQL
- DDM
- FusionInsight LibrA
- MySQL
Table 5-23 Parameter description
Parameter Description Example Value
Schema/Tablespace Name of the database to which data is to be written. The schema can be automatically created. Click the icon next to the text box to select a schema or tablespace.
SCHEMA_EXAMPLE
Auto Table Creation This parameter is displayed only when both the migration source and destination are relational databases. The options are as follows:
- Non-auto creation: CDM will not automatically create a table.
- Auto creation: If the destination database does not contain the table specified by Table Name, CDM will automatically create the table. If the table specified by Table Name already exists, no table is created and data is written to the existing table.
- Deletion before creation: CDM deletes the table specified by Table Name, and then creates the table again.
Non-auto creation
Table Name Name of the table to which data is to be written. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TABLE_EXAMPLE
Compress Data Whether to compress data when data is imported to DWS and Auto creation is selected
No
Storage Mode When data is imported to DWS and Auto creation is selected, you can specify the data storage mode:
- Row-based: Row-based storage. It is applicable to point queries (index-based simple queries with few returned records), or scenarios that require a large number of addition, deletion, and modification operations.
- Column-based: Column-based storage. It is applicable to statistical analysis queries (group and join scenarios) or ad hoc queries (where query condition columns and row store indexes are uncertain).
Row-based
Clear Data Before Import This parameter is not displayed if Auto Table Creation is set to Deletion before creation.
- Yes: CDM will delete data in the tables that share the same names with the tables in the source database.
- No: Data is appended to the existing tables.
Yes
Import to Staging Table If you set this parameter to Yes, the transaction mode is enabled. CDM automatically creates a temporary table and imports data to the temporary table. After the data is imported successfully, it is migrated to the destination table in transaction mode. If the import fails, the destination table is rolled back to the state before the job started. For details, see Migration in Transaction Mode.
The default value is No, indicating that CDM directly imports the data to the destination table. In this case, if the job fails to be executed, the data that has been imported to the destination table will not be rolled back automatically.
NOTE
If you set Clear Data Before Import to Yes, CDM does not roll back the deleted data even in transaction mode.
No
Extend Field Length When Auto creation is selected, the length of character fields can be extended to three times the original length before being written to the destination table. If the encoding types of the source and destination databases are different but the character fields in the source and destination tables have the same length, errors may occur during data migration due to the difference in character length.
When a character field containing Chinese characters is imported to DWS, the length of the character field must be automatically increased by three times.
If a job fails to be executed and an error message similar to value too long for type character varying appears in the log when you import Chinese characters to DWS, you can enable this function to solve the problem.
NOTE
When this function is enabled, some fields consume three times their original storage space.
No
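The tripling rule in Extend Field Length reflects the fact that common Chinese characters occupy three bytes in UTF-8, so a destination column sized in bytes needs roughly three times the character count. A quick check in Python:

```python
text = "迁移"  # two Chinese characters ("migration")
print(len(text))                  # 2 characters
print(len(text.encode("utf-8")))  # 6 bytes: 3 bytes per character in UTF-8
```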
5.3.7 To DDS
When the destination link of a job is the Link to MongoDB/DDS, that is, when data is imported to DDS, configure the destination job parameters based on Table 5-24.
Table 5-24 Parameter description
Parameter Description Example Value
Database Name Database to which data is to be imported mongodb
Collection Name Collection to which data is to be imported, which is similar to the table name of a relational database. Click the icon next to the text box to go to the page for selecting the table, or directly enter a table name.
If the desired table is not displayed, check whether the table exists or whether the login account has the permission to query metadata.
COLLECTION_NAME
5.3.8 To DCS
When the destination link of a job is the Link to Redis/DCS, that is, when data is imported to DCS, configure the destination job parameters based on Table 5-25.
Table 5-25 Parameter description
Parameter Description Example Value
Redis Key Prefix Key prefix, which is similar to the table name of a relational database
TABLENAME
Value Storage Type The options are as follows:
- String: without column names, such as value1,value2
- Hash: with column names, such as column1=value1,column2=value2
String
Key Delimiter Character used to separate the table names and column names of a relational database
_
Value Delimiter Character used to separate columns when the storage type is String
;
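How the parameters above could combine a table row into Redis keys and values can be sketched as follows. The key layout (prefix + key delimiter + primary-key value) is a hypothetical illustration of the naming scheme, not CDM's exact implementation:

```python
def to_redis_string(table, key_delimiter, value_delimiter, row_id, row):
    """Build a Redis key/value pair for Value Storage Type = String:
    the value is the column values joined by the value delimiter
    (column names are dropped)."""
    key = f"{table}{key_delimiter}{row_id}"
    value = value_delimiter.join(str(v) for v in row.values())
    return key, value

def to_redis_hash(table, key_delimiter, row_id, row):
    """For Value Storage Type = Hash, column names are kept as hash fields."""
    key = f"{table}{key_delimiter}{row_id}"
    return key, dict(row)

row = {"name": "Smith", "age": 30}
print(to_redis_string("TABLENAME", "_", ";", 1, row))  # ('TABLENAME_1', 'Smith;30')
print(to_redis_hash("TABLENAME", "_", 1, row))         # ('TABLENAME_1', {'name': 'Smith', 'age': 30})
```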
5.3.9 To Elasticsearch/Cloud Search Service
When the destination link of a job is the Link to Elasticsearch, that is, when data is imported to Elasticsearch or Cloud Search Service, configure the destination job parameters based on Table 5-26.
Table 5-26 Parameter description
Parameter Description Example Value
Index Elasticsearch index, which is similar to the name of a relational database. CDM supports automatic creation of indexes and field types. The index and field type names can contain only lowercase letters.
index
Type Elasticsearch type, which is similar to the table name of a relational database. The type name can contain only lowercase letters.
type
Pipeline ID Pipeline used to convert the data format after data is transferred to Elasticsearch. Pipeline IDs are ready for use after being created in Kibana.
my_pipeline_id
5.3.10 To DLI
When the destination link of a job is the Link to DLI, that is, when data is imported to DLI, configure the destination job parameters based on Table 5-27.
Table 5-27 Parameter description
Parameter Description Example Value
Resource Queue Resource queue to which the destination table belongs
cdm
Database Name Name of the database to which data is to be written
dli
Table Name Name of the table to which data is to be written
car_detail
Clear Data Before Import
Whether to clear data in the destination table before data import
No
5.4 Scheduling Job Execution
CDM supports scheduled execution of table/file migration jobs by minute, hour, day, week, and month. This section describes how to configure scheduled job parameters.
Scheduling Job Execution by Minute
CDM supports job execution every several minutes. See Figure 5-7.
- Start Time: indicates the time when the scheduled configuration takes effect, or the first time when the job is automatically executed.
- Cycle (minutes): indicates the interval at which the job is executed, starting from the start time.
- End Time: This parameter is optional. If it is not set, the scheduled job keeps being automatically executed. If it is set, the scheduled job will be automatically stopped at the end time.
Figure 5-7 Scheduling job execution by minute
Figure 5-7 shows that the job will be automatically executed at 15:30:30 on November 29, 2018 for the first time, at a cycle of 30 minutes, and will be automatically stopped at 15:29:00 on November 30, 2018.
Scheduling Job Execution by Hour
CDM supports job execution every several hours. See Figure 5-8.
- Cycle (hours): indicates the interval at which a job is automatically executed.
- Trigger Time (minute): indicates the exact minute in each hour when a scheduled task is triggered. The value ranges from 0 to 59. You can set a maximum of 60 values and use commas (,) to separate these values. However, the values must be unique.
If the trigger time is not within the validity period, the system selects the trigger time closest to the validity period for the first automatic execution of the scheduled job. The following gives an example:
– Start Time: 1:20:00
– Cycle (hours): 3
– Trigger Time (minute): 10
Figure 5-8 shows that the first automatic execution time is 2:10:00, and the second automatic execution time is 5:10:00.
Figure 5-8 Trigger time beyond the validity period
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-9 Scheduling job execution by hour
Figure 5-9 shows that the scheduled configuration will take effect at 15:30:00 on November 30, 2018. The job is automatically executed for the first time as soon as the scheduled configuration takes effect, at 15:50:00 for the second time, and at 17:10:00 for the third time. The job is triggered three times every 2 hours, and the configuration is always valid.
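The hourly trigger rule described above (find the first occurrence of the trigger minute at or after the start time, then repeat every cycle) can be approximated in Python; this is a sketch of the documented behavior, not CDM's scheduler:

```python
from datetime import datetime, timedelta

def trigger_times(start, cycle_hours, minute, count):
    """Approximate the hourly scheduling rule: the first run happens at
    the trigger minute at or after the start time, then every
    `cycle_hours` hours after that."""
    first = start.replace(minute=minute, second=0, microsecond=0)
    if first < start:
        first += timedelta(hours=1)
    return [first + timedelta(hours=cycle_hours * i) for i in range(count)]

# The example from the text: start 1:20:00, cycle 3 hours, trigger minute 10.
runs = trigger_times(datetime(2018, 11, 30, 1, 20), 3, 10, 2)
print([t.strftime("%H:%M:%S") for t in runs])  # ['02:10:00', '05:10:00']
```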
Scheduling Job Execution by Day
CDM supports job execution every several days. See Figure 5-10.
- Cycle (days): indicates the interval at which the job is executed, starting from the start time.
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect, or the first time when the job is automatically executed.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-10 Scheduling job execution by day
Figure 5-10 shows that the scheduled job will be automatically executed at 00:20:00 on December 1, 2018, and is executed once every three days. The configuration is always valid.
Scheduling Job Execution by Week
CDM supports job execution every several weeks, as shown in Figure 5-11.
- Cycle (weeks): indicates the interval at which a scheduled job is executed, starting from the start time.
- Trigger Time (day): You can specify the day of each week when the job is automatically executed. One or more days can be selected at a time.
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-11 Scheduling job execution by week
Figure 5-11 shows that the job will be automatically executed at 00:20:00 every Tuesday, Saturday, and Sunday every two weeks starting from 00:20:00 on December 1, 2018, and the job will be automatically stopped at 00:00:00 on June 1, 2019.
Scheduling Job Execution by Month
CDM supports job execution every several months, as shown in Figure 5-12.
- Cycle (months): indicates the interval at which a scheduled job is executed, starting from the start time.
- Trigger Time (day): indicates the day of each month when the job is executed. The value ranges from 1 to 31. You can set multiple values and use commas (,) to separate these values. However, the values must be unique.
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect. The automatic execution time is accurate to hour, minute, and second.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-12 Scheduling job execution by month
Figure 5-12 shows that the job will be automatically executed at 00:00:00 on the fifth and twenty-fifth day of each month starting from 00:00:00 on December 1, 2018. The configuration is always valid.
5.5 Managing a Single Job
Scenario
This section describes how to manage a single CDM table/file migration job. The following operations are involved:
- Modify the job parameters.
- Run the job.
- Stop the job.
- View historical records.
- View the job JSON.
- Edit the job JSON.
- Delete the job.
- Query the job statistics.
- Stop incremental migration.
- Continue incremental migration.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.
Step 3 Click Table/File Migration. The job list is displayed. You can perform the following operations on a single job:
- Modify the job parameters: Click Edit in the Operation column to modify the job parameters.
- Run the job: Click Run in the Operation column to manually start the job.
- View the historical records: Click Historical Record in the Operation column. On the Historical Record page that is displayed, view the job's historical execution records and read/write statistics. Click Log to view the log information about the job.
- Delete the job: Choose More > Delete in the Operation column to delete the job.
- Stop the job: Choose More > Stop in the Operation column to stop the job.
- View the job JSON: Choose More > View Job JSON in the Operation column to view the job JSON.
- Edit the job JSON: Choose More > Edit Job JSON in the Operation column to edit the job JSON file directly, which is similar to modifying the job parameters.
- Query the job statistics: Choose More > Query Job Statistics in the Operation column to open the preview window of a configured database job. A maximum of 1,000 data records can be previewed. By comparing the number of data records at the migration source and destination, you can check whether the migration is successful and whether data is lost.
- Continue incremental migration: Choose More > Continue Incremental Migration in the Operation column. If the job is to migrate a single table from MySQL to DWS and
Cloud Data MigrationUser Guide 5 Job Management
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 114
Migrate Incremental Data is set to Yes, you can continue incremental migration byperforming the proceeding operations.
l Stop incremental migration: More > Stop Incremental Migration in the Operationcolumn. If the job is to migrate a single table from MySQL to DWS and MigrateIncremental Data is set to Yes, you can stop incremental migration by performing theproceeding operations.
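The record-count comparison behind Query Job Statistics can be expressed as a small helper. The following is only an illustration of that check; the counts themselves come from the preview window or from queries you run against the source and destination databases:

```python
def check_migration(source_count, dest_count):
    """Compare record counts at the migration source and destination,
    mirroring the manual check described for Query Job Statistics."""
    if dest_count == source_count:
        return "success: all records migrated"
    if dest_count < source_count:
        return f"possible data loss: {source_count - dest_count} records missing"
    return "destination has extra records: check for duplicates or pre-existing data"
```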
Step 4 After the modification, click Save or Save and Run.
----End
5.6 Batch Managing Jobs
Scenario
This section describes how to batch manage CDM table/file migration jobs. The following operations are involved:
- Batch run jobs.
- Batch delete jobs.
- Batch export jobs.
- Batch import jobs.
You can batch export and import jobs in the following scenarios:
- Job migration between CDM clusters: You can migrate jobs from a cluster of an earlier version to a cluster of a new version.
- Job backup: You can stop or delete CDM clusters to reduce costs. In this case, you can batch export the job scripts and save them, and then create a cluster and import the job scripts when necessary.
- Batch job creation: You can manually create a job and export the job configuration file in JSON format. Copy the content of the JSON file into the same file or new files, and then import the files to CDM to batch create jobs.
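The batch-creation pattern above (duplicate one exported job definition with small changes, then re-import) can be sketched as follows. The key names "name" and "table-name" are illustrative placeholders, not the documented CDM export schema; use the keys that actually appear in your exported JSON file:

```python
import copy
import json

def clone_jobs(template_json, table_names):
    """Given one exported CDM job definition (a JSON string), produce a
    list of job definitions differing only in job name and source table."""
    template = json.loads(template_json)
    jobs = []
    for table in table_names:
        job = copy.deepcopy(template)
        job["name"] = f"{template['name']}_{table}"   # illustrative key
        job["table-name"] = table                     # illustrative key
        jobs.append(job)
    return jobs

template = json.dumps({"name": "mysql2dws", "table-name": "t1"})
jobs = clone_jobs(template, ["t1", "t2", "t3"])
```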
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.
Step 3 Click Table/File Migration. The job list is displayed. You can perform the following batch operations:
- Batch run jobs: After selecting one or more jobs, click Run to start these jobs in a batch.
- Batch delete jobs: After selecting one or more jobs, click Delete to delete these jobs in a batch.
- Batch export jobs: Click Export All to export all jobs in JSON format. The exported files can be used as backups or imported into another cluster.
Currently, you cannot select specific jobs to export; you can only export all jobs at a time. For security purposes, link passwords are not exported when CDM exports the jobs; they are replaced with Add password here.
- Batch import jobs: Click Import and select the import format (text file or JSON).
  – By JSON string: Job files to be imported must be in JSON format. If the job files to be imported were exported from CDM, edit the JSON files before importing them to CDM: replace Add password here with the correct link passwords.
  – By text file: This mode can be used when the local JSON files cannot be uploaded properly. Paste the JSON strings of the jobs into the text box.
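Restoring the passwords before re-import can be automated. The structure below (a dict of link name to settings) is a simplified stand-in for the real exported format, which you should inspect in your own export file; only the Add password here placeholder text comes from the guide:

```python
import json

PLACEHOLDER = "Add password here"

def restore_passwords(exported_json, passwords_by_link):
    """Replace the placeholders CDM writes on export with the real
    link passwords, so the file can be imported without manual edits."""
    data = json.loads(exported_json)
    for link_name, settings in data.items():
        if settings.get("password") == PLACEHOLDER:
            settings["password"] = passwords_by_link[link_name]
    return json.dumps(data)

exported = json.dumps({"mongo_link": {"password": "Add password here"}})
fixed = restore_passwords(exported, {"mongo_link": "s3cret"})
```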
----End
6 Typical Scenarios
6.1 Migrating Data from DDS to DWS
Scenario
CDM allows you to migrate data from DDS to other data sources. This section describes how to use CDM to migrate data from DDS to DWS. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a DDS Link
3. Creating a DWS Link
4. Creating a Migration Job
Prerequisites
- You have purchased DWS and DDS.
- You have obtained the IP address, port number, database name, username, and password for connecting to the DWS and DDS databases. In addition, you must have the read, write, and delete permissions on the DDS and DWS databases.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- If DDS and DWS belong to the same VPC, create the CDM cluster in that VPC as well, without binding an EIP. The CDM cluster's subnet and security group can be the same as those of the DDS or DWS cluster. You can also configure a security group rule to enable the CDM cluster to access the cluster of the other service (DWS or DDS).
- If DDS and DWS are not in the same VPC, create the CDM cluster in the same VPC as DDS and bind an EIP to the CDM cluster so that it can access the DWS cluster.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access DWS. If DDS and DWS are in the same VPC, do not bind an EIP to the CDM cluster.
----End
Creating a DDS Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-1.
Figure 6-1 Selecting a connector
Step 2 To create a DDS link, select Document Database Service, and click Next to configure the link parameters based on Table 6-1.
Table 6-1 DDS link parameters

Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | mongo_link
MongoDB Server List | Address list of the DDS cluster, in the format "IP address or domain name of the database server:port number". Separate multiple servers with semicolons (;). | 192.168.0.1:7300;192.168.0.2:7301
Database Name | Name of the DDS database to be connected | DB_mongodb
Username | Username for logging in to the DDS database | cdm
Password | Password for logging in to the DDS database | -
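The server-list format in Table 6-1 ("host:port" entries separated by semicolons) can be validated before you paste it into the link form. A minimal sketch:

```python
def parse_server_list(server_list):
    """Split a CDM-style server list ("host:port;host:port") into
    (host, port) pairs, validating that each entry has a numeric port."""
    servers = []
    for entry in server_list.split(";"):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad server entry: {entry!r}")
        servers.append((host, int(port)))
    return servers

servers = parse_server_list("192.168.0.1:7300;192.168.0.2:7301")
```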
Step 3 Click Save. The Link Management page is displayed.
----End
Creating a DWS Link
Step 1 On the Link Management tab page, click Create Link and select Data Warehouse Service to create a DWS link.
Step 2 Click Next. The page for configuring the DWS link parameters is displayed. Configure the mandatory parameters according to Table 6-2 and retain the default values of the optional parameters.
Table 6-2 DWS link parameters

Parameter | Description | Example Value
Name | Unique link name | dwslink
Database Server | IP address or domain name of the DWS database server | 192.168.0.3
Port | DWS database port | 8000
Database Name | Name of the DWS database | db_demo
Username | User who has the read, write, and delete permissions on the DWS database | dbadmin
Password | Password of the user | -
Import Mode | Data import mode used by the DWS link. Copy: migrate the source data to the DWS management node and then copy it to DataNodes; to access DWS through the Internet, select Copy. GDS: DataNodes of DWS concurrently request data from the GDS component of CDM and then write it to DWS; the GDS mode cannot be used for data export from DWS. Theoretically, the GDS mode is more efficient than the Copy mode, but it requires the following configurations: 1. Configure DWS to allow users of the DWS link to create and delete foreign tables. 2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster. | Copy
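The Copy-versus-GDS choice in Table 6-2 boils down to a simple rule of thumb, sketched below. The parameter names are illustrative, not CDM settings:

```python
def choose_import_mode(via_internet, gds_foreign_tables_allowed, port_25000_open):
    """Encode the documented guidance for Import Mode: Copy works
    everywhere (and is required over the Internet); GDS is faster but
    needs foreign-table permissions on DWS and port 25000 of the CDM
    cluster reachable from the DWS DataNodes."""
    if via_internet:
        return "Copy"
    if gds_foreign_tables_allowed and port_25000_open:
        return "GDS"
    return "Copy"
```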
Step 3 Click Save. The link is successfully created.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. Figure 6-2 illustrates how to create a migration job.
Figure 6-2 Creating a job
Step 2 Configure the required job information:
- Job Name: Enter a unique job name.
- Source Job Configuration
  – Source Link Name: Select the mongo_link link created in Creating a DDS Link.
  – Database Name: Select the database whose data is to be migrated.
  – Collection Name: Enter the name of the MongoDB collection on DDS, which is similar to a table name in a relational database.
- Destination Job Configuration
  – Destination Link Name: Select the dwslink link created in Creating a DWS Link.
  – Schema/Tablespace: Select the DWS database to which data is to be written.
  – Table Name: Enter the name of the table to which data is to be written. You can enter a table name that does not exist; CDM automatically creates the table on DWS.
  – Clear Data Before Import: Choose whether to clear data in the destination table before the import.
Step 3 Click Next. The Map Field page is displayed. See Figure 6-3. CDM automatically maps table fields at the migration source and destination. Check whether the field mapping is correct.
- If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.
- You need to manually select the distribution columns of DWS. You are advised to select the distribution columns according to the following principles:
  a. Use the primary key as the distribution column.
  b. If multiple columns are combined as the primary key, specify all of them as distribution columns.
  c. If no primary key is available and no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
- If you need to convert the content of the source fields, perform the operations described in Field Conversion During Migration. In this example, field conversion is not required.
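The distribution-column principles above can be sketched as a small selection helper. This is only an illustration of the decision rule, not a DWS or CDM API:

```python
def pick_distribution_columns(primary_key_columns, all_columns):
    """Apply the principles above: use the primary key column(s) when
    present; otherwise fall back to the first column, which is what DWS
    does by default, at the cost of a data-skew risk."""
    if primary_key_columns:
        return list(primary_key_columns), None
    return [all_columns[0]], "warning: no primary key; default first column may cause data skew"

cols, warn = pick_distribution_columns([], ["id", "name"])
```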
Figure 6-3 Field mapping
Step 4 Click Next to set task parameters. Generally, retain the default values of all parameters.

In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 5 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 6 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.2 Periodically Backing Up FTP/SFTP Files to HUAWEI CLOUD OBS
Scenario

CDM can automatically upload new files to OBS periodically. You do not need to compile code or manually upload the files frequently. You can also use the massive storage capabilities of OBS on HUAWEI CLOUD to back up files.
This section describes how to periodically back up FTP files to OBS with CDM.
For example, the to_obs_test directory on the FTP server contains one subdirectory another_dir and two files file1 and file2. file2 is in the another_dir directory. Figure 6-4 shows the files. Configure a scheduled CDM job to transfer these files to OBS, and then add file3 and file4 to the directory to verify that CDM can periodically transfer new files to OBS.
Figure 6-4 Files on the FTP server
Prerequisites
- You have sufficient EIP quota.
- You have created an OBS bucket and obtained the access key (AK and SK).
- You have obtained the IP address, username, and password of the FTP server.
- If the FTP server is in an on-premises environment, ensure that the FTP server can access HUAWEI CLOUD through the public network, or that a VPN or Direct Connect connection has been established between the on-premises data center and HUAWEI CLOUD.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and click Buy CDM to create a CDM cluster. The key configurations are as follows:
- Select the cdm.medium instance, which is applicable to most migration scenarios.
- If the cluster is used only to migrate data from third-party data sources to OBS, there are no special requirements on the VPC, subnet, and security group of the CDM cluster. You can specify them based on your needs.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises FTP server.
----End
Creating an OBS Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-5.
Figure 6-5 Selecting a connector
Step 2 Select Object Storage Service and click Next to configure the OBS link parameters. See Figure 6-6.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-6 Creating an OBS link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an FTP Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select FTP, click Next, and configure the FTP link parameters.
- Name: Enter a custom link name, for example, ftplink.
- Host Name/IP Address and Port: Enter the address information of the FTP server.
- Username and Password: Enter the username and password used for logging in to the FTP server.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Scheduled Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. See Figure 6-7.
Figure 6-7 Creating a migration job
- Job Name: Enter a custom job name.
- Source Link Name: Select the ftplink link created in Creating an FTP Link.
  – Source Directory/File: Select the path where to_obs_test is located.
  – File Format: Select Binary to transfer files without parsing the content. This parameter must be consistent on both the migration source and destination.
- Destination Link Name: Select the obslink link created in Creating an OBS Link.
  – Bucket Name: Select the OBS bucket for storing the FTP files.
  – Write Directory: Select an existing directory or manually enter one. If the entered directory does not exist, CDM automatically creates it, for example, /to/ftp2obs/.
  – File Format: Select Binary. The value must be the same as that on the migration source.
  – Duplicate File Processing Method: Select Skip to avoid transferring duplicate files.
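With Duplicate File Processing Method set to Skip, each scheduled run effectively transfers only the files not yet present at the destination, which is why file3 and file4 are picked up later without re-sending file1 and file2. A sketch of that selection logic (an illustration, not CDM's implementation; paths are relative to the source directory):

```python
def files_to_transfer(source_files, destination_files):
    """Return the source files that do not yet exist at the destination,
    i.e. the ones a Skip-mode run would actually transfer."""
    existing = set(destination_files)
    return sorted(f for f in source_files if f not in existing)

new = files_to_transfer(
    ["file1", "another_dir/file2", "file3", "another_dir/file4"],
    ["file1", "another_dir/file2"],
)
```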
Step 2 Click Next and configure the scheduled task. In this example, the scheduled task is executed every 10 minutes. See Figure 6-8. Retain the default values of the other parameters.
Figure 6-8 Scheduling job execution
Step 3 Click Save and Run.
----End
Verifying Backup
Step 1 After the job is executed successfully, log in to the OBS client. You can see that the corresponding files exist on OBS. Figure 6-9 shows the files on OBS.
Figure 6-9 Files on the OBS client
Step 2 In the FTP server directories, add files file3 and file4. file3 and file1 are in the same directory,and file2 and file4 are in the same directory. See Figure 6-10.
Figure 6-10 New files on the FTP server
Step 3 Wait 10 minutes and CDM automatically triggers the scheduled job. Then you can view the new files file3 and file4 after logging in to OBS. Figure 6-11 shows the new files on OBS.
Figure 6-11 New files on OBS
Step 4 On the Job Management page, click Historical Record in the Operation column to view the job's historical execution records and read/write statistics.
Step 5 Click Log to view the job log.
----End
6.3 Migrating Data from OSS to OBS
Scenario

CDM allows you to directly migrate object storage data from a third-party cloud to HUAWEI CLOUD OBS without forwarding data or writing code.
This section describes how to use CDM to migrate data from Alibaba Cloud OSS to HUAWEI CLOUD OBS. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating an OBS Link
3. Creating an OSS Link
4. Creating a Migration Job
Preparing Data
- Domain name for accessing OSS, for example, oss-cn-hangzhou.aliyuncs.com
- AK, temporary credential, or security token for accessing OSS
- Domain name, port number, AK, and SK for accessing OBS
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the cdm.medium instance, which is applicable to most migration scenarios.
- If the cluster is used only to migrate data from third-party data sources to OBS, there are no special requirements on the VPC, subnet, and security group of the CDM cluster. You can specify them based on your needs.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster accesses Alibaba Cloud OSS through the public network.

Because data is imported to HUAWEI CLOUD, a 5 Mbit/s bandwidth for the EIP is sufficient.
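To gauge whether 5 Mbit/s is acceptable for your data volume, a rough back-of-the-envelope estimate helps. The 80% efficiency factor is an assumption for illustration, not a CDM figure:

```python
def transfer_hours(data_gb, bandwidth_mbit_s, efficiency=0.8):
    """Rough time estimate for pulling data over the EIP, assuming the
    link sustains `efficiency` of its nominal bandwidth.
    1 GB = 8 * 1024 Mbit."""
    mbits = data_gb * 8 * 1024
    seconds = mbits / (bandwidth_mbit_s * efficiency)
    return seconds / 3600

# About 10 GB over a 5 Mbit/s EIP at 80% efficiency: roughly 5.7 hours.
hours = transfer_hours(10, 5)
```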
----End
Creating an OBS Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-12.
Figure 6-12 Selecting a connector
Step 2 Select Object Storage Service and click Next to configure the OBS link parameters. See Figure 6-13.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-13 Creating an OBS link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an OSS Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select Alibaba Cloud OSS, click Next, and configure the required link parameters. See Figure 6-14.
- Name: Enter a custom link name.
- OSS Endpoint: Enter the access domain name of the data to be migrated.
- Authentication Method: Select an authentication method based on your needs, for example, Access key.
- AK and SK: Enter the AK and SK used for logging in to OSS.
Figure 6-14 Creating an OSS link
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for migrating data from OSS to OBS. See Figure 6-15.
Figure 6-15 Creating a job
- Job Name: Enter a custom job name.
- Source Job Configuration
  – Source Link Name: Select the osslink link created in Creating an OSS Link.
  – Bucket Name: Select the bucket from which the data is to be migrated.
  – Source Directory/File: Set this parameter to the path of the data to be migrated. You can migrate all files in the bucket.
  – File Format: Select Binary, which delivers the optimal performance and rate for file-to-file transfers.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From OBS/OSS.
- Destination Job Configuration
  – Destination Link Name: Select the obslink link created in Creating an OBS Link.
  – Bucket Name: Select the bucket to which data is to be written.
  – Write Directory: Select the path for storing the data.
  – File Format: Select Binary. The value must be the same as that on the migration source.
  – Retain the default values of the other optional parameters. For details, see To OBS.
Step 2 Click Next to set task parameters. Generally, retain the default values of all parameters.

In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 3 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 4 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.4 Migrating Data from On-premises Redis to DCS
Scenario
CDM can migrate data from an on-premises Redis database or a third-party Redis service to DCS on HUAWEI CLOUD without programming. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating the Redis and DCS Links
3. Creating a Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to DCS and obtained the IP address, port number, and password of the DCS database.
- The on-premises Redis database can be accessed through the public network. You can configure port mapping or port forwarding to enable public network access. For details, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
- You have obtained the IP address and password of the Redis server.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and DCS clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the DCS cluster.
- If the subnets and security groups are inconsistent, configure a security group rule to enable the CDM cluster to access the DCS cluster.
Step 2 After the CDM cluster is created, click Bind Elastic IP on the Cluster Management page to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises Redis data source.
----End
Creating the Redis and DCS Links
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-16.
Figure 6-16 Selecting a connector
Step 2 Select Redis and click Next. On the page that is displayed, configure the Redis link parameters. See Figure 6-17.
Figure 6-17 Creating a Redis link
- Name: Enter a custom link name, for example, redis_link.
- Redis Deployment Method: Select the value based on the actual deployment method of the on-premises Redis database.
  – If it is deployed in single-node mode, select Single.
  – If it is deployed in cluster mode, select Cluster.
- Redis Server List: Set this parameter to the server addresses of the on-premises Redis database. Separate multiple servers with semicolons (;).
- Password and Redis Database Index: Enter the password used for logging in to the on-premises Redis database and the index of the database from which data is to be exported.
Step 3 Click Save. The Link Management page is displayed.
Step 4 On the Link Management tab page, click Create Link to create a DCS link. The procedure for creating a DCS link and the link parameters are the same as those of the Redis link.
Step 5 Select Distributed Cache Service, click Next, and configure the DCS link parameters.
- Name: Enter a custom link name, for example, dcs_link.
- Redis Deployment Method: Select the value based on the DCS cluster deployment mode.
- Redis Server List: Set this parameter to the server addresses of the DCS database. Separate multiple servers with semicolons (;).
- Password and Redis Database Index: Enter the password used for logging in to the DCS database and the index of the database to which data is to be imported.
Step 6 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. See Figure 6-18.
Figure 6-18 Creating a migration job
- Job Name: Enter a unique name, for example, redis2dcs.
- Source Job Configuration
  – Source Link Name: Select the Redis link created in Creating the Redis and DCS Links.
  – Redis Key Prefix: Select a key prefix from which data is to be exported.
  – Value Storage Type: Select a value based on your needs. The following uses Hash on both the migration source and destination as an example.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From Redis.
- Destination Job Configuration
  – Destination Link Name: Select the DCS link created in Creating the Redis and DCS Links.
  – Redis Key Prefix: Select a key prefix to which data is to be imported.
  – Value Storage Type: Select Hash, which is the same as the migration source.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see To DCS.
Step 2 After the basic job parameters are configured, click Next to go to the page for configuring field mapping. See Figure 6-19.

For the hash type, you can click the copy button to copy the fields on the migration source and then select the field that is used as the primary key.
Figure 6-19 Configuring field mapping
Step 3 Click Next. On the Configure Task page that is displayed, configure Schedule Execution as required. See Figure 6-20.

If Schedule Execution is enabled, CDM periodically synchronizes data. If data with duplicate primary keys exists, CDM automatically overwrites the existing data with the same primary key.
Figure 6-20 Scheduling job execution
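The overwrite behavior on duplicate primary keys can be modeled with a plain dictionary. This is an illustration of the semantics only, not CDM's implementation:

```python
def sync(destination, batch):
    """Model one scheduled synchronization run: records whose primary
    key already exists at the destination are overwritten with the
    newly migrated values; new keys are simply added."""
    for primary_key, value in batch:
        destination[primary_key] = value  # existing key -> overwritten
    return destination

dest = sync(
    {"user:1": {"age": "20"}},
    [("user:1", {"age": "21"}), ("user:2", {"age": "30"})],
)
```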
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.5 Migrating Data from Oracle to Cloud Search Service
Scenario
Cloud Search Service provides users with structured and unstructured data search, statistics, and report capabilities. This section describes how to use CDM to migrate data from the Oracle database to Cloud Search Service. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a Cloud Search Service Link
3. Creating an Oracle Link
4. Creating a Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to Cloud Search Service and obtained the IP address and port number of the Cloud Search Service cluster.
- You have obtained the IP address, name, username, and password of the Oracle database. If the Oracle server is deployed in an on-premises data center or on a third-party cloud, ensure that an IP address that can be accessed from the public network has been configured for the Oracle database, or that a VPN or Direct Connect connection has been established between the on-premises data center and HUAWEI CLOUD. To enable public network access, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and Cloud Search Service clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the Cloud Search Service cluster.
- If the same subnet and security group cannot be used for security purposes, ensure that a security group rule has been configured to allow the CDM cluster to access the Cloud Search Service cluster.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the Oracle data source.
----End
Creating a Cloud Search Service Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-21.
Figure 6-21 Selecting a connector
Step 2 Select Cloud Search Service and click Next. On the page that is displayed, configure the Cloud Search Service link parameters. See Figure 6-22.
- Name: Enter a custom link name, for example, csslink.
- Elasticsearch Server and Port: Enter the address and port number of the Cloud Search Service cluster.
- Username and Password: Enter the username and password used for logging in to the Cloud Search Service cluster. The user must have the read and write permissions on the database.
Figure 6-22 Creating a Cloud Search Service link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an Oracle Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select Oracle, click Next, and configure the Oracle link parameters.
- Name: Enter a custom link name, for example, oracle_link.
- Database Server and Port: Enter the address and port number of the Oracle server.
- Database Name: Enter the name of the Oracle database whose data is to be exported.
- Username and Password: Enter the username and password used for logging in to the Oracle database. The user must have the permission to read the Oracle metadata.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for exporting data from the Oracle database to Cloud Search Service.
Figure 6-23 Creating a migration job
- Job Name: Enter a unique name.
- Source Job Configuration
– Source Link Name: Select the oracle_link link created in Creating an Oracle Link.
– Schema/Tablespace: Enter the name of the database whose data is to be migrated.
– Table Name: Enter the name of the table to be migrated.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From a Relational Database.
- Destination Job Configuration
– Destination Link Name: Select the csslink link created in Creating a Cloud Search Service Link.
– Index: Select the Elasticsearch index of the data to be written. You can also enter a new index. CDM automatically creates the index on Cloud Search Service.
– Type: Select the Elasticsearch type of the data to be written. You can also enter a new type. CDM automatically creates the type at the migration destination.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see To Elasticsearch/Cloud Search Service.
Step 2 Click Next. The Map Field page is displayed. CDM automatically matches the source and destination fields. See Figure 6-24.
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- If the type is automatically created at the migration destination, you need to configure the type and name of each field.
- CDM supports field conversion during the migration. For details, see Field Conversion During Migration.
Figure 6-24 Field mapping
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.6 Migrating Data from OBS to Cloud Search Service
Scenario
CDM supports data migration between services on HUAWEI CLOUD. This section describes how to use CDM to migrate data from OBS to Cloud Search Service. The procedure is as follows:
1. Creating a CDM Cluster
2. Creating a Cloud Search Service Link
3. Creating an OBS Link
4. Creating a Migration Job
Prerequisites
- You have obtained the domain name, port number, AK, and SK for accessing OBS.
- You have subscribed to Cloud Search Service and obtained the IP address and port number of the Cloud Search Service cluster.
Creating a CDM Cluster
Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and Cloud Search Service clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the Cloud Search Service cluster.
- If the same subnet and security group cannot be used for security reasons, ensure that a security group rule has been configured to allow the CDM cluster to access the Cloud Search Service cluster.
Creating a Cloud Search Service Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-25.
Figure 6-25 Selecting a connector
Step 2 Select Cloud Search Service and click Next. On the page that is displayed, configure the Cloud Search Service link parameters. See Figure 6-26.
- Name: Enter a custom link name, for example, csslink.
- Elasticsearch Server and Port: Enter the address and port number of the Cloud Search Service cluster.
- Username and Password: Enter the username and password used for logging in to the Cloud Search Service cluster. The user must have the read and write permissions on the database.
Figure 6-26 Creating a Cloud Search Service link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an OBS Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select HUAWEI CLOUD OBS, click Next, and configure the required link parameters. See Figure 6-27.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-27 Creating an OBS link
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for exporting data from OBS to Cloud Search Service.
Figure 6-28 Creating a migration job
- Job Name: Enter a unique name.
- Source Job Configuration
– Source Link Name: Select the obslink link created in Creating an OBS Link.
– Bucket Name: Select the bucket from which the data is to be migrated.
– Source Directory/File: Set this parameter to the path of the data to be migrated. You can migrate all directories and files in the bucket.
– File Format: Select CSV for migrating files to a data table.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From OBS/OSS.
- Destination Job Configuration
– Destination Link Name: Select the csslink link created in Creating a Cloud Search Service Link.
– Index: Select the Elasticsearch index of the data to be written. You can also enter a new index. CDM automatically creates the index on Cloud Search Service.
– Type: Select the Elasticsearch type of the data to be written. You can also enter a new type. CDM automatically creates the type at the migration destination.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see To Elasticsearch/Cloud Search Service.
Step 2 Click Next. The Map Field page is displayed. CDM automatically matches the source and destination fields. See Figure 6-29.
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- If the type is automatically created at the migration destination, you need to configure the type and name of each field.
- CDM supports field conversion during the migration. For details, see Field Conversion During Migration.
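Conceptually, the mapping turns each CSV row read from OBS into one document keyed by the destination field names, which is then indexed into Cloud Search Service. The following Python sketch is illustrative only; the field names and sample data are invented and this is not CDM's internal code:

```python
import csv
import io

# Hypothetical destination field names configured on the Map Field page.
fields = ["id", "name", "price"]

# A sample CSV file as it might sit in the OBS bucket.
sample = "1,apple,3.5\n2,banana,1.2\n"

# Each CSV row becomes one document keyed by the mapped field names,
# roughly what an Elasticsearch bulk-index request would contain.
docs = [dict(zip(fields, row)) for row in csv.reader(io.StringIO(sample))]

print(docs[0])  # {'id': '1', 'name': 'apple', 'price': '3.5'}
```

If the type is created automatically, the per-field types you set on this page determine how Cloud Search Service interprets each of these values.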
Figure 6-29 Field mapping
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.7 Migrating Data from OBS to DLI
Scenario
DLI is a fully hosted big data query service provided by HUAWEI CLOUD. This section describes how to use CDM to migrate data from OBS to DLI. The procedure is as follows:
1. Creating a CDM Cluster
2. Creating a DLI Link
3. Creating an OBS Link
4. Creating a Migration Job
Prerequisites
- You have subscribed to OBS and DLI.
- You have created resource queues, databases, and tables on DLI.
Creating a CDM Cluster
Log in to the CDM management console and perform operations as required.
- If you already have a CDM cluster, click Job Management in the row of the cluster and create links on the page that is displayed.
- If you do not have a CDM cluster, click Buy CDM to create a cluster. For details about how to create a cluster, see Creating a Cluster.
In this scenario, if the CDM cluster is used only to migrate data from OBS to DLI and does not need to migrate data from other data sources, there are no special requirements on the VPC, subnet, and security group of the CDM cluster. You can specify them based on your needs. CDM accesses DLI and OBS through the intranet. Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
Creating a DLI Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-30.
Figure 6-30 Selecting a connector
Step 2 Select Data Lake Insight, click Next, and configure the DLI link parameters. See Figure 6-31.
- Name: Enter a custom link name, for example, dlilink.
- AK and SK: Enter the AK and SK used for accessing the DLI database. To obtain the AK and SK, hover the cursor over the username on the management console and choose My Credential > Access Keys.
- Project ID: Enter the ID of the project to which DLI belongs. Obtain the project ID on the My Credential page.
Figure 6-31 Creating a DLI link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an OBS Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select HUAWEI CLOUD OBS, click Next, and configure the required link parameters. See Figure 6-32.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-32 Creating an OBS link
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for migrating data from OBS to DLI. See Figure 6-33.
Figure 6-33 Creating a job
l Job Name: Enter a custom job name.
- Source Link Name: Select the obslink link created in Creating an OBS Link.
– Bucket Name: Select the bucket from which the data is to be migrated.
– Source Directory/File: Set this parameter to the path of the data to be migrated.
– File Format: Select CSV or JSON for transferring files to a data table.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From OBS/OSS.
- Destination Link Name: Select the dlilink link created in Creating a DLI Link.
– Resource Queue: Enter the resource queue to which the destination table belongs.
– Database Name: Enter the name of the database to which data is to be written.
– Table Name: Enter the name of the table to which data is to be written. CDM cannot automatically create tables on DLI. The table must be created on DLI in advance, and the field types and formats of the table must be consistent with those of the data to be migrated.
– Clear Before Importing Data: Choose whether to clear data in the destination table before data import. In this example, retain the default value.
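Because CDM cannot create the DLI table and the field types must already match the data, a quick pre-flight check of the source file can save a failed job. The following Python sketch is an illustration under assumptions (the expected column types and sample rows are invented), not part of CDM:

```python
import csv
import io

# Assumed column types of the pre-created DLI table, in column order.
expected_types = [int, str, float]

# Sample CSV content; row 2 has a non-numeric value in the float column.
sample = "1,apple,3.5\n2,banana,oops\n"

bad_rows = []
for line_no, row in enumerate(csv.reader(io.StringIO(sample)), start=1):
    try:
        for cast, value in zip(expected_types, row):
            cast(value)  # raises ValueError if the value does not fit the type
    except ValueError:
        bad_rows.append(line_no)

print(bad_rows)  # [2]
```

Rows flagged this way would otherwise surface as job failures or as dirty data during the migration.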
Step 2 Click Next. The Map Field page is displayed. CDM automatically matches the source and destination fields.
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- CDM supports field conversion during the migration. For details, see Field Conversion During Migration.
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.8 Migrating Data from the MySQL Database to the MRS Hive Partition Table
MRS provides enterprise-level big data clusters on the cloud. It contains HDFS, Hive, and Spark components and is applicable to enterprise-scale analysis of massive data.
Hive supports SQL to help users perform extraction, transformation, and loading (ETL) operations on large-scale data sets. Queries on large-scale data sets take a long time. In many scenarios, you can create Hive partitions to reduce the total amount of data to be scanned each time, which significantly improves query performance.
Hive partitions are implemented by using the HDFS subdirectory function. Each subdirectory contains the column names and values of a partition. If there are multiple partitions, many HDFS subdirectories exist, and it is not easy to load external data into each partition of a Hive table without tool support. With CDM, you can easily load data from external data sources (relational databases, object storage services, and file system services) into Hive partition tables.
This section describes how to migrate data from the MySQL database to the MRS Hive partition table.
Scenario
Suppose that there is a trip_data table in the MySQL database. The table stores cycling records such as the start time, end time, start sites, end sites, and rider IDs. For details about the fields in the trip_data table, see Figure 6-34.
Figure 6-34 MySQL table fields
The following describes how to use CDM to import the trip_data table in the MySQL database to the MRS Hive partition table. The procedure is as follows:
1. Creating a Hive Partition Table on MRS Hive
2. Creating a CDM Cluster and Binding an EIP to the Cluster
3. Creating a MySQL Link
4. Creating a Hive Link
5. Creating a Migration Job
Prerequisites
- You have subscribed to MRS.
- You have sufficient EIP quota.
- You have obtained the IP address, port number, database name, username, and password for connecting to the MySQL database. In addition, the user must have the read and write permissions on the MySQL database.
Creating a Hive Partition Table on MRS Hive
On MRS Hive, run the following SQL statement to create a Hive partition table named trip_data, with three new fields y, ym, and ymd used as partition fields:

create table trip_data(
  TripID int,
  Duration int,
  StartDate timestamp,
  StartStation varchar(64),
  StartTerminal int,
  EndDate timestamp,
  EndStation varchar(64),
  EndTerminal int,
  Bike int,
  SubscriberType varchar(32),
  ZipCodev varchar(10)
)
partitioned by (y int, ym int, ymd int);
NOTE
The trip_data partition table has three partition fields: the year, the year and month, and the year, month, and date of the start time of a ride. For example, if the start time of a ride is 2018/5/11 9:40, the record is saved in the trip_data/2018/201805/20180511 partition. When the records in the trip_data table are summarized, only part of the data needs to be scanned, greatly improving the performance.
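The partition layout described in the note can be reproduced with ordinary date formatting. This Python sketch only illustrates how a start time maps to the y/ym/ymd partition values and the corresponding subdirectory; the real directories are managed by Hive on HDFS:

```python
from datetime import datetime

# Start time of a ride, as in the example above: 2018/5/11 9:40.
start = datetime(2018, 5, 11, 9, 40)

# Partition values for the trip_data table: year, year+month, year+month+day.
y = start.strftime("%Y")
ym = start.strftime("%Y%m")
ymd = start.strftime("%Y%m%d")

# Each partition corresponds to an HDFS subdirectory, so a query that
# filters on y/ym/ymd only needs to scan the matching subdirectory.
partition_dir = f"trip_data/{y}/{ym}/{ymd}"
print(partition_dir)  # trip_data/2018/201805/20180511
```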
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and MRS clusters must be in the same VPC, subnet, and security group.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access MySQL.
Figure 6-35 Binding an EIP
----End
Creating a MySQL Link
Step 1 On the Cluster Management page, click Job Management of the cluster and choose Link Management > Create Link to enter the page for selecting the connector. See Figure 6-36.
Figure 6-36 Selecting a connector
Step 2 Select MySQL and click Next. On the page that is displayed, configure the MySQL link parameters, as shown in Figure 6-37.
Figure 6-37 Creating a MySQL link
Click Show Advanced Attributes to display optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 6-3.
Table 6-3 MySQL link parameters

Parameter        Description                                              Example Value
Name             Unique link name                                         mysqllink
Database Server  IP address or domain name of the MySQL database server   192.168.0.1
Port             MySQL database port                                      3306
Database Name    Name of the MySQL database                               sqoop
Username         User who has the read, write, and delete permissions     admin
                 on the MySQL database
Password         Password of the user                                     -
Step 3 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during the saving, the security settings of the MySQL database are probably incorrect. In this case, you need to allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating a Hive Link
Step 1 Click Create Link and select MRS Hive to create an MRS Hive link.
Step 2 Click Next and configure the MRS Hive link parameters. See Figure 6-38.
Figure 6-38 Creating a Hive link
Table 6-4 describes the parameters. You can configure the parameters according to the actual situation.
Table 6-4 Hive link parameters

Parameter              Description                                           Example Value
Name                   Link name, which can be defined based on the data     hivelink
                       source type for easy memorization
Manager IP             IP address of MRS Manager. Click Select next to the   127.0.0.1
                       Manager IP text box to select a created MRS cluster.
                       CDM automatically fills in the authentication
                       information.
Authentication Method  Authentication method used for accessing MRS:         Simple
                       Simple if MRS is in non-security mode, or Kerberos
                       if MRS is in security mode.
Username               When Authentication Method is set to Kerberos, set    cdm
                       the username and password for logging in to MRS
                       Manager.
Password               Password for logging in to MRS Manager                -
Step 3 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. Figure 6-39 illustrates how to create a migration job.
Figure 6-39 Creating a migration job
NOTE
Set Clear Data Before Import to Yes, so that the data in the Hive table will be cleared before data import.
Step 2 After the parameters are configured, click Next. The Map Field tab page is displayed. See Figure 6-40.
Map the fields of the MySQL table and the Hive table. The Hive table has three more fields than the MySQL table: y, ym, and ymd, which are the Hive partition fields. Because the fields of the source table cannot be directly mapped to the destination table, you need to configure an expression to extract data from the StartDate field in the source table.
Figure 6-40 Field mapping
Step 3 Click the icon on the left of the y, ym, and ymd fields to display the Converter List dialog box, and then choose Create Converter > Expression conversion. See Figure 6-41.
The expressions for the y, ym, and ymd fields are as follows:
DateUtils.format(DateUtils.parseDate(row[2],"yyyy-MM-dd HH:mm:ss.SSS"),"yyyy")
DateUtils.format(DateUtils.parseDate(row[2],"yyyy-MM-dd HH:mm:ss.SSS"),"yyyyMM")
DateUtils.format(DateUtils.parseDate(row[2],"yyyy-MM-dd HH:mm:ss.SSS"),"yyyyMMdd")
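Each expression parses row[2] (the StartDate column) with the pattern yyyy-MM-dd HH:mm:ss.SSS and then reformats it. As a sanity check, the equivalent logic in Python looks like this (the sample timestamp is invented for illustration):

```python
from datetime import datetime

# row[2] is the StartDate column of the source record (sample value).
row = [None, None, "2018-05-11 09:40:00.000"]

# Parse with the source pattern "yyyy-MM-dd HH:mm:ss.SSS" ...
start = datetime.strptime(row[2], "%Y-%m-%d %H:%M:%S.%f")

# ... then reformat as "yyyy", "yyyyMM", and "yyyyMMdd" for y, ym, and ymd.
y = start.strftime("%Y")
ym = start.strftime("%Y%m")
ymd = start.strftime("%Y%m%d")

print(y, ym, ymd)  # 2018 201805 20180511
```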
Figure 6-41 Configuring the expression
NOTE
The expressions in CDM support field conversion of common character strings, dates, and values. For details, see Field Conversion During Migration.
Step 4 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 5 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 6 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.9 Migrating Data from the MySQL Database to DDM
DDM removes the capacity and performance bottlenecks of databases and solves distributed expansion issues. DDM supports sharding, read/write isolation, and elastic scaling, enabling highly concurrent access to massive data and improving database read/write performance.
This section describes how to use CDM to migrate a table from the on-premises MySQL database to DDM and store data in a distributed manner.
Scenario
1. Suppose that there is a trip table in the sqoop MySQL database. The table stores cycling records such as the start time, end time, start sites, end sites, and rider IDs. For details about the fields in the trip table, see Table 6-5.
Table 6-5 Fields in the trip table
Field Type
tripid int
duration int
startdate timestamp
startstation varchar(64)
startterminal int
enddata timestamp
endstation varchar(64)
endterminal int
bike int
subscriberType varchar(32)
zipcode varchar(10)
2. You have created a DDM instance and created a schema. For details about the operations, see Getting Started in the Distributed Database Middleware User Guide. For example, Figure 6-42 shows the DDM instance purchased here. The name of the schema created on DDM is db_cdm.
NOTE
DDM supports multiple instances of different specifications. The parallel computing capability improves as the number of cores increases, and the larger the memory, the more complex the data that can be queried and processed in batches.
You can select proper specifications based on your service plan to reduce the cost of using DDM. For details, see Selecting Proper DDM Specifications.
Figure 6-42 Basic information about the DDM instance
The following describes how to use CDM to migrate the trip table from the sqoop database to the db_cdm schema of DDM. The procedure is as follows:
1. Creating a Sharded Table in the DDM Schema
2. Creating a CDM Cluster and Binding an EIP to the Cluster
3. Creating a MySQL Link
4. Creating a DDM Link
5. Creating a Migration Job
Prerequisites
- You have sufficient EIP quota. The on-premises MySQL database can be accessed through the EIP address.
- You have obtained the IP address, port number, username, and password for connecting to the sqoop database. In addition, the user must have the read and write permissions on the database.
- You have obtained the username and password of the db_cdm schema. In addition, the user must have the read and write permissions on the schema.
- You have associated the DDM instance with an RDS instance in the same VPC. For details about the operation, see the Distributed Database Middleware User Guide.
Creating a Sharded Table in the DDM Schema
Create a sharded table named trip_ddm in the db_cdm schema of DDM. The field names and field types are the same as those in the trip table of the on-premises MySQL database. Configure Table Type, Sharding Rule, Sharding Key, and SQL Statement Used for Table Creation. The following is the SQL statement for creating the table. For more information, see Creating a Logical Table.

create table trip_ddm(
  tripid int,
  duration int,
  startdate timestamp,
  startstation varchar(64),
  startterminal int,
  enddata timestamp,
  endstation varchar(64),
  endterminal int,
  bike int,
  subscriberType varchar(32),
  zipcode varchar(10)
);
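How rows are distributed across shards is determined by the Sharding Rule and Sharding Key you configure when creating the logical table; the actual algorithm is internal to DDM. Purely as an illustration, a hash-style rule on a tripid sharding key behaves like this sketch (the shard count is an assumption, not a DDM default):

```python
SHARD_COUNT = 4  # assumed number of physical shards behind the logical table

def shard_of(tripid: int) -> int:
    """Illustrative hash-style routing: rows with the same sharding-key
    value are always stored on the same shard."""
    return tripid % SHARD_COUNT

# Rows spread across shards, which is what enables parallel reads and writes.
print([shard_of(t) for t in (1, 2, 5, 9)])  # [1, 2, 1, 1]
```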
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and click Buy CDM. On the Buy CDM Cluster page that is displayed, configure the required parameters. Table 6-6 describes the parameters.
Table 6-6 Parameter description

Parameter           Example Value                  Description
Current Region      CN North-Beijing1              The region must be the same as the region where the DDM instance is located.
AZ                  AZ1
Cluster Name        cdm131                         Enter a custom CDM cluster name.
Version             1.3.0                          CDM version. Retain the default value.
Instance Type       cdm.medium                     Currently, the following flavors are available:
                                                   - cdm.small: 2 vCPUs with 4 GB memory, applicable to Proof of Concept (PoC) verification and development tests
                                                   - cdm.medium: 4 vCPUs with 8 GB memory, applicable to migration of a single database table with fewer than 10 million pieces of data
                                                   - cdm.large: 8 vCPUs with 16 GB memory, applicable to migration of a single database table with 10 million pieces of data or more
                                                   - cdm.xlarge: 16 vCPUs with 32 GB memory, applicable to TB-level data migration requiring 10GE high-speed bandwidth
                                                   Select cdm.medium, which is applicable to most migration scenarios.
VPC                 myvpc                          The CDM cluster and the DDM instance must be in the same VPC, subnet, and security group. For details about the DDM instance network information, see Figure 6-42. The CDM cluster accesses the DDM instance through the intranet.
Subnet              subnet-168-1 (192.168.1.0/24)
Security Group      Sys-default
Auto Shutdown       No                             Retain the default values of these parameters.
Scheduled Startup   No
Scheduled Shutdown  No
Step 2 After the CDM cluster is created, bind an EIP to the cluster on the Cluster Management page. The CDM cluster uses the EIP to access the on-premises MySQL database. See Figure 6-43.
Figure 6-43 Binding an EIP
----End
Creating a MySQL Link
Step 1 On the Cluster Management page, click Job Management of the cluster and choose Link Management > Create Link to enter the page for selecting the connector. See Figure 6-44.
Figure 6-44 Selecting a connector
Step 2 Select MySQL and click Next. On the page that is displayed, configure the MySQL link parameters, as shown in Figure 6-45.
Figure 6-45 Creating a MySQL link
Click Show Advanced Attributes to display optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 6-7.
Table 6-7 MySQL link parameters
Parameter: Description (Example Value)
Name: Unique link name (mysqllink)
Database Server: IP address or domain name of the MySQL database server (192.168.0.1)
Port: MySQL database port (3306)
Database Name: Name of the MySQL database (sqoop)
Username: User who has the read, write, and delete permissions on the MySQL database (admin)
Password: Password of the user
Step 3 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during the saving, the security settings of the MySQL database are probably incorrect. In this case, allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating a DDM Link
Step 1 On the Link Management tab page, click Create Link and select DDM in Relational Database.
Step 2 Click Next and configure the DDM link parameters. See Figure 6-46.
Figure 6-46 Creating a DDM link
NOTE
- Database Server and Port: Enter one of the access addresses of the DDM instance. For details about the access addresses of the DDM instance, see Figure 6-42.
- Database Name: Enter the name of the schema of the DDM instance, for example, db_cdm.
- Username and Password: Enter the username and password used for logging in to DDM. The user must have the permission to read and write the db_cdm schema.
Step 3 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. Figure 6-47 illustrates how to create a migration job.
Figure 6-47 Creating a migration job
NOTE
- Job Name: Enter a custom job name.
- Source Job Configuration:
  – Source Link Name: Select the mysqllink link created in Creating a MySQL Link.
  – Schema/Tablespace: Select the sqoop database where the trip table is located.
  – Table Name: Select the trip table.
- Destination Job Configuration:
  – Destination Link Name: Select the ddmlink link created in Creating a DDM Link.
  – Schema/Tablespace: Select the db_cdm schema of the DDM instance.
  – Table Name: Select the trip_ddm logical table of the DDM instance, that is, the sharded table created in Creating a Sharded Table in the DDM Schema.
  – Clear Data Before Import: Retain the default value.
Step 2 Click Next. The Map Field page is displayed. See Figure 6-48.
The fields in the trip table in the sqoop database are the same as those in the trip_ddm table in DDM. CDM automatically maps the fields with the same name. You only need to check whether the field mapping and time format are correct.
Figure 6-48 Field mapping
NOTE
- If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.
- If you need to convert the content of the source fields, perform the operations described in Field Conversion During Migration. In this example, field conversion is not required.
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Figure 6-49 Job execution result
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records, read/write statistics, and job log.
Figure 6-50 Querying migration job records
----End
6.10 Migrating the Entire MySQL Database to RDS
Scenario
This section describes how to migrate the entire on-premises MySQL database to RDS on HUAWEI CLOUD using CDM's entire DB migration function.
Currently, CDM can migrate the entire on-premises MySQL database to RDS for MySQL, RDS for PostgreSQL, or RDS for SQL Server. The following describes how to migrate the entire database to RDS for MySQL. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a MySQL Link
3. Creating an RDS Link
4. Creating an Entire Database Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to an RDS database instance and the database engine of this instance is MySQL.
- The on-premises MySQL database can be accessed through the public network. If the MySQL server is deployed in a local data center or on a third-party cloud, ensure that an IP address that can be accessed from the public network has been configured for the MySQL database, or that a VPN channel or Direct Connect connection from the internal data center to HUAWEI CLOUD has been established. To enable public network access, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
- You have obtained the IP addresses, names, usernames, and passwords of the on-premises MySQL database and RDS for MySQL.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- The flavor of the CDM cluster is selected based on the amount of data to be migrated. Generally, select cdm.medium to meet the requirements of most migration scenarios.
- The CDM cluster and the RDS for MySQL instance must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the RDS for MySQL instance.
- If the same subnet and security group cannot be used for security reasons, ensure that a security group rule has been configured to allow the CDM cluster to access the RDS for MySQL instance.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises MySQL database.
Figure 6-51 Binding an EIP
----End
Creating a MySQL Link
Step 1 On the Cluster Management page, click Job Management of the cluster and choose Link Management > Create Link to enter the page for selecting the connector. See Figure 6-52.
Figure 6-52 Selecting a connector
Step 2 Select MySQL and click Next. On the page that is displayed, configure MySQL link parameters, as shown in Figure 6-53.
Figure 6-53 Creating a MySQL link
Click Show Advanced Attributes to display optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 6-8.
Table 6-8 MySQL link parameters
Parameter: Description (Example Value)
Name: Unique link name (mysqllink)
Database Server: IP address or domain name of the MySQL database server (192.168.0.1)
Port: MySQL database port (3306)
Database Name: Name of the MySQL database (sqoop)
Username: User who has the read, write, and delete permissions on the MySQL database (admin)
Password: Password of the user
Step 3 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during the saving, the security settings of the MySQL database are probably incorrect. In this case, allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating an RDS Link
Step 1 Select RDS (MySQL), click Next, and configure the RDS link parameters.
- Name: Enter a custom link name, for example, rds_link.
- Database Server and Port: Enter the address information about the RDS for MySQL database.
- Database Name: Enter the name of the RDS for MySQL database.
- Username and Password: Enter the username and password used for logging in to the database.
NOTE
- During RDS link creation, if Use Local API in Show Advanced Attributes is set to Yes, you can use the LOAD DATA function provided by MySQL to speed up data import.
- The LOAD DATA function is disabled by default on RDS for MySQL, so you need to modify the parameter group of the MySQL instance and set local_infile to ON to enable this function.
- If local_infile cannot be edited, the instance is using the default parameter group. In this case, create a new parameter group, set local_infile to ON in it, and apply it to the RDS for MySQL instance.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating an Entire Database Migration Job
Step 1 After the two links are created, choose Entire DB Migration > Create Job to create a migration job. See Figure 6-54.
Figure 6-54 Creating an entire database migration job
- Job Name: Enter a name for the entire database migration job.
- Source Job Configuration
  – Source Link Name: Select the mysqllink link created in Creating a MySQL Link.
  – Schema/Tablespace: Select the on-premises MySQL database from which data is to be exported.
- Destination Job Configuration
  – Destination Link Name: Select the rds_link link created in Creating an RDS Link.
  – Schema/Tablespace: Select the name of the RDS database to which data is to be imported.
  – Auto Table Creation: Select Auto creation, which indicates that CDM automatically creates tables in the RDS database when tables of the on-premises MySQL database do not exist in the RDS database.
  – Clear Data Before Import: Select Yes, which indicates that when a table with the same name as a table in the on-premises MySQL database exists in the RDS database, CDM clears the data in that table on RDS.
  – Retain the default values of the optional parameters in Show Advanced Attributes.
Step 2 Click Next and select the tables to be migrated. After selecting the desired tables, click the arrow buttons to move them to the right pane.
Step 3 Click Save and Run. CDM immediately starts the entire database migration job.
When the job starts running, a sub-job is generated for each table. You can click the job name to view the sub-job list.
Step 4 In the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
There is no log for the entire database migration job itself. However, the sub-jobs have logs. On the Historical Record page of a sub-job, click Log to view the job log.
----End
6.11 Migrating the Entire Elasticsearch Database to Cloud Search Service
Scenario
Cloud Search Service provides users with structured and unstructured data search, statistics, and report capabilities. This section describes how to use CDM to migrate the entire Elasticsearch database to Cloud Search Service. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a Cloud Search Service Link
3. Creating an Elasticsearch Link
4. Creating an Entire Database Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to Cloud Search Service and obtained the IP address and port number of the Cloud Search Service cluster.
- You have obtained the IP address, port number, username, and password of the on-premises Elasticsearch database server. If the Elasticsearch server is deployed in an on-premises data center or on a third-party cloud, ensure that an IP address that can be accessed from the public network has been configured for the Elasticsearch database, or that a VPN or Direct Connect connection between the on-premises data center and HUAWEI CLOUD has been established. To enable public network access, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- The flavor of the CDM cluster is selected based on the amount of data to be migrated. Generally, select cdm.medium to meet the requirements of most migration scenarios.
- The CDM and Cloud Search Service clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the Cloud Search Service cluster.
- If the same subnet and security group cannot be used for security reasons, ensure that a security group rule has been configured to allow the CDM cluster to access the Cloud Search Service cluster.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises Elasticsearch.
----End
Creating a Cloud Search Service Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-55.
Figure 6-55 Selecting a connector
Step 2 Select Cloud Search Service and click Next. On the page that is displayed, configure the Cloud Search Service link parameters. See Figure 6-56.
- Name: Enter a custom link name, for example, csslink.
- Elasticsearch Server and Port: Enter the address and port number of the Cloud Search Service cluster.
- Username and Password: Enter the username and password used for logging in to the Cloud Search Service cluster. The user must have the read and write permissions on the database.
Figure 6-56 Creating a Cloud Search Service link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an Elasticsearch Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select Elasticsearch, click Next, and configure the Elasticsearch link parameters. The Elasticsearch link parameters are the same as those of the Cloud Search Service link.
- Name: Enter a custom link name, for example, es_link.
- Elasticsearch Server and Port: Enter the IP address and port number of the on-premises Elasticsearch database.
- Username and Password: If the Elasticsearch database has user restrictions, select the user who has the read and write permissions on the Elasticsearch database. If there is no restriction, you do not need to set these parameters.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating an Entire Database Migration Job
Step 1 Choose Entire DB Migration > Create Job to create an entire database migration job.
Figure 6-57 Creating an entire database migration job
- Job Name: Enter a unique name.
- Source Job Configuration
  – Source Link Name: Select the es_link link created in Creating an Elasticsearch Link.
  – Index: Click the icon next to the text box to select an index in the on-premises Elasticsearch database, or manually enter an index name. The name can contain only lowercase letters.
- Destination Job Configuration
  – Destination Link Name: Select the csslink link created in Creating a Cloud Search Service Link.
  – Index: Enter the index of the data to be written. You can select an existing index in Cloud Search Service or manually enter an index name that does not exist; in the latter case, CDM automatically creates the index in Cloud Search Service. The name can contain only lowercase letters.
  – Clear Data Before Import: If the selected index already exists in Cloud Search Service, you can choose whether to clear the data in the index before importing data. If you set this parameter to No, the data is appended to the index.
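The lowercase-only rule for index names entered here can be expressed as a one-line check. This is an illustration of the rule stated above, not part of any CDM or Elasticsearch API.

```python
import re

# Index names entered in this job configuration may contain only lowercase letters.
INDEX_NAME_RE = re.compile(r"^[a-z]+$")

def is_valid_index_name(name: str) -> bool:
    """Return True if the name satisfies the lowercase-letters-only rule."""
    return bool(INDEX_NAME_RE.fullmatch(name))

print(is_valid_index_name("trip"))      # True
print(is_valid_index_name("Trip2017"))  # False: uppercase letter and digits
```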
Step 2 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
A sub-job is generated for each type in the on-premises Elasticsearch index for concurrent execution. You can click the job name to view the sub-job progress.
Step 3 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records, read/write statistics, and job log (only the sub-jobs have job logs).
Figure 6-58 Historical Record
----End
7 Advanced Operations
This section describes how to configure CDM in advanced scenarios and how to use advanced CDM parameters. It is applicable to users who are familiar with basic CDM functions.
CDM supports the following advanced scenarios:
- Incremental File Migration
- Incremental Migration of Relational Databases
- HBase/CloudTable Incremental Migration
- Incremental Synchronization Using the Macro Variables of Date and Time
- Migration in Transaction Mode
- Data Encryption During the Migration to OBS
- MD5 Verification for Files in Migration
- Field Conversion During Migration
- Migration of a List of Files
- Using Regular Expressions to Separate Semi-structured Text
- GDS Import Mode
- File Formats
7.1 Incremental File Migration
CDM supports incremental migration of file systems. After full migration is complete, either all new files, or only the files in specified directories, can be exported.
- Exporting all new files
  – Application scenarios: Both the migration source and destination are file systems (OBS/OSS/HDFS/FTP/SFTP/NAS).
  – Key configurations: Duplicate File Processing Method and Configuring Scheduled Jobs
  – Prerequisites: None
- Exporting files in a specified directory
  – Application scenarios: The source end is a file system (OBS/OSS/HDFS/FTP/SFTP/NAS). The destination end can be of any type. In incremental migration, only
the specified files are written to the destination end. The existing records are not updated or deleted.
  – Key configurations: File/Path Filter and Configuring Scheduled Jobs
  – Prerequisites: The source directory or file name contains the time field.
Duplicate File Processing Method
When creating a table/file migration job, if the source and destination ends are file systems, the Duplicate File Processing Method parameter is available in Destination Link Configuration. You can select Replace, Skip, or Stop job. When a file with the same name and size exists on both the source and destination ends, CDM determines that the file is a duplicate file.
CDM supports binary file transfer (without parsing files), which delivers the optimal transmission rate. If the path from which data is to be exported is a directory, CDM imports all files in the directory to the migration destination.
If files are added to the source directory at irregular intervals, the key configurations for job creation are as follows:
1. Set Duplicate File Processing Method of the destination link to Skip. See Figure 7-1.
Figure 7-1 Skipping duplicated files
2. Configure scheduled job execution.
In this way, you can import the newly added files to the destination directory periodically to implement incremental synchronization.
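The duplicate-file rule above (same name and same size on both ends means "skip") can be sketched as a small function. This is an illustration of the Skip semantics only, not CDM's internal implementation.

```python
def files_to_copy(source: dict, destination: dict) -> dict:
    """Return the source files to migrate under the Skip policy.

    source and destination map file name -> size in bytes. A file is treated
    as a duplicate (and skipped) when the same name and size exist on both ends.
    """
    return {
        name: size
        for name, size in source.items()
        if destination.get(name) != size   # missing or different size: migrate
    }

src = {"a.csv": 100, "b.csv": 200, "c.csv": 300}
dst = {"a.csv": 100, "b.csv": 999}          # b.csv has a different size
print(files_to_copy(src, dst))              # {'b.csv': 200, 'c.csv': 300}
```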
File/Path Filter
When creating a table/file migration job, if the source end is a file system, the Filter Type parameter is available in Source Link Configuration. You can select either Wildcard or Regex. During incremental file migration, Wildcard is selected. In this way, you can configure a wildcard to filter files or paths, and CDM migrates only the files or paths that meet the specified conditions.
If the source file name contains the date and time field, such as 2017-10-15 20:25:26, the /opt/data/file_20171015202526.data file is generated. The key configurations for job creation are as follows:
1. In source link parameters, set Filter Type to Wildcard. See Figure 7-2.
Figure 7-2 Filtering files
2. Enter *${dateformat(yyyyMMdd,-1,DAY)}* in File Filter. *${dateformat(yyyyMMdd,-1,DAY)}* is the macro variable format of date and time supported by CDM. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
3. Select Schedule Execution and set Cycle to one day.
In this way, you can import the files generated on the previous day to the destination directory every day to implement incremental synchronization.
In incremental file migration, Path Filter is used in the same way as File Filter. The path name must contain the time field. In this case, all files in the specified directory can be synchronized periodically.
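The effect of the wildcard *${dateformat(yyyyMMdd,-1,DAY)}* can be reproduced offline: expand the macro for a fixed "current" time and apply the resulting pattern with Python's `fnmatch`, which matches `*` the same way a shell-style wildcard filter does. This is an illustration, not CDM code.

```python
import fnmatch
from datetime import datetime, timedelta

# Fixed "current" time so the example is reproducible.
now = datetime(2017, 10, 16, 9, 0, 0)

# Expand *${dateformat(yyyyMMdd,-1,DAY)}* -> *20171015*
pattern = "*{}*".format((now - timedelta(days=1)).strftime("%Y%m%d"))

files = ["file_20171014202526.data", "file_20171015202526.data", "readme.txt"]
print([f for f in files if fnmatch.fnmatch(f, pattern)])
# ['file_20171015202526.data']
```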
7.2 Incremental Migration of Relational Databases
CDM supports incremental migration of relational databases. After full migration is complete, data whose field value is greater than a specified field value, or data within a specified period of time, can be incrementally migrated. For example, only data whose date value is greater than 2017-10-16 19:00:00 is exported each time a job is started, or data generated on the previous day is exported at 00:00:00 every day.
- Migrating incremental data whose field value is greater than the specified field value
  – Application scenarios: Both the migration source and destination are relational databases.
  – Key configurations: Incremental Migration Using the Regain Symbol and Schedule Execution
  – Prerequisites: The data table contains a numeric field or timestamp field that is unique and automatically increases.
- Migrating incremental data within a specified period of time
  – Application scenarios: The source end is a relational database. For details, see From a Relational Database. The destination end can be of any type.
  – Key configurations: Where Clause and Schedule Execution
  – Prerequisites: The data table contains a date and time field or timestamp field.
In incremental migration, only the specified data is written to the data table. The existing records are not updated or deleted.
Incremental Migration Using the Regain Symbol
When creating a table/file migration job, if both the migration source and destination are relational databases, the Regain Symbol parameter is available in the advanced attributes of Source Link Configuration.
After Regain Symbol is set to a specified field, CDM queries the table imported to the destination database every time a scheduled task is started. If the table does not contain the specified field, CDM performs full migration. If the table contains the specified field and the field has a value, CDM performs incremental migration to migrate only the data whose value is greater than the value of this field.
The specified field must have a unique value that automatically increases, for example, an auto_increment int, timestamp, or date.
This parameter is used together with scheduled CDM jobs configured according to Scheduling Job Execution, so that jobs are scheduled to implement incremental synchronization of relational databases.
For example, the date field in the data table records the date and time when each data record is created. When a migration job is created, this field is specified as Regain Symbol. See Figure 7-3. If a scheduled job is configured to automatically run every three hours, full migration is performed when the job runs for the first time. Incremental synchronization is performed when the job runs for the second time, and only data created in the last three hours is exported. Subsequently, the data is automatically synchronized every three hours.
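The Regain Symbol behavior described above can be sketched as a watermark comparison: the highest value of the chosen field already at the destination decides which source rows are extracted. This is an illustrative model only, not CDM internals.

```python
def incremental_extract(source_rows, destination_rows, field="date"):
    """Sketch of Regain Symbol: return only rows newer than the destination's
    maximum value of the chosen field; with an empty destination, do a full run.
    """
    if not destination_rows:                       # first run: full migration
        return list(source_rows)
    watermark = max(row[field] for row in destination_rows)
    return [row for row in source_rows if row[field] > watermark]

src = [{"id": 1, "date": "2017-10-15"}, {"id": 2, "date": "2017-10-16"}]
dst = [{"id": 1, "date": "2017-10-15"}]
print(incremental_extract(src, dst))   # [{'id': 2, 'date': '2017-10-16'}]
```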
Figure 7-3 Regain Symbol
Incremental Migration Using the Where Clause with the Time Variable
When creating a table/file migration job, if the source end is a relational database, the Where Clause parameter is available in the advanced attributes of Source Link Configuration.
If Where Clause is set to an SQL condition, for example, age > 18 and age <= 60, CDM exports only the data that meets the condition. If Where Clause is not specified, the entire table is exported.
Where Clause can be set to macro variables of date and time. When the data table contains a date or timestamp field, Where Clause and Schedule Execution can be used together to extract data of a specified date.
For example, the database table contains column DS that indicates the time, the value type of the column is varchar(30), and the inserted time format is similar to 2017-xx-xx.
Figure 7-4 Table data
Set Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}', as shown in Figure 7-5. If the scheduled job executes automatically at 00:00:00 every day, all data created on the previous day is exported at that time.
Figure 7-5 Where Clause
Where Clause can be configured with various macro variables of date and time. You can combine the macro variables of date and time with scheduled jobs whose cycle is minutes, hours, days, weeks, or months to automatically export data at a specific time.
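The expansion of the clause DS='${dateformat(yyyy-MM-dd,-1,DAY)}' for a given run day can be reproduced as follows; the column name DS comes from the example above, and the function is illustrative only.

```python
from datetime import date, timedelta

def where_clause_for(run_day: date) -> str:
    """Expand DS='${dateformat(yyyy-MM-dd,-1,DAY)}' for a given run day."""
    previous_day = run_day - timedelta(days=1)
    return "DS='{}'".format(previous_day.strftime("%Y-%m-%d"))

print(where_clause_for(date(2017, 10, 16)))   # DS='2017-10-15'
```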
7.3 HBase/CloudTable Incremental Migration
You can use CDM to export data in a specified period of time from HBase (including MRS HBase, FusionInsight HBase, and Apache HBase) and CloudTable. This can be used together with CDM scheduled jobs to implement incremental migration of HBase and CloudTable.
When creating a table/file migration job and selecting Link to HBase or Link to CloudTable as the migration source, you can set the time range in the advanced attributes. Figure 7-6 shows the configuration items in the advanced attributes.
Figure 7-6 Time range
- Minimum Timestamp: start time (inclusive) for extracting data, in yyyy-MM-dd HH:mm:ss format. Only the data generated at the specified time or later is extracted.
- Maximum Timestamp: end time (exclusive) for extracting data, in yyyy-MM-dd HH:mm:ss format. Only the data generated before this time point is extracted.
The two parameters can be set to macro variables of date and time. Examples are as follows:
- If Minimum Timestamp is set to ${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}, only the data generated since the same time on the previous day is exported.
- If Maximum Timestamp is set to ${dateformat(yyyy-MM-dd HH:mm:ss)}, only the data generated before the specified time point is exported.
If both parameters are configured, only the data generated within the previous day is exported. In addition, if the job is configured to execute at 00:00:00 every day, the data generated every day can be incrementally synchronized.
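The "previous day" window described above, for a job run at 00:00:00, can be computed as follows. Minimum Timestamp is inclusive and Maximum Timestamp exclusive, matching the parameter semantics above; the helper itself is illustrative, not a CDM API.

```python
from datetime import datetime, timedelta

def previous_day_window(run_time: datetime):
    """Return (Minimum Timestamp, Maximum Timestamp) strings covering the
    24 hours before run_time, in yyyy-MM-dd HH:mm:ss format."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return ((run_time - timedelta(days=1)).strftime(fmt),  # inclusive start
            run_time.strftime(fmt))                        # exclusive end

print(previous_day_window(datetime(2017, 10, 16, 0, 0, 0)))
# ('2017-10-15 00:00:00', '2017-10-16 00:00:00')
```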
7.4 Incremental Synchronization Using the Macro Variables of Date and Time
During the creation of table/file migration jobs, CDM supports the macro variables of date and time in the following parameters of the source and destination links:
- Source directory
- Source table name
- Write directory
- Destination table name
- Where clause
You can use the ${} macro variable definition identifier to define the macros of the time type. Currently, dateformat and timestamp are supported.
By using the macro variables of date and time together with scheduled jobs, you can implement incremental synchronization of databases and files.
dateformat
dateformat supports two types of parameters:
- dateformat(format)
  format indicates the date and time format. For details about the format definition, see the definition in java.text.SimpleDateFormat.
  For example, if the current date is 2017-10-16 09:00:00, yyyy-MM-dd HH:mm:ss indicates 2017-10-16 09:00:00.
- dateformat(format, dateOffset, dateType)
  – format indicates the format of the returned date.
  – dateOffset indicates the date offset.
  – dateType indicates the type of the date offset. Currently, dateType supports SECOND, MINUTE, HOUR, and DAY.
  For example, if the current date is 2017-10-16 09:00:00, dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY) indicates the day before the current day, that is, 2017-10-15 09:00:00.
timestamp
timestamp supports two types of parameters:
- timestamp()
  Returns the timestamp of the current time, that is, the number of milliseconds that have elapsed since 00:00:00 on January 1, 1970 (1970-01-01 00:00:00 GMT). For example, 1508078516286.
- timestamp(dateOffset, dateType)
  Returns the timestamp after the time offset is applied. dateOffset and dateType indicate the date offset and the offset type, respectively.
  For example, if the current date is 2017-10-16 09:00:00, timestamp(-10, MINUTE) returns the timestamp generated 10 minutes before the current time point, that is, 1508115000000.
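The two macros can be emulated in Python for a fixed "current" time, so their results can be checked against the examples in this section. The sketch assumes the examples were computed in UTC+08:00, and the Java-style patterns (yyyy-MM-dd HH:mm:ss) are translated to strftime directives by hand; it is not CDM code.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))                 # assumed timezone of the examples
NOW = datetime(2017, 10, 16, 9, 0, 0, tzinfo=TZ)  # fixed "current" time
UNITS = {"SECOND": "seconds", "MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}

def dateformat(strftime_pattern, offset=0, unit="DAY", now=NOW):
    """Emulate dateformat(format, dateOffset, dateType)."""
    return (now + timedelta(**{UNITS[unit]: offset})).strftime(strftime_pattern)

def timestamp(offset=0, unit="DAY", now=NOW):
    """Emulate timestamp(dateOffset, dateType): milliseconds since the epoch."""
    shifted = now + timedelta(**{UNITS[unit]: offset})
    return int(shifted.timestamp() * 1000)

print(dateformat("%Y-%m-%d %H:%M:%S", -1, "DAY"))  # 2017-10-15 09:00:00
print(timestamp(-10, "MINUTE"))                    # 1508115000000
```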
Macro Variable Definition of Time and Date
Suppose that the current time is 2017-10-16 09:00:00. Table 7-1 describes the macro variable definitions of time and date.
Table 7-1 Macro variable definition of time and date
Macro Variable Description Display Effect
${dateformat(yyyy-MM-dd)} Returns the current date in yyyy-MM-dd format.
2017-10-16
${dateformat(yyyy/MM/dd)} Returns the current date inyyyy/MM/dd format.
2017/10/16
Cloud Data MigrationUser Guide 7 Advanced Operations
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 189
${dateformat(yyyy_MM_dd HH:mm:ss)} | Returns the current time in yyyy_MM_dd HH:mm:ss format. | 2017_10_16 09:00:00
${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)} | Returns the current time in yyyy-MM-dd HH:mm:ss format; the date is one day before the current day. | 2017-10-15 09:00:00
${timestamp()} | Returns the timestamp of the current time, that is, the number of milliseconds that have elapsed since 00:00:00 on January 1, 1970. | 1508115600000
${timestamp(-10, MINUTE)} | Returns the timestamp generated 10 minutes before the current time. | 1508115000000
${timestamp(dateformat(yyyyMMdd))} | Returns the timestamp of 00:00:00 of the current day. | 1508083200000
${timestamp(dateformat(yyyyMMdd,-1,DAY))} | Returns the timestamp of 00:00:00 of the previous day. | 1507996800000
${timestamp(dateformat(yyyyMMddHH))} | Returns the timestamp of the top of the current hour. | 1508115600000
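The macros above can be reproduced outside CDM for verification. The following Python sketch is an illustrative approximation, not CDM code; the helper names are invented, and it assumes the cluster clock runs at UTC+08:00, which is what the sample timestamps in Table 7-1 imply.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the sample values in Table 7-1 imply a clock running at UTC+08:00.
TZ = timezone(timedelta(hours=8))
NOW = datetime(2017, 10, 16, 9, 0, 0, tzinfo=TZ)  # reference time used by the guide

# Java-style pattern letters mapped to strftime directives (subset only).
_JAVA_TO_STRFTIME = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
                     ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]
_UNITS = {"SECOND": timedelta(seconds=1), "MINUTE": timedelta(minutes=1),
          "HOUR": timedelta(hours=1), "DAY": timedelta(days=1)}

def dateformat(pattern, offset=0, unit="DAY", now=NOW):
    """Approximate dateformat(dateFormat[, dateOffset, dateType])."""
    t = now + offset * _UNITS[unit]
    for java, py in _JAVA_TO_STRFTIME:
        pattern = pattern.replace(java, py)
    return t.strftime(pattern)

def timestamp(offset=0, unit="DAY", now=NOW):
    """Approximate timestamp([dateOffset, dateType]); returns milliseconds."""
    return int((now + offset * _UNITS[unit]).timestamp() * 1000)

print(dateformat("yyyy-MM-dd"))                      # 2017-10-16
print(dateformat("yyyy-MM-dd HH:mm:ss", -1, "DAY"))  # 2017-10-15 09:00:00
print(timestamp())                                   # 1508115600000
print(timestamp(-10, "MINUTE"))                      # 1508115000000
```

Running the sketch reproduces the display-effect column of Table 7-1 for the simple (non-nested) macros.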
Time and Date Macro Variables of Paths and Table Names

Figure 7-7 shows an example, where:
l Table Name under Source Link Configuration is set to CDM_/${dateformat(yyyy-MM-dd)}.
l Write Directory under Destination Link Configuration is set to /opt/ttxx/${timestamp()}.
After the macro definition conversion, this job migrates data in table SQOOP.CDM_20171016 in the Oracle database to the /opt/ttxx/1508115701746 directory of the SFTP server.
Figure 7-7 Setting Table Name and Write Directory to a time and date macro variable
Currently, a table name or path name can contain multiple macro variables. For example, /opt/ttxx/${dateformat(yyyy-MM-dd)}/${timestamp()} is converted to /opt/ttxx/2017-10-16/1508115701746.
Time and Date Macro Variables in the Where Clause
Figure 7-8 uses table SQOOP.CDM_20171016 as an example. The table contains column DS, which indicates the time.
Figure 7-8 Table data
Suppose that the current date is 2017-10-16 and you want to export data generated the day before the current day (DS = 2017-10-15). When creating the job, set the value of Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}'. In this way, you can export all data that complies with the DS = 2017-10-15 condition.
Implementing Incremental Synchronization by Configuring the Macro Variables of Date and Time and Scheduled Jobs
Two simple application scenarios are as follows:
l The database table contains column DS that indicates the time, the value type of the column is varchar(30), and the inserted time is in a format similar to 2017-xx-xx.
In a scheduled job whose cycle is one day and which is executed at 00:00:00 every day, set the value of Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}'. Data generated in the previous day will then be exported at 00:00:00 every day.
l The database table contains column time that indicates the time, the type is Number, and the inserted values are timestamps.
In a scheduled job whose cycle is one day and which is executed at 00:00:00 every day, set the value of Where Clause to time between '${timestamp(-1,DAY)}' and '${timestamp()}'. Data generated in the previous day will then be exported at 00:00:00 every day.
Configuration principles of other application scenarios are the same.
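To make the two scheduling scenarios concrete, here is a small illustrative Python sketch (not CDM code; the variable names are invented) that renders the WHERE clause a daily job triggered at 00:00:00 would effectively use:

```python
from datetime import datetime, timedelta

# Assumption: the job is triggered daily at 00:00:00; run_time is the trigger time.
run_time = datetime(2017, 10, 17, 0, 0, 0)

# Scenario 1: DS is a varchar date column; select yesterday's partition.
ds_value = (run_time - timedelta(days=1)).strftime("%Y-%m-%d")
where_varchar = "DS='{}'".format(ds_value)

# Scenario 2: time is a numeric millisecond timestamp; select the last 24 hours.
ms = lambda t: int(t.timestamp() * 1000)
where_number = "time between {} and {}".format(ms(run_time - timedelta(days=1)),
                                               ms(run_time))

print(where_varchar)  # DS='2017-10-16'
print(where_number)
```

Each run re-evaluates the macros, so consecutive daily runs select consecutive, non-overlapping windows of data.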
7.5 Migration in Transaction Mode

When a CDM job fails to be executed, CDM rolls back the data to the state before the job starts and automatically deletes data from the destination table.
When creating a table/file migration job, CDM allows you to select whether to enable the transaction mode by configuring Import to Staging Table. If you set this parameter to Yes, CDM automatically creates a temporary table and imports the data to the temporary table. After the data is imported successfully, CDM migrates the data to the destination table in transaction mode of the database. If the import fails, the destination table is rolled back to the state before the job starts. See Figure 7-9.
Figure 7-9 Migration in transaction mode
NOTE
If you set Clear Data Before Import to Yes, CDM does not roll back the deleted data even in transaction mode.
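The staging-table behavior can be pictured with any transactional database. The following Python/sqlite3 sketch is an illustrative model, not CDM code: rows are loaded into a staging table first, and only moved into the destination inside one transaction, so a failure leaves the destination unchanged.

```python
import sqlite3

# Autocommit mode; the move into the destination is wrapped in an explicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dest (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")

def migrate(rows, fail=False):
    # Phase 1: load everything into the staging table.
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    # Phase 2: move staged rows into the destination inside one transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO dest SELECT * FROM staging")
        if fail:
            raise RuntimeError("simulated mid-import failure")
        conn.execute("COMMIT")
    except RuntimeError:
        conn.execute("ROLLBACK")  # destination returns to its pre-job state
    finally:
        conn.execute("DELETE FROM staging")  # staging table is always discarded

migrate([(1, "a"), (2, "b")])   # succeeds: 2 rows land in dest
migrate([(3, "c")], fail=True)  # fails: dest is left unchanged
print(conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0])  # 2
```

The failed batch never becomes visible in the destination, mirroring the rollback behavior described above.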
7.6 Data Encryption During the Migration to OBS

When migrating data to OBS using CDM, you can perform KMS encryption during table/file migration and entire database migration. See Figure 7-10. The key must be created in Data Encryption Workshop (DEW). For details, see the Data Encryption Workshop User Guide.
Figure 7-10 Enabling KMS encryption
After KMS encryption is enabled, objects to be uploaded will be encrypted and stored on the server. When you download the encrypted objects, the encrypted data will be decrypted on the server and displayed in plaintext to users.
NOTE
l If KMS encryption is enabled, MD5 verification cannot be used.
l After KMS encryption is performed, the encryption status of the objects on OBS cannot be changed.
l A key in use cannot be deleted. Otherwise, the object encrypted with this key cannot be downloaded.
7.7 MD5 Verification for Files in Migration

MD5 verification can be performed when CDM reads files from the SFTP/CIFS server and writes the files to OBS in binary format. CDM checks the end-to-end file consistency and writes the verification result to the OBS bucket. The bucket can be a bucket that does not store migration files. See Figure 7-11.
Figure 7-11 Enabling MD5 verification to verify file consistency
If Validate MD5 Value is set to Yes, when CDM reads files from the migration source, it checks whether the MD5 value of each file to be read is the same as that recorded in the corresponding xx.md5 file in the source directory. If the migration source does not have the xx.md5 file, the verification is not performed. After a file is read and written to OBS, CDM provides the MD5 value in the HTTP header so that OBS can verify the file.
NOTE
If MD5 verification is used, KMS encryption cannot be used.
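The source-side check can be sketched with Python's standard library. This is an illustrative re-implementation, not CDM code; the sidecar naming `<file>.md5` is an assumption for the demo, since the guide only says an xx.md5 file exists in the source directory.

```python
import hashlib
import os
import tempfile

def md5_of(path, chunk=1 << 20):
    """Stream a file and return its hex MD5 digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def verify(data_path):
    """Compare a file's MD5 with its sidecar <name>.md5, if one exists.

    Returns True/False for a performed check, or None when there is no
    .md5 file (mirroring 'verification will not be performed')."""
    sidecar = data_path + ".md5"
    if not os.path.exists(sidecar):
        return None
    expected = open(sidecar).read().split()[0].strip().lower()
    return md5_of(data_path) == expected

with tempfile.TemporaryDirectory() as d:
    data = os.path.join(d, "data.bin")
    open(data, "wb").write(b"hello cdm")
    open(data + ".md5", "w").write(md5_of(data))
    print(verify(data))  # True
```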
7.8 Field Conversion During Migration

You can create a field converter on the Map Field tab page when creating a table/file migration job. See Figure 7-12.
Figure 7-12 Creating a field converter
NOTE
Field mapping is not involved when the binary format is used to migrate files to files.
CDM can convert fields during migration. Currently, the following field converters are supported:
l Anonymization
l Trim
l Reverse String
l Replace String
l Expression Conversion
Anonymization

This converter is used to hide key information in a character string. For example, to convert 12345678910 to 123****8910, set the parameters according to Figure 7-13:
l Set Reserve Start Length to 3.
l Set Reserve End Length to 4.
l Set Replace Character to *.
Figure 7-13 Anonymization
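The three parameters combine as sketched below. This is an illustrative Python re-implementation, not CDM code; the behavior for strings shorter than the two reserved lengths is an assumption.

```python
def anonymize(value, reserve_start, reserve_end, replace_char="*"):
    """Mask the middle of a string, keeping the first reserve_start and
    last reserve_end characters (mirrors Reserve Start Length, Reserve
    End Length, and Replace Character)."""
    middle = len(value) - reserve_start - reserve_end
    if middle <= 0:
        return value  # assumption: too-short values are left unchanged
    return (value[:reserve_start]
            + replace_char * middle
            + value[len(value) - reserve_end:])

print(anonymize("12345678910", 3, 4))  # 123****8910
```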
Trim

This converter is used to automatically delete the spaces before and after a string. No parameters need to be configured.
Reverse String

This converter is used to automatically reverse a string, for example, reverse ABC into CBA. No parameters need to be configured.
Replace String

This converter is used to replace a character string. You need to configure the object to be replaced and the new value.
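The three simple converters above behave like the following one-liners (an illustrative Python sketch, not CDM code):

```python
# Illustrative Python equivalents of the simple converters.
trim = str.strip          # Trim: remove leading/trailing spaces
reverse = lambda s: s[::-1]   # Reverse String
replace = str.replace     # Replace String: old value -> new value

print(trim("  abc  "))             # abc
print(reverse("ABC"))              # CBA
print(replace("a-b-c", "-", "+"))  # a+b+c
```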
Expression Conversion

This converter uses the JSP expression language (EL) to convert the current field or a row of data. The JSP EL is used to create arithmetic and logical expressions. Within a JSP EL expression, you can use integers, floating point numbers, strings, the built-in constants true and false for boolean values, and null.
The expression supports the following environment variables:
l value: indicates the current field value.
l row: indicates the current row, which is an array type.
The expression supports the following tool classes:
l StringUtils: string processing tool class. For details, see org.apache.commons.lang.StringUtils in the Java SDK code.
l DateUtils: date tool class
l CommonUtils: common tool class
l NumberUtils: string-to-value conversion class
l HttpsUtils: network file read class
Application examples:
1. Set a string constant for the current field, for example, VIP.
Expression: "VIP"
2. If the field is of the string type, convert all character strings into lowercase letters, for example, convert aBC to abc.
Expression: StringUtils.lowerCase(value)
3. Convert all character strings of the current field to uppercase letters.
Expression: StringUtils.upperCase(value)
4. If the field value is a date string in yyyy-MM-dd format, extract the year from the field value, for example, extract 2017 from 2017-12-01.
Expression: StringUtils.substringBefore(value,"-")
5. If the field value is of the numeric type, double the value:
Expression: value*2
6. Convert the field value true to Y and other field values to N.
Expression: value == "true"? "Y": "N"
7. If the field value is of the string type and is left empty, convert it to Default; otherwise, leave the field value unchanged.
Expression: empty value? "Default" : value
8. If the first and second fields are of the numeric type, convert the field to the sum of the first and second field values.
Expression: row[0] + row[1]
9. If the field is of the date or timestamp type, return the current year after conversion. The data type is int.
Expression: DateUtils.getYear(value)
10. If the field is a date and time string in yyyy-MM-dd format, convert it to the date type:
Expression: DateUtils.format(value,"yyyy-MM-dd")
11. Convert date format 2018/01/05 15:15:05 to 2018-01-05 15:15:05:
Expression: DateUtils.format(DateUtils.parseDate(value,"yyyy/MM/dd HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")
12. Obtain a 36-character universally unique identifier (UUID):
Expression: CommonUtils.randomUUID()
13. If the field is of the string type, capitalize the first letter, for example, convert cat to Cat.
Expression: StringUtils.capitalize(value)
14. If the field is of the string type, convert the first letter to a lowercase letter, for example, convert Cat to cat.
Expression: StringUtils.uncapitalize(value)
15. If the field is of the string type, pad the character string with spaces to the specified length and center it. If the length of the character string is not shorter than the specified length, the string is not converted. For example, convert ab to " ab " to meet the specified length 4.
Expression: StringUtils.center(value, 4)
16. Delete one newline (including \n, \r, and \r\n) at the end of a character string. For example, convert abc\r\n\r\n to abc\r\n.
Expression: StringUtils.chomp(value)
17. If the string contains the specified string, true is returned; otherwise, false is returned. For example, abc contains a, so true is returned.
Expression: StringUtils.contains(value, "a")
18. If the string contains any character of the specified string, true is returned; otherwise, false is returned. For example, zzabyycdxx contains either z or a, so true is returned.
Expression: StringUtils.containsAny(value, "za")
19. If the string does not contain any of the specified characters, true is returned. If any specified character is contained, false is returned. For example, abz contains one character of xyz, so false is returned.
Expression: StringUtils.containsNone(value, "xyz")
20. If the string contains only the specified characters, true is returned. If any other character is contained, false is returned. For example, abab contains only characters among abc, so true is returned.
Expression: StringUtils.containsOnly(value, "abc")
21. If the character string is empty or null, convert it to the specified character string; otherwise, do not convert it. For example, convert the empty character string to null.
Expression: StringUtils.defaultIfEmpty(value, null)
22. If the string ends with the specified suffix (case sensitive), true is returned; otherwise, false is returned. For example, the suffix of abcdef is not null, so false is returned.
Expression: StringUtils.endsWith(value, null)
23. If the string is the same as the specified string (case sensitive), true is returned; otherwise, false is returned. For example, after strings abc and ABC are compared, false is returned.
Expression: StringUtils.equals(value, "ABC")
24. Obtain the first index of the specified character string in a character string. If no index is found, -1 is returned. For example, the first index of ab in aabaabaa is 1.
Expression: StringUtils.indexOf(value, "ab")
25. Obtain the last index of the specified character string in a character string. If no index is found, -1 is returned. For example, the last index of k in aFkyk is 4.
Expression: StringUtils.lastIndexOf(value, "k")
26. Obtain the first index of the specified character string starting from the specified position in the character string. If no index is found, -1 is returned. For example, the first index of b found at or after index 3 of aabaabaa is 5.
Expression: StringUtils.indexOf(value, "b", 3)
27. Obtain the first index of any specified character in a character string. If no index is found, -1 is returned. For example, the first index of z or a in zzabyycdxx is 0.
Expression: StringUtils.indexOfAny(value, "za")
28. If the string contains only Unicode letters, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlpha(value)
29. If the string contains only Unicode letters and digits, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumeric(value)
30. If the string contains only Unicode letters, digits, and spaces, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumericSpace(value)
31. If the string contains only Unicode letters and spaces, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlphaSpace(value)
32. If the string contains only printable ASCII characters, true is returned; otherwise, false is returned. For example, for !ab-c~, true is returned.
Expression: StringUtils.isAsciiPrintable(value)
33. If the string is empty or null, true is returned; otherwise, false is returned.
Expression: StringUtils.isEmpty(value)
34. If the string contains only Unicode digits, true is returned; otherwise, false is returned.
Expression: StringUtils.isNumeric(value)
35. Obtain the leftmost characters of the specified length. For example, obtain the leftmost two characters ab from abc.
Expression: StringUtils.left(value, 2)
36. Obtain the rightmost characters of the specified length. For example, obtain the rightmost two characters bc from abc.
Expression: StringUtils.right(value, 2)
37. Pad the left of the current character string with the specified characters up to the specified length. If the length of the current character string is not shorter than the specified length, the character string is not converted. For example, if bat is left-padded with yz to length 8, the result is yzyzybat.
Expression: StringUtils.leftPad(value, 8, "yz")
38. Pad the right of the current character string with the specified characters up to the specified length. If the length of the current character string is not shorter than the specified length, the character string is not converted. For example, if bat is right-padded with yz to length 8, the result is batyzyzy.
Expression: StringUtils.rightPad(value, 8, "yz")
39. If the field is of the string type, obtain the length of the current character string. If the character string is null, 0 is returned.
Expression: StringUtils.length(value)
40. If the field is of the string type, delete all occurrences of the specified character string from it. For example, delete ue from queued to obtain qd.
Expression: StringUtils.remove(value, "ue")
41. If the field is of the string type, remove the substring at the end of the field. If the specified substring is not at the end of the field, no conversion is performed. For example, remove .com at the end of www.domain.com.
Expression: StringUtils.removeEnd(value, ".com")
42. If the field is of the string type, delete the substring at the beginning of the field. If the specified substring is not at the beginning of the field, no conversion is performed. For example, delete www. at the beginning of www.domain.com.
Expression: StringUtils.removeStart(value, "www.")
43. If the field is of the string type, replace all occurrences of the specified character string in the field. For example, replace a in aba with z to obtain zbz.
Expression: StringUtils.replace(value, "a", "z")
44. If the field is of the string type, replace multiple characters in the character string at a time. For example, replace h in hello with j and o with y to obtain jelly.
Expression: StringUtils.replaceChars(value, "ho", "jy")
45. If the field is of the string type, use the specified delimiter to split the text into an array. For example, use : to split ab:cd:ef into ["ab", "cd", "ef"].
Expression: StringUtils.split(value, ":")
46. If the string starts with the specified prefix (case sensitive), true is returned; otherwise, false is returned. For example, abcdef starts with abc, so true is returned.
Expression: StringUtils.startsWith(value, "abc")
47. If the field is of the string type, delete all the specified characters from both ends of the field. For example, delete all x, y, and z from the ends of abcyx to obtain abc.
Expression: StringUtils.strip(value, "xyz")
48. If the field is of the string type, delete all the specified characters at the end of the field, for example, delete all spaces at the end of the field.
Expression: StringUtils.stripEnd(value, null)
49. If the field is of the string type, delete all the specified characters at the beginning of the field, for example, delete all spaces at the beginning of the field.
Expression: StringUtils.stripStart(value, null)
50. If the field is of the string type, obtain the substring after the specified position (excluding the character at the specified position). If the specified position is a negative number, count the position from the end of the string. For example, obtain the character string after the second character of abcde, that is, cde.
Expression: StringUtils.substring(value, 2)
51. If the field is of the string type, obtain the substring within the specified range of the character string. If a specified position is a negative number, count the position from the end of the string. For example, obtain the character string from index 2 up to index 5 of abcde, that is, cde.
Expression: StringUtils.substring(value, 2, 5)
52. If the field is of the string type, obtain the substring after the first occurrence of the specified character. For example, obtain the substring after the first b in abcba, that is, cba.
Expression: StringUtils.substringAfter(value, "b")
53. If the field is of the string type, obtain the substring after the last occurrence of the specified character. For example, obtain the substring after the last b in abcba, that is, a.
Expression: StringUtils.substringAfterLast(value, "b")
54. If the field is of the string type, obtain the substring before the first occurrence of the specified character. For example, obtain the substring before the first b in abcba, that is, a.
Expression: StringUtils.substringBefore(value, "b")
55. If the field is of the string type, obtain the substring before the last occurrence of the specified character. For example, obtain the substring before the last b in abcba, that is, abc.
Expression: StringUtils.substringBeforeLast(value, "b")
56. If the field is of the string type, obtain the substring nested between two instances of the specified string. If no such substring is found, null is returned. For example, obtain the substring between the two tag strings in tagabctag, that is, abc.
Expression: StringUtils.substringBetween(value, "tag")
57. If the field is of the string type, delete the control characters (char ≤ 32) at both ends of the character string, for example, delete the spaces at both ends of the character string.
Expression: StringUtils.trim(value)
58. Convert the character string to a value of the byte type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toByte(value)
59. Convert the character string to a value of the byte type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toByte(value, 1)
60. Convert the character string to a value of the double type. If the conversion fails, 0.0d is returned.
Expression: NumberUtils.toDouble(value)
61. Convert the character string to a value of the double type. If the conversion fails, the specified value, for example, 1.1d, is returned.
Expression: NumberUtils.toDouble(value, 1.1d)
62. Convert the character string to a value of the float type. If the conversion fails, 0.0f is returned.
Expression: NumberUtils.toFloat(value)
63. Convert the character string to a value of the float type. If the conversion fails, the specified value, for example, 1.1f, is returned.
Expression: NumberUtils.toFloat(value, 1.1f)
64. Convert the character string to a value of the int type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toInt(value)
65. Convert the character string to a value of the int type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toInt(value, 1)
66. Convert the character string to a value of the long type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toLong(value)
67. Convert the character string to a value of the long type. If the conversion fails, the specified value, for example, 1L, is returned.
Expression: NumberUtils.toLong(value, 1L)
68. Convert the character string to a value of the short type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toShort(value)
69. Convert the character string to a value of the short type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toShort(value, 1)
70. Convert an IP address string to a value of the long type, for example, convert 10.78.124.0 to 172915712.
Expression: CommonUtils.ipToLong(value)
71. Read a file that maps IP addresses to physical addresses from the network and load it into a map collection. url indicates the address where the IP mapping file is stored, for example, http://10.114.205.45:21203/sqoop/IpList.csv.
Expression: HttpsUtils.downloadMap("url")
72. Cache the IP address and physical address mappings and specify a key, for example, ipList, for retrieval.
Expression: CommonUtils.setCache("ipList",HttpsUtils.downloadMap("url"))
73. Obtain the cached IP address and physical address mappings.
Expression: CommonUtils.getCache("ipList")
74. Check whether the IP address and physical address mappings are cached.
Expression: CommonUtils.cacheExists("ipList")
75. Obtain the physical address corresponding to an IP address, in Country_Province_City_Carrier format. For example, the physical address corresponding to 1xx.78.124.0 is China_Guangdong_Shenzhen_China Telecom. If the corresponding physical address cannot be obtained, the default value **_**_**_** is returned. If necessary, you can use a StringUtils expression to further split the address.
Expression: CommonUtils.getMapValue(CommonUtils.ipToLong(value), CommonUtils.cacheExists("ipList") ? CommonUtils.getCache("ipList") : CommonUtils.setCache("ipList", HttpsUtils.downloadMap("url")))
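To make the semantics of a few of these expressions concrete, here are rough Python counterparts (illustrative only; the function names are invented). Note that commons-lang StringUtils is null-safe, which is mimicked by handling None explicitly.

```python
def lower_case(v):
    """Counterpart of StringUtils.lowerCase(value); null-safe."""
    return None if v is None else v.lower()

def substring_before(v, sep):
    """Counterpart of StringUtils.substringBefore(value, "-")."""
    return v.split(sep, 1)[0] if v is not None else None

def left_pad(v, size, pad):
    """Counterpart of StringUtils.leftPad(value, 8, "yz"): pad on the left
    with repeated pad characters, unchanged if already long enough."""
    if v is None or len(v) >= size:
        return v
    fill = (pad * size)[: size - len(v)]
    return fill + v

print(lower_case("aBC"))                    # abc   (example 2)
print(substring_before("2017-12-01", "-"))  # 2017  (example 4)
print(left_pad("bat", 8, "yz"))             # yzyzybat (example 37)
```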
7.9 Migration of a List of Files

You can migrate a list of files (a maximum of 50 files) from FTP, SFTP, NAS, OBS, OSS, or KODO at a time. The exported files can only be written to the same directory on the migration destination.
When creating a table/file migration job, if the migration source is FTP, SFTP, NAS, OBS, OSS, or Qiniu Cloud Object Storage (KODO), Source Directory/File can contain a maximum of 50 file names, which are separated by vertical bars (|). See Figure 7-14.
Figure 7-14 Migrating a list of files
NOTE
1. CDM supports incremental file migration (by skipping repeated files), but does not support resumable transfer.
For example, suppose three files are to be migrated and the second file fails to be migrated due to a network fault. When the migration task is started again, the first file is skipped. The second file, however, cannot resume from the point where the fault occurred; it can only be migrated again from the beginning.
2. During file migration, a single task supports a maximum of 100,000 files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
7.10 Using Regular Expressions to Separate Semi-structured Text
During table/file migration, CDM uses delimiters to separate fields in CSV files. However, delimiters cannot be used for complex semi-structured data in which the field values themselves contain delimiters. In this case, a regular expression can be used to separate the fields.
Regular expression parameters are configured in the source job parameters. Currently, CDM supports OBS, Alibaba Cloud OSS, KODO, FTP, SFTP, and NAS as sources. File Format must be CSV. See Figure 7-15.
Figure 7-15 Setting regular expression parameters
When migrating CSV files, CDM uses the regular expression to separate fields. For details about the syntax of regular expressions, refer to the related documents. This section describes the regular expressions for the following log files:
l Log4J Log
l Log4J Audit Log
l Tomcat Log
l Django Log
l Apache Server Log
Log4J Log

l Log sample:
2018-01-11 08:50:59,001 INFO [org.apache.sqoop.core.SqoopConfiguration.configureClassLoader(SqoopConfiguration.java:251)] Adding jars to current classloader from property: org.apache.sqoop.classpath.extra
l Regular expression:
^(\d.*\d) (\w*) \[(.*)\] (\w.*).*
l Parsing result:
Log4J Audit Log

l Log sample:
2018-01-11 08:51:06,156 INFO [org.apache.sqoop.audit.FileAuditLogger.logAuditEvent(FileAuditLogger.java:61)] user=sqoop.anonymous.user ip=189.xxx.xxx.75 op=show obj=version objId=
l Regular expression:
^(\d.*\d) (\w*) \[(.*)\] user=(\w.*) ip=(\w.*) op=(\w.*) obj=(\w.*) objId=(.*).*
l Parsing result:
Tomcat Log

l Log sample:
11-Jan-2018 09:00:06.907 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name: Linux
l Regular expression:
^(\d.*\d) (\w*) \[(.*)\] ([\w\.]*) (\w.*).*
l Parsing result:
Django Log

l Log sample:
[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0
l Regular expression:
^\[(.*)\] (\w*) (\w*) (.*).*
l Parsing result:
Apache Server Log

l Log sample:
[Mon Jan 08 20:43:51.854334 2018] [mpm_event:notice] [pid 36465:tid 140557517657856] AH00489: Apache/2.4.12 (Unix) OpenSSL/1.0.1t configured -- resuming normal operations
l Regular expression:
^\[(.*)\] \[(.*)\] \[(.*)\] (.*).*
l Parsing result:
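The expressions above can be checked with any regular expression engine. The following Python sketch (illustrative; the dictionary layout is just for the demo) applies three of the five expressions to the sample lines and prints the captured fields:

```python
import re

# Three of the patterns from this section, paired with their sample lines.
samples = {
    "log4j": (
        r"^(\d.*\d) (\w*) \[(.*)\] (\w.*).*",
        "2018-01-11 08:50:59,001 INFO [org.apache.sqoop.core.SqoopConfiguration"
        ".configureClassLoader(SqoopConfiguration.java:251)] Adding jars to "
        "current classloader from property: org.apache.sqoop.classpath.extra",
    ),
    "django": (
        r"^\[(.*)\] (\w*) (\w*) (.*).*",
        "[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0",
    ),
    "apache": (
        r"^\[(.*)\] \[(.*)\] \[(.*)\] (.*).*",
        "[Mon Jan 08 20:43:51.854334 2018] [mpm_event:notice] "
        "[pid 36465:tid 140557517657856] AH00489: Apache/2.4.12 (Unix) "
        "OpenSSL/1.0.1t configured -- resuming normal operations",
    ),
}

for name, (pattern, line) in samples.items():
    fields = re.match(pattern, line).groups()
    print(name, "->", fields)
```

Each capture group becomes one field of the parsed record, which is what CDM maps to columns at the destination.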
7.11 GDS Import Mode

When Creating a Link, you can set Import Mode to Copy or GDS for the DWS link. See Figure 7-16.
Figure 7-16 Import Mode
Gauss Data Service (GDS) is a data service component provided by DWS. It implements high-speed data import by using the foreign table mechanism. The directions of network communication in Copy and GDS modes are different.
l In Copy mode, CDM pushes data to DWS.
l In GDS mode, CDM temporarily creates a foreign table, and multiple DataNodes of DWS concurrently pull data from CDM. The data does not pass through the management node of DWS, so the migration speed is faster and the performance is better.
The GDS component is built into CDM, so you do not need to install the GDS toolkit. The key configurations for importing data to DWS in GDS mode are as follows (currently, CDM does not support data export in GDS mode):
1. Configure DWS to allow users of the DWS link to create and delete foreign tables.
2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster.
3. When creating a DWS link, set Import Mode to GDS.
4. Create a table/file migration job and set Destination Link Name to the DWS link with the GDS mode enabled.
7.12 File Formats

When creating a CDM job, you need to specify File Format in the job parameters of the migration source and destination in some scenarios. This section describes the application scenarios, subparameters, common parameters, and usage examples of the supported file formats.
l CSV
l JSON
l Binary
l Common parameters
l Solutions to File Format Problems
CSV
To read or write a CSV file, set File Format to CSV. The CSV format can be used in the following scenarios:
l Import files to a database or NoSQL.
l Export data from a database or NoSQL to files.
After selecting the CSV format, you can also configure the following optional subparameters:
1. Line Separator
2. Field Delimiter
3. Encoding Type
4. Use Quote Character
5. Use RE to Separate Fields
6. Use First Row as Header
7. File Size
1. Line Separator
Character used to separate lines in a CSV file. The value can be a single character, multiple characters, or special characters. Special characters can be entered using URL encoded characters. The following table lists the URL encoded characters of commonly used special characters.
Table 7-2 URL encoded characters of special characters
Special Character URL Encoded Character
Space %20
Tab %09
% %25
Carriage return (Enter) %0d
Newline character %0a
Start of heading \u0001 (SOH) %01
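The mappings in Table 7-2 are ordinary percent-encodings and can be checked with any URL decoder. A minimal Python sketch (illustrative only):

```python
from urllib.parse import unquote

# The URL encoded separators from Table 7-2 and the characters they decode to.
encoded = {
    "%20": " ",     # space
    "%09": "\t",    # tab
    "%25": "%",     # percent sign
    "%0d": "\r",    # carriage return
    "%0a": "\n",    # newline
    "%01": "\x01",  # start of heading (SOH)
}
for enc, plain in encoded.items():
    assert unquote(enc) == plain
print("all separator encodings decode as expected")
```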
2. Field Delimiter
Character used to separate columns in a CSV file. The value can be a single character, multiple characters, or special characters. For details, see Table 7-2.
3. Encoding Type
Encoding type of a CSV file. The default value is UTF-8. Some files containing Chinese characters are encoded using GBK.
If this parameter is specified at the migration source, the specified encoding type is used to parse the file. If this parameter is specified at the migration destination, the specified encoding type is used to write data to the file.
4. Use Quote Character
– Exporting data from a database or NoSQL to CSV files (configuring Use Quote Character at the migration destination): If a field delimiter appears in the character string of a column of data at the migration source, set Use Quote Character to Yes at the migration destination to quote the character string as a whole and write it into the CSV file. Currently, CDM uses only double quotation marks (") as the quote character. As shown in the following figure, the value of the name field in the database contains a comma (,).
If you do not use the quote character, the exported CSV file is displayed as follows:
3,hello,world,abc
If you use the quote character, the exported CSV file is displayed as follows:
3,"hello,world",abc
If the data in the database contains double quotation marks (") and you set Use Quote Character to Yes, the quote character in the exported CSV file is displayed as three double quotation marks ("""). For example, if the value of a field is a"hello,world"c, the exported data is as follows:
"""a"hello,world"c"""
– Exporting CSV files to a database or NoSQL (configuring Use Quote Character atthe migration source): If you want to import the CSV files with quoted values to a
Cloud Data MigrationUser Guide 7 Advanced Operations
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 212
database correctly, set Use Quote Character to Yes at the migration source to writethe quoted values as a whole.
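For reference, standard CSV quoting behaves the same way for an embedded delimiter; the following Python sketch (illustrative only, not CDM code) reproduces the quoted example above:

```python
import csv
import io

# A row whose second field contains the field delimiter (a comma).
row = ["3", "hello,world", "abc"]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only the fields that contain the delimiter,
# mirroring Use Quote Character = Yes for this case.
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
print(buf.getvalue().strip())  # 3,"hello,world",abc
```

Note that CDM's tripled-quote handling for embedded double quotation marks differs from the standard CSV doubling rule, so this sketch covers only the embedded-delimiter case.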
5. Use RE to Separate Fields
This function is used to parse complex semi-structured text, such as log files. For details, see Using Regular Expressions to Separate Semi-structured Text.
6. Use First Row as Header
This parameter is used when CSV files are exported to other locations. If it is specified at the migration source, CDM uses the first row as the header when extracting data. When the CSV files are transferred, the headers are skipped, so the number of rows extracted from the migration source is greater than the number of rows written to the migration destination. The logs will record that the header is skipped during the migration.
7. File Size
This parameter is used when data is exported from a database to CSV files. If a table contains a large amount of data, a large CSV file is generated after migration, which is inconvenient to download or view. In this case, you can specify this parameter at the migration destination so that multiple CSV files of the specified size are generated. The value is an integer, in MB.
JSON
The following describes the JSON format:
l JSON Types Supported by CDM
l JSON Reference Node
l Copying Data from a JSON File
1. JSON types supported by CDM: JSON object and JSON array
– JSON object: A JSON file contains a single object, or multiple objects that are separated or merged by rows.
i. The following is a single JSON object:
{
  "took" : 190,
  "timed_out" : false,
  "total" : 1000001,
  "max_score" : 1.0
}
ii. The following are JSON objects separated by rows:
{"took" : 188, "timed_out" : false, "total" : 1000003, "max_score" : 1.0 }
{"took" : 189, "timed_out" : false, "total" : 1000004, "max_score" : 1.0 }
iii. The following are merged JSON objects (several objects on one line):
{ "took": 190, "timed_out": false, "total": 1000001, "max_score": 1.0 } { "took": 191, "timed_out": false, "total": 1000002, "max_score": 1.0 }
– JSON array: A JSON file is a JSON array consisting of multiple JSON objects. The following gives an example:
[{ "took" : 190, "timed_out" : false, "total" : 1000001, "max_score" : 1.0 },
 { "took" : 191, "timed_out" : false, "total" : 1000001, "max_score" : 1.0 }]
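For illustration, the row-separated and merged layouts above can be read with an incremental decoder. This Python sketch is one assumed way to parse such files, not CDM's internal parser:

```python
import json

def iter_json_objects(text):
    # Decode a stream of JSON objects that are separated by rows or
    # merged on one line (no surrounding array).
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace (including line separators) between objects.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

merged = '{ "took": 190, "max_score": 1.0 } { "took": 191, "max_score": 1.0 }'
print([o["took"] for o in iter_json_objects(merged)])  # [190, 191]
```

A JSON array file, by contrast, can simply be loaded in one call to json.loads.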
2. JSON Reference Node
Root node that records data. The data corresponding to the node is a JSON array, and CDM extracts data from the array in the same mode. Multi-layer nested JSON nodes are separated by periods (.).
3. Copying Data from a JSON File
a. Example 1: Extract data from multiple objects that are separated or merged. A JSON file contains multiple JSON objects. The following gives an example:
{ "took": 190, "timed_out": false, "total": 1000001, "max_score": 1.0 }
{ "took": 191, "timed_out": false, "total": 1000002, "max_score": 1.0 }
{ "took": 192, "timed_out": false, "total": 1000003, "max_score": 1.0 }
To extract data from the JSON objects and write them to the database in the following format, perform the following operations:
took timedOut total maxScore
190 false 1000001 1.0
191 false 1000002 1.0
192 false 1000003 1.0
Set File Format to JSON and JSON Type to JSON object, and then map fields.
b. Example 2: Extract data from the reference node. A JSON file contains a single JSON object, but the valid data is on a data node. The following gives an example:
{
  "took": 190,
  "timed_out": false,
  "hits": {
    "total": 1000001,
    "max_score": 1.0,
    "hits": [
      { "_id": "650612", "_source": { "name": "tom", "books": ["chinese","english","math"] } },
      { "_id": "650616", "_source": { "name": "tom", "books": ["chinese","english","math"] } },
      { "_id": "650618", "_source": { "name": "tom", "books": ["chinese","english","math"] } }
    ]
  }
}
To write the data to the database in the following format, perform the following operations:
ID SourceName SourceBooks
650612 tom ["chinese","english","math"]
650616 tom ["chinese","english","math"]
650618 tom ["chinese","english","math"]
Set File Format to JSON, JSON Type to JSON object, and JSON Reference Node to hits.hits, and then map fields.
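The reference-node lookup can be pictured with a short sketch: split the node path on periods and walk the nested objects. This Python function is illustrative, not CDM's implementation:

```python
import json

def extract_by_reference_node(doc, reference_node):
    # Walk nested JSON objects; multi-layer nodes are separated by
    # periods, e.g. "hits.hits". The final node is the array of records.
    node = doc
    for key in reference_node.split("."):
        node = node[key]
    return node

raw = '''{ "took": 190, "timed_out": false,
           "hits": { "total": 2, "max_score": 1.0,
                     "hits": [ { "_id": "650612", "_source": { "name": "tom" } },
                               { "_id": "650616", "_source": { "name": "tom" } } ] } }'''
records = extract_by_reference_node(json.loads(raw), "hits.hits")
print([r["_id"] for r in records])  # ['650612', '650616']
```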
c. Example 3: Extract data from the JSON array. A JSON file is a JSON array consisting of multiple JSON objects. The following gives an example:
[{ "took" : 190, "timed_out" : false, "total" : 1000001, "max_score" : 1.0 },
 { "took" : 191, "timed_out" : false, "total" : 1000002, "max_score" : 1.0 }]
To write the data to the database in the following format, perform the following operations:
took timedOut total maxScore
190 false 1000001 1.0
191 false 1000002 1.0
Set File Format to JSON and JSON Type to JSON array, and then map fields.
d. Example 4: Configure a converter when parsing the JSON file. Building on Example 2, to add the hits.max_score field to all records, that is, to write the data to the database in the following format, perform the following operations:
ID SourceName SourceBooks MaxScore
650612 tom ["chinese","english","math"] 1.0
650616 tom ["chinese","english","math"] 1.0
650618 tom ["chinese","english","math"] 1.0
Set File Format to JSON, JSON Type to JSON object, and JSON Reference Node to hits.hits, and then create a converter.
i. Click Add Fields to add a field.
ii. Click the icon highlighted in the following figure to create a converter for the new field.
iii. Set Converter to Expression conversion, enter "1.0" in the Expression text box, and click Save.
Binary
If you want to copy files between file systems, you can select the binary format. The binary format delivers the optimal rate and performance in file transfer, and does not require field mapping.
l Directory structure for file transfer
CDM can transfer a single file or all files in a directory at a time. After the files are transferred to the migration destination, the directory structure remains unchanged.
l Migrating incremental files
When you use CDM to transfer files in binary format, configure Duplicate File Processing Method at the migration destination for incremental file migration. For details, see Incremental File Migration.
During incremental file migration, set Duplicate File Processing Method to Skip. If new files exist at the migration source or a failure occurs during the migration, run the job again so that the already migrated files are not migrated repeatedly.
l Write to Temporary File
When migrating files in binary format, you can specify whether to write the files to a temporary file at the migration destination. If this parameter is specified, each file is written to a temporary file during file replication. After the file is successfully migrated, a rename or move operation restores it at the migration destination.
l Generate MD5 Hash Value
An MD5 hash value is generated for each transferred file and recorded in a new .md5 file. You can specify the directory where the .md5 files are generated.
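As a sketch of what such a checksum step involves (the exact .md5 file layout used by CDM is not specified here, so the line format below is hypothetical):

```python
import hashlib

def write_md5_file(data: bytes, name: str) -> str:
    # Compute the MD5 hash of a transferred file's content and return
    # the line that would be recorded in the companion .md5 file.
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}  {name}"

print(write_md5_file(b"hello", "data.bin"))
# 5d41402abc4b2a76b9719d911017c592  data.bin
```

A consumer can recompute the hash after transfer and compare it with the recorded value to verify file integrity.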
Common parameters
l Source File Processing Method
After a file is copied successfully, CDM can perform an operation on the source file. The options are Rename and Delete.
l Start Job by Marker File
In automation scenarios, a scheduled task is configured on CDM to periodically read files from the migration source. However, files may still be being generated at the migration source, so CDM may read data repeatedly or fail to read data. You can specify the marker file for starting a job, for example ok.txt, in the job parameters of the migration source. After the files are completely generated at the migration source, the ok.txt file is created in the file directory; only then does CDM read the files, which are now complete.
In addition, you can set the suspension period. Within the suspension period, CDM periodically queries whether the marker file exists. If the file still does not exist when the suspension period expires, the job fails.
The marker file itself is not migrated.
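The marker-file wait loop described above can be sketched as follows; the function name and the suspension/poll parameters are illustrative, not CDM configuration keys:

```python
import os
import time

def wait_for_marker(directory, marker="ok.txt", suspension_s=600, poll_s=10):
    # Poll for the job-start marker file; give up (the job fails) once
    # the suspension period expires without the marker appearing.
    deadline = time.monotonic() + suspension_s
    while time.monotonic() < deadline:
        if os.path.exists(os.path.join(directory, marker)):
            return True
        time.sleep(poll_s)
    return False
```

A job would proceed to read the source directory only when this returns True.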
l Job Success Marker File
After data is successfully migrated to a file system, an empty file is generated in the destination directory. You can specify the file name. Generally, this parameter is used together with Start Job by Marker File.
Note that this file must not be confused with the files to be transferred. For example, if a file to be transferred is named finish.txt and the job success marker file is also set to finish.txt, the two files will overwrite each other.
l Filter
When using CDM to migrate files, you can specify a filter to select the files, either by wildcard or by regular expression. If you select regular expression, use Java regular expression syntax. CDM migrates only the files that meet the filter conditions.
For example, the /table/ directory stores a large number of data table directories divided by day. DRIVING_BEHAVIOR_20180101 to DRIVING_BEHAVIOR_20180630 store all data of DRIVING_BEHAVIOR from January to June. To migrate only the table data of DRIVING_BEHAVIOR in March, set Source Directory/File to /table, Filter Type to Wildcard, and Path Filter to DRIVING_BEHAVIOR_201803*.
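The wildcard filter in this example behaves like standard glob matching, which can be checked with Python's fnmatch (an illustration, not CDM code):

```python
import fnmatch

# Daily table directories under /table/ (sample names from the example).
dirs = [
    "DRIVING_BEHAVIOR_20180101",
    "DRIVING_BEHAVIOR_20180301",
    "DRIVING_BEHAVIOR_20180315",
    "DRIVING_BEHAVIOR_20180630",
]

# Path Filter set to DRIVING_BEHAVIOR_201803* selects only March data.
march = fnmatch.filter(dirs, "DRIVING_BEHAVIOR_201803*")
print(march)  # ['DRIVING_BEHAVIOR_20180301', 'DRIVING_BEHAVIOR_20180315']
```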
Solutions to File Format Problems
1. When data in a database is exported to a CSV file and the data contains commas (,), the data in the exported CSV file is disordered. The following solutions are available:
a. Specify a field delimiter.
Use a character that does not exist in the database, or a rare non-printable character, as the field delimiter. For example, set Field Delimiter at the migration destination to %01. In this way, the exported field delimiter is \u0001. For details, see Table 7-2.
b. Use the quote character.
Set Use Quote Character to Yes at the migration destination. In this way, if a field in the database contains the field delimiter, CDM quotes the field using the quote character and writes the field as a whole to the CSV file.
2. The data in the database contains line separators.
Scenario: When you use CDM to export a table from a MySQL database (a field value contains the line separator \n) to a CSV file, and then use CDM to import the exported CSV file to MRS HBase, data in the exported CSV file is truncated.
Solution: Specify a line separator. When you use CDM to export the MySQL table data to a CSV file, set Line Separator at the migration destination to %01 (ensure that this value does not appear in any field value), so that the line separator in the exported CSV file is \u0001. Then, when using CDM to import the CSV file to MRS HBase, set Line Separator at the migration source to %01. This avoids data truncation.
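The effect of choosing \u0001 (%01) as a separator can be seen in a short sketch: with a rare delimiter, an embedded comma no longer forces quoting (illustrative Python, not CDM code):

```python
import csv
import io

# Write with the rare non-printable delimiter \u0001 (%01); the field
# containing a comma no longer needs to be quoted.
buf = io.StringIO()
csv.writer(buf, delimiter="\u0001").writerow(["3", "hello,world", "abc"])
line = buf.getvalue().strip()
print(line.split("\u0001"))  # ['3', 'hello,world', 'abc']
```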
8 FAQs
8.1 What Are the Advantages of CDM?
Data migration is involved when you consolidate or back up data, or develop new applications on the public cloud. Generally, if you want to migrate data, you may develop data migration scripts that read data from the source and write data to the destination. However, this method has the following disadvantages:
l Because the data source types are different, the program uses different access interfaces, such as JDBC and native APIs, to read and write data. Various libraries and SDKs are therefore required when you write data migration scripts, resulting in high development and management costs.
l During data migration, the read and write process is completed in one job. Limited by available resources, the performance is poor and cannot meet the requirements of scenarios where massive sets of data need to be migrated.
l As cloud computing develops, user data may be stored in different environments, such as public clouds, on-premises or hosted Internet data centers (IDCs), and hybrid scenarios. In heterogeneous environments, data migration is subject to various factors, for example, network connectivity, which complicates development and maintenance.
CDM is developed on a distributed computing framework and leverages parallel data processing technology. It has the following advantages:
l Ease of use: You can migrate data by configuring data sources and migration jobs on the GUI, and CDM will manage and maintain them for you. In other words, you only need to focus on the data migration logic without worrying about the environment, which greatly reduces development and maintenance costs.
l High efficiency: Based on the distributed computing framework, CDM jobs are split into independent sub-jobs and executed concurrently, which drastically improves data migration efficiency. In addition, efficient data import APIs are used to import data to Hive, HBase, DWS, and MySQL databases.
l Support for various data sources: Data sources such as databases, Hadoop services, NoSQL databases, data warehouses, and files are supported.
l Support for multiple network environments: CDM helps you easily cope with various data migration scenarios, including data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems, regardless of whether the data is stored in on-premises IDCs, third-party clouds (public or private), HUAWEI CLOUD services, or self-built databases or file systems on HUAWEI CLOUD ECSs.
8.2 What Service Data Can Be Migrated by CDM?
CDM implements batch data migration between homogeneous and heterogeneous data sources, including on-premises file systems, file systems on the public cloud, relational databases, data warehouses, NoSQL databases, big data cloud services, and object storage.
CDM supports table/file migration and entire DB migration:
l Table/file migration: applicable to data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems.
l Entire DB migration: applicable to database migration to the cloud.
Table 8-1 describes the supported data sources.
Table 8-1 Supported data sources during table/file migration

Data Source Type    | Data Source                                | Used as a Source                                  | Used as a Destination
Data warehouse      | Data Warehouse Service (DWS)               | Supported                                         | Supported
                    | Data Lake Insight (DLI)                    | Not supported                                     | Supported
                    | FusionInsight LibrA                        | Supported                                         | Supported
Hadoop              | MRS HDFS                                   | Supported                                         | Supported
                    | MRS HBase                                  | Supported                                         | Supported
                    | MRS Hive                                   | Supported                                         | Supported
                    | FusionInsight HDFS                         | Supported                                         | Supported
                    | Apache HDFS                                | Supported                                         | Supported
                    | Hadoop HBase                               | Supported                                         | Supported
                    | FusionInsight HBase                        | Supported                                         | Supported
Object storage      | Object Storage Service (OBS)               | Supported                                         | Supported
                    | Alibaba Cloud Object Storage Service (OSS) | Supported                                         | Not supported
                    | Qiniu Cloud Object Storage                 | Supported                                         | Not supported
File system         | FTP                                        | Supported                                         | Supported
                    | SFTP                                       | Supported                                         | Supported
                    | HTTP                                       | Supported                                         | Not supported
                    | Network Attached Storage (NAS)             | Supported                                         | Supported
Relational database | RDS for MySQL                              | Supported                                         | Supported
                    | RDS for PostgreSQL                         | Supported                                         | Supported
                    | RDS for SQL Server                         | Supported                                         | Supported
                    | Distributed Database Middleware (DDM)      | Supported                                         | Supported
                    | MySQL                                      | Supported                                         | Supported
                    | PostgreSQL                                 | Supported                                         | Not supported
                    | Microsoft SQL Server                       | Supported                                         | Not supported
                    | Oracle                                     | Supported                                         | Not supported
                    | IBM Db2                                    | Supported                                         | Not supported
                    | Derecho (GaussDB)                          | Supported                                         | Not supported
NoSQL               | Distributed Cache Service (DCS)            | Not supported                                     | Supported
                    | Document Database Service (DDS)            | Supported                                         | Supported
                    | CloudTable Service (CloudTable)            | Supported                                         | Supported
                    | Redis                                      | Supported                                         | Not supported
                    | MongoDB                                    | Supported                                         | Not supported
Search              | Cloud Search Service                       | Supported                                         | Supported
                    | Elasticsearch                              | Supported                                         | Supported
Message system      | Data Ingestion Service (DIS)               | Supported (migrated to Cloud Search Service only) | Not supported
                    | Apache Kafka                               | Supported (migrated to Cloud Search Service only) | Not supported

NOTE
In the preceding table, the non-HUAWEI CLOUD data sources, such as MySQL, can be a MySQL instance built in a local data center, created by users on an Elastic Cloud Server (ECS), or hosted on a third-party cloud.
Entire database migration is applicable to the scenario where an on-premises data center or a database created on a HUAWEI CLOUD ECS is synchronized to HUAWEI CLOUD database services or big data services. It is suitable for offline database migration but not online real-time migration. Figure 8-1 lists the data sources that support entire database migration using CDM.
Figure 8-1 Supported data sources in entire DB migration
8.3 What Security Protection Measures Are Used in CDM?
CDM is a fully hosted service that provides the following capabilities to protect user data security:
l Instance isolation: CDM users can use only their own instances. Instances are isolated from each other and cannot access each other.
l System hardening: Security hardening has been performed on the operating system of each CDM instance, so attackers cannot access the operating system from the Internet.
l Key encryption: The keys of the various data sources that users enter when creating links on CDM are stored in CDM databases using high-strength encryption algorithms.
l No intermediate storage: During data migration, CDM processes only data mapping and conversion, without storing any user data or data fragments.
8.4 What Is the Performance of Using CDM to Migrate Data?
Theoretically, a single CDM instance can migrate 1 TB to 8 TB of data per day, depending on the network bandwidth and the read and write performance of the data source. Different business departments, such as finance and online mall, can use separate CDM instances.
8.5 What Is the Most Economical Way to Migrate Data from the Public Network Using CDM?
1. If data is exported at a specified time every day, you can use the CDM shutdown function and start the CDM cluster only when data is being migrated. A stopped cluster is charged ¥0.05 per hour, that is, ¥1.2 per day, which is very economical.
2. To migrate data from the public network, use the NAT gateway on HUAWEI CLOUD to share EIPs with other ECSs in the subnet. In this way, data in an on-premises data center or on a third-party cloud can be migrated in a more economical and convenient manner.
The detailed operations are as follows:
a. Suppose that you have created a CDM cluster (no dedicated EIP needs to be bound to the CDM cluster). Record the VPC and subnet where the CDM cluster is located.
b. Create a NAT gateway. Select the same VPC and subnet as the CDM cluster.
c. After the NAT gateway is created, return to the NAT gateway console list, click the created gateway name, and then click Add SNAT Rule.
Figure 8-2 Adding an SNAT rule
d. Select a subnet and an EIP. If no EIP is available, apply for one.
Then, access the CDM management console to migrate data sources that are accessed through the Internet to HUAWEI CLOUD. For example, migrate files from an FTP server in an on-premises data center to OBS, or migrate relational databases from a third-party cloud to HUAWEI CLOUD RDS.
8.6 Does CDM Support Incremental Data Migration?
CDM supports incremental data migration. With scheduled jobs and macro variables of date and time, CDM provides incremental data migration in the following scenarios:
l Both the data source and destination are file directories.
l The data source is a file whose name contains the date and time field.
l The data source is a relational database, and the database table name contains the date and time field.
l The data source is a relational database, and the database table contains a column that stores the date field.
The following describes the key configurations of these scenarios.
Both Data Source and Destination Are File Directories
CDM supports binary transmission between files. When the source data is in a directory, CDM can import all files in the directory to the migration destination.
If files are added to the source directory at irregular intervals, the key configurations for job creation are as follows:
1. Set Duplicate File Processing Method of the destination link to Skip. See Figure 8-3.
Figure 8-3 Skipping duplicated files
2. Configure scheduled job execution.
In this way, you can import the newly added files to the destination directory periodically to implement incremental synchronization.
Data Source Is a File with the Date and Time Field
The source file name contains the date and time field. For example, the /opt/data/file_20171015202526.data file is generated at 2017-10-15 20:25:26. The key configurations for job creation are as follows:
1. In source link parameters, set Filter Type to Wildcard. See Figure 8-4.
Figure 8-4 Filtering files
2. Enter *${dateformat(yyyyMMdd,-1,DAY)}* in File Filter. This is the macro variable format of date and time supported by CDM. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
3. Select Schedule Execution and set Cycle to one day.
In this way, you can import the files generated on the previous day to the destination directory every day to implement incremental synchronization.
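The macro *${dateformat(yyyyMMdd,-1,DAY)}* resolves to yesterday's date at job execution time. The following Python sketch (illustrative only, with a fixed run date) shows the resolved filter matching only the previous day's file:

```python
import fnmatch
from datetime import date, timedelta

def resolve_dateformat(run_date, offset_days=-1):
    # Resolve ${dateformat(yyyyMMdd,-1,DAY)} relative to a run date;
    # CDM evaluates the macro against the job's execution time.
    return (run_date + timedelta(days=offset_days)).strftime("%Y%m%d")

run_date = date(2017, 10, 16)              # job runs on 2017-10-16
stamp = resolve_dateformat(run_date)       # "20171015"
file_filter = f"*{stamp}*"                 # resolved *${dateformat(yyyyMMdd,-1,DAY)}*

files = ["file_20171015202526.data", "file_20171016030000.data"]
print(fnmatch.filter(files, file_filter))  # ['file_20171015202526.data']
```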
Data Source Is a Relational Database and Database Table Name Contains the Date and Time Field
The following uses an Oracle data table as the data source. A new data table is generated in the data source every day, and the table name contains the date and time field. For example, if a table is generated on October 15, 2017, the table name is table_20171015. The key configurations for job creation are as follows:
1. In Source Job Configuration, set Table Name to table_${dateformat(yyyyMMdd)}. See Figure 8-5.
Figure 8-5 Definition of macro variables of date and time
2. Select Schedule Execution and set Cycle to one day.
In this way, the new database table can be imported to the destination every day.
Data Source Is a Relational Database and Database Table Contains a Column that Stores the Date Field
The following uses a MySQL database as the data source. The source table name is Data, and the DS field in Data indicates the date column. See Figure 8-6.
Figure 8-6 Data table
The key configurations for job creation are as follows:
1. In Source Job Configuration, set Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}'. See Figure 8-7.
Figure 8-7 Configuring the macro variables of date and time using Where Clause
2. Select Schedule Execution and set Cycle to one day. The scheduled job is executed at 00:00 every day.
In this way, the data generated on the previous day can be incrementally migrated to the destination at 00:00 every day.
8.7 Can Fields Be Converted During Data Migration?
Yes. CDM supports the following field converters:
l Anonymization
l Trim
l Reverse String
l Replace String
l Expression Conversion
You can create a field converter on the Map Field tab page when creating a table/file migration job. See Figure 8-8.
Figure 8-8 Creating a field converter
Anonymization
This converter is used to hide key information in a character string. For example, to convert 12345678910 to 123****8910, set parameters according to Figure 8-9:
l Set Reserve Start Length to 3.
l Set Reserve End Length to 4.
l Set Replace Character to *.
Figure 8-9 Anonymization
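The masking rule can be sketched in a few lines; this function mirrors the three parameters above but is illustrative, not CDM's implementation:

```python
def anonymize(value, reserve_start, reserve_end, replace_char="*"):
    # Keep the first reserve_start and last reserve_end characters and
    # mask everything in between with replace_char.
    masked_len = len(value) - reserve_start - reserve_end
    return (value[:reserve_start]
            + replace_char * masked_len
            + value[len(value) - reserve_end:])

print(anonymize("12345678910", 3, 4))  # 123****8910
```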
Trim
This converter is used to automatically delete the spaces before and after a string. No parameters need to be configured.
Reverse String
This converter is used to automatically reverse a string, for example, reverse ABC into CBA. No parameters need to be configured.
Replace String
This converter is used to replace a character string. You need to configure the object to be replaced and the new value.
Expression Conversion
This converter uses the JSP expression language (EL) to convert the current field or a row of data. The JSP EL is used to create arithmetic and logical expressions. Within a JSP EL expression, you can use integers, floating-point numbers, strings, the built-in constants true and false for boolean values, and null.
The expression supports the following environment variables:
l value: indicates the current field value.
l row: indicates the current row, which is an array type.
The expression supports the following tool classes:
l StringUtils: string processing tool class. For details, see org.apache.commons.lang.StringUtils in the Java SDK code.
l DateUtils: date tool class
l CommonUtils: common tool class
l NumberUtils: string-to-value conversion class
l HttpsUtils: network file read class
Application examples:
1. Set a string constant for the current field, for example, VIP.
Expression: "VIP"
2. If the field is of the string type, convert all characters to lowercase, for example, convert aBC to abc.
Expression: StringUtils.lowerCase(value)
3. Convert all characters of the current field to uppercase.
Expression: StringUtils.upperCase(value)
4. If the field value is a date string in yyyy-MM-dd format, extract the year from the field value, for example, extract 2017 from 2017-12-01.
Expression: StringUtils.substringBefore(value,"-")
5. If the field value is of the numeric type, convert it to twice the original value:
Expression: value*2
6. Convert the field value true to Y and all other field values to N.
Expression: value == "true"? "Y": "N"
7. If the field value is of the string type and is empty, convert it to Default; otherwise, leave it unchanged.
Expression: empty value? "Default" : value
8. If the first and second fields are of the numeric type, set the field to the sum of the first and second field values.
Expression: row[0] + row[1]
9. If the field is of the date or timestamp type, return the year after conversion. The data type is int.
Expression: DateUtils.getYear(value)
10. If the field is a date and time string in yyyy-MM-dd format, convert it to the date type:
Expression: DateUtils.format(value,"yyyy-MM-dd")
11. Convert the date format 2018/01/05 15:15:05 to 2018-01-05 15:15:05:
Expression: DateUtils.format(DateUtils.parseDate(value,"yyyy/MM/dd HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")
12. Obtain a 36-character universally unique identifier (UUID):
Expression: CommonUtils.randomUUID()
13. If the field is of the string type, capitalize the first letter, for example, convert cat to Cat.
Expression: StringUtils.capitalize(value)
14. If the field is of the string type, convert the first letter to lowercase, for example, convert Cat to cat.
Expression: StringUtils.uncapitalize(value)
15. If the field is of the string type, use spaces to pad the string to the specified length and center it. If the length of the string is not shorter than the specified length, the string is not converted. For example, convert ab to " ab " (one space on each side) to meet the specified length 4.
Expression: StringUtils.center(value, 4)
16. Delete a newline (including \n, \r, and \r\n) at the end of a character string. For example, convert abc\r\n\r\n to abc\r\n.
Expression: StringUtils.chomp(value)
17. If the string contains the specified string, true is returned; otherwise, false is returned. For example, abc contains a, so true is returned.
Expression: StringUtils.contains(value, "a")
18. If the string contains any character of the specified string, true is returned; otherwise, false is returned. For example, zzabyycdxx contains either z or a, so true is returned.
Expression: StringUtils.containsAny(value, "za")
19. If the string does not contain any of the specified characters, true is returned; if any specified character is contained, false is returned. For example, abz contains one character of xyz, so false is returned.
Expression: StringUtils.containsNone(value, "xyz")
20. If the string contains only the specified characters, true is returned; if any other character is contained, false is returned. For example, abab contains only characters among abc, so true is returned.
Expression: StringUtils.containsOnly(value, "abc")
21. If the character string is empty or null, convert it to the specified character string; otherwise, do not convert it. For example, convert an empty character string to null.
Expression: StringUtils.defaultIfEmpty(value, null)
22. If the string ends with the specified suffix (case sensitive), true is returned; otherwise, false is returned. For example, the suffix of abcdef is not null, so false is returned.
Expression: StringUtils.endsWith(value, null)
23. If the string is the same as the specified string (case sensitive), true is returned; otherwise, false is returned. For example, after strings abc and ABC are compared, false is returned.
Expression: StringUtils.equals(value, "ABC")
24. Obtain the first index of the specified string in a character string; if none is found, -1 is returned. For example, the first index of ab in aabaabaa is 1.
Expression: StringUtils.indexOf(value, "ab")
25. Obtain the last index of the specified string in a character string; if none is found, -1 is returned. For example, the last index of k in aFkyk is 4.
Expression: StringUtils.lastIndexOf(value, "k")
26. Obtain the first index of the specified string starting from the specified position in the character string; if none is found, -1 is returned. For example, the first index of b at or after index 3 of aabaabaa is 5.
Expression: StringUtils.indexOf(value, "b", 3)
27. Obtain the first index of any of the specified characters in a character string; if none is found, -1 is returned. For example, the first index of z or a in zzabyycdxx is 0.
Expression: StringUtils.indexOfAny(value, "za")
28. If the string contains only Unicode letters, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlpha(value)
29. If the string contains only Unicode letters and digits, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumeric(value)
30. If the string contains only Unicode letters, digits, and spaces, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumericSpace(value)
31. If the string contains only Unicode letters and spaces, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlphaSpace(value)
32. If the string contains only printable ASCII characters, true is returned; otherwise, false is returned. For example, for !ab-c~, true is returned.
Expression: StringUtils.isAsciiPrintable(value)
33. If the string is empty or null, true is returned; otherwise, false is returned.
Expression: StringUtils.isEmpty(value)
34. If the string contains only Unicode digits, true is returned; otherwise, false is returned.
Expression: StringUtils.isNumeric(value)
35. Obtain the leftmost characters of the specified length. For example, obtain the leftmost two characters ab from abc.
Expression: StringUtils.left(value, 2)
36. Obtain the rightmost characters of the specified length. For example, obtain the rightmost two characters bc from abc.
Expression: StringUtils.right(value, 2)
37. Pad the specified character string to the left of the current character string to reach the specified length. If the current character string is not shorter than the specified length, it is not converted. For example, if yz is padded to the left of bat to reach a length of 8, the result is yzyzybat.
Expression: StringUtils.leftPad(value, 8, "yz")
38. Pad the specified character string to the right of the current character string to reach the specified length. If the current character string is not shorter than the specified length, it is not converted. For example, if yz is padded to the right of bat to reach a length of 8, the result is batyzyzy.
Expression: StringUtils.rightPad(value, 8, "yz")
39. If the field is of the string type, obtain the length of the current character string. If the character string is null, 0 is returned.
Expression: StringUtils.length(value)
40. If the field is of the string type, delete all occurrences of the specified character string from it. For example, delete ue from queued to obtain qd.
Expression: StringUtils.remove(value, "ue")
41. If the field is of the string type, remove the specified substring from the end of the field. If the substring is not at the end of the field, no conversion is performed. For example, remove .com at the end of www.domain.com.
Expression: StringUtils.removeEnd(value, ".com")
42. If the field is of the string type, delete the specified substring from the beginning of the field. If the substring is not at the beginning of the field, no conversion is performed. For example, delete www. at the beginning of www.domain.com.
Expression: StringUtils.removeStart(value, "www.")
43. If the field is of the string type, replace all occurrences of the specified character string in the field. For example, replace a in aba with z to obtain zbz.
Expression: StringUtils.replace(value, "a", "z")
44. If the field is of the string type, replace multiple characters in the character string at a time. For example, replace h in hello with j and o with y to obtain jelly.
Expression: StringUtils.replaceChars(value, "ho", "jy")
45. If the field is of the string type, use the specified delimiter to split the text into an array. For example, use : to split ab:cd:ef into ["ab", "cd", "ef"].
Expression: StringUtils.split(value, ":")
46. If the string starts with the specified prefix (case sensitive), true is returned; otherwise, false is returned. For example, abcdef starts with abc, so true is returned.
Expression: StringUtils.startsWith(value, "abc")
47. If the field is of the string type, strip the specified characters from both ends of the field. For example, strip x, y, and z from abcyx to obtain abc.
Expression: StringUtils.strip(value, "xyz")
48. If the field is of the string type, strip the specified characters from the end of the field, for example, strip all spaces at the end of the field.
Expression: StringUtils.stripEnd(value, null)
49. If the field is of the string type, strip the specified characters from the beginning of the field, for example, strip all spaces at the beginning of the field.
Expression: StringUtils.stripStart(value, null)
50. If the field is of the string type, obtain the substring after the specified position (excluding the character at the specified position) of the character string. If the specified position is a negative number, count from the end of the string. For example, obtain the character string after the second character of abcde, that is, cde.
Expression: StringUtils.substring(value, 2)
51. If the field is of the string type, obtain the substring within the specified index range of the character string. If a specified index is a negative number, count from the end of the string. For example, obtain the substring between indexes 2 and 5 of abcde, that is, cde.
Expression: StringUtils.substring(value, 2, 5)
52. If the field is of the string type, obtain the substring after the first occurrence of the specified character. For example, obtain the substring after the first b in abcba, that is, cba.
Expression: StringUtils.substringAfter(value, "b")
53. If the field is of the string type, obtain the substring after the last occurrence of the specified character. For example, obtain the substring after the last b in abcba, that is, a.
Expression: StringUtils.substringAfterLast(value, "b")
54. If the field is of the string type, obtain the substring before the first occurrence of the specified character. For example, obtain the substring before the first b in abcba, that is, a.
Expression: StringUtils.substringBefore(value, "b")
55. If the field is of the string type, obtain the substring before the last occurrence of the specified character. For example, obtain the substring before the last b in abcba, that is, abc.
Expression: StringUtils.substringBeforeLast(value, "b")
56. If the field is of the string type, obtain the substring nested between two occurrences of the specified string. If no such substring is found, null is returned. For example, obtain the substring between the two tag strings in tagabctag, that is, abc.
Expression: StringUtils.substringBetween(value, "tag")
57. If the field is of the string type, delete the control characters (char ≤ 32) at both ends of the character string, for example, delete the spaces at both ends of the character string.
Expression: StringUtils.trim(value)
58. Convert the character string to a value of the byte type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toByte(value)
59. Convert the character string to a value of the byte type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toByte(value, 1)
60. Convert the character string to a value of the double type. If the conversion fails, 0.0d is returned.
Expression: NumberUtils.toDouble(value)
61. Convert the character string to a value of the double type. If the conversion fails, the specified value, for example, 1.1d, is returned.
Expression: NumberUtils.toDouble(value, 1.1d)
62. Convert the character string to a value of the float type. If the conversion fails, 0.0f is returned.
Expression: NumberUtils.toFloat(value)
63. Convert the character string to a value of the float type. If the conversion fails, the specified value, for example, 1.1f, is returned.
Expression: NumberUtils.toFloat(value, 1.1f)
64. Convert the character string to a value of the int type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toInt(value)
65. Convert the character string to a value of the int type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toInt(value, 1)
66. Convert the character string to a value of the long type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toLong(value)
67. Convert the character string to a value of the long type. If the conversion fails, the specified value, for example, 1L, is returned.
Expression: NumberUtils.toLong(value, 1L)
68. Convert the character string to a value of the short type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toShort(value)
69. Convert the character string to a value of the short type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toShort(value, 1)
70. Convert an IP address string to a value of the long type, for example, convert 10.78.124.0 to 172915712.
Expression: CommonUtils.ipToLong(value)
71. Read a mapping file of IP addresses to physical addresses from the network and load it into a map collection. url indicates the address where the IP mapping file is stored, for example, http://10.114.205.45:21203/sqoop/IpList.csv.
Expression: HttpsUtils.downloadMap("url")
72. Cache the IP address and physical address mappings and specify a key, such as ipList, for retrieval.
Expression: CommonUtils.setCache("ipList", HttpsUtils.downloadMap("url"))
73. Obtain the cached IP address and physical address mappings.
Expression: CommonUtils.getCache("ipList")
74. Check whether the IP address and physical address mappings are cached.
Expression: CommonUtils.cacheExists("ipList")
75. Obtain the physical address corresponding to an IP address, in Country_Province_City_Carrier format. For example, the physical address corresponding to 1xx.78.124.0 is China_Guangdong_Shenzhen_China Telecom. If the physical address cannot be obtained, the default value **_**_**_** is returned. If necessary, you can use the StringUtils class expressions to further split the address.
Expression: CommonUtils.getMapValue(CommonUtils.ipToLong(value), CommonUtils.cacheExists("ipList") ? CommonUtils.getCache("ipList") : CommonUtils.setCache("ipList", HttpsUtils.downloadMap("url")))
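Expression 70 simply packs the four IPv4 octets into a 32-bit number. The following standalone sketch reproduces that behavior with plain Java; the class name IpUtil is ours for illustration and is not part of CDM:

```java
public class IpUtil {
    // Pack a dotted-quad IPv4 string into a long, mirroring the
    // documented behavior of CommonUtils.ipToLong.
    public static long ipToLong(String ip) {
        long result = 0;
        for (String octet : ip.split("\\.")) {
            // Shift the accumulated value left by one byte and append the next octet.
            result = (result << 8) | Long.parseLong(octet);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("10.78.124.0")); // prints 172915712
    }
}
```

The result, 172915712, matches the example given for expression 70.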
8.8 What Data Formats Are Supported When the Data Source Is Hive?
CDM can read and write data in SequenceFile, TextFile, ORC, or Parquet format from the Hive data source.
8.9 Does CDM Support Job Synchronization Between Different Clusters?
CDM does not support direct job migration across clusters. However, you can use the batch job import/export function to implement cross-cluster migration indirectly as follows:
1. Export all jobs from CDM cluster 1 and save the jobs' JSON files to a local PC. For security purposes, no link password is exported when CDM exports jobs. All passwords are replaced by Add password here.
2. Edit each JSON file on the local PC by replacing Add password here with the actual password of the corresponding link.
3. Import the edited JSON files to CDM cluster 2 in batches to implement job migration between cluster 1 and cluster 2.
For details about how to export and import data in batches, see Batch Managing Jobs.
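Step 2 amounts to a find-and-replace over each exported JSON file. A minimal sketch, assuming the exported file contains the placeholder text exactly as Add password here; the class name and the linkConfig.password key shown are ours for illustration:

```java
public class JobJsonFixer {
    // Replace the placeholder that CDM writes for link passwords on export
    // with the real password before importing into the target cluster.
    public static String fillPassword(String jobJson, String password) {
        return jobJson.replace("Add password here", password);
    }

    public static void main(String[] args) {
        // Hypothetical fragment of an exported job JSON file.
        String exported = "\"linkConfig.password\": \"Add password here\"";
        System.out.println(fillPassword(exported, "MyRealPassword"));
    }
}
```

In practice you would read each exported file (for example with java.nio.file.Files), apply fillPassword, and write the result before importing it into cluster 2.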
8.10 Can I Create Jobs in Batches on CDM?
Yes. CDM supports batch job creation with the help of the batch import function. You can create jobs in batches as follows:
1. Create a job manually.
2. Export the job and save the job's JSON file to a local PC.
3. Edit the JSON file and replicate more jobs in it according to the job configuration.
4. Import the JSON file to the CDM cluster to create jobs in batches.
For details about how to export and import data in batches, see Batch Managing Jobs.
8.11 Can I Back Up Jobs When the CDM Cluster Is Not Used for a Long Time?
Yes. If you do not need to use the CDM cluster for a long time, you can stop or delete it to reduce costs.
Before the deletion, you can use the batch export function of CDM to save all job scripts to a local PC. Then, you can create a cluster and import the jobs again when necessary.
8.12 How Do I Use Java to Invoke CDM RESTful APIs to Create Data Migration Jobs?
CDM provides RESTful APIs to implement automatic job creation or execution control by program invocation.
The following describes how to use CDM to migrate data from table city1 in a MySQL database to table city2 on DWS, and how to use Java to invoke CDM RESTful APIs to create, start, query, and delete a CDM job.
Prepare the following data in advance:
1. Obtain the username, account name, and project ID on HUAWEI CLOUD.
On the CDM management console, hover the cursor over the username and select My Credential from the drop-down list. On the page that is displayed, obtain the username and account name. In the project list, obtain the project ID of the corresponding region, for example, 1af30ca47b5a4eb987e325a846458b7a.
2. Create a CDM cluster and obtain the cluster ID.
On the Cluster Management page, click the icon on the left of the CDM cluster name to obtain the cluster ID, for example, c110beff-0f11-4e75-8b10-da7cd882b0ef.
3. Create a MySQL database and a DWS database, and create tables city1 and city2. The table creation statements are as follows:
MySQL:
create table city1(code varchar(10),name varchar(32));
insert into city1 values('sz','Shenzhen');
DWS:
create table city2(code varchar(10),name varchar(32));
4. In the CDM cluster, create a link to MySQL, such as a link named mysqltestlink. Create a link to DWS, such as a link named dwstestlink.
5. Run the following code. You are advised to use HttpClient 4.5. The Maven configuration is as follows:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>cdm</groupId>
  <artifactId>cdm-client</artifactId>
  <version>1</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5</version>
    </dependency>
  </dependencies>
</project>
Sample Code
The code for using Java to invoke CDM RESTful APIs to create, start, query, and delete a CDM job is as follows:
package cdmclient;

import java.io.IOException;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpDelete;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class CdmClient {

private final static String DOMAIN_NAME="HUAWEI CLOUD account name";
private final static String USER_NAME="HUAWEI CLOUD username";
private final static String USER_PASSWORD="HUAWEI CLOUD password";
private final static String PROJECT_ID="Project ID";
private final static String CLUSTER_ID="CDM Cluster ID";
private final static String JOB_NAME="Job Name";
private final static String FROM_LINKNAME="Source Link Name";
private final static String TO_LINKNAME="Destination Link Name";
private final static String IAM_ENDPOINT="iam.cn-north-1.myhuaweicloud.com";
private final static String CDM_ENDPOINT="cdm.cn-north-1.myhuaweicloud.com";
private CloseableHttpClient httpclient;
private String token;

public CdmClient() {
this.httpclient = createHttpClient();
this.token = login();
}

private CloseableHttpClient createHttpClient() {
CloseableHttpClient httpclient = HttpClients.createDefault();
return httpclient;
}

private String login(){
HttpPost httpPost = new HttpPost("https://"+IAM_ENDPOINT+"/v3/auth/tokens");
String json =
"{\r\n"+
"\"auth\": {\r\n"+
"\"identity\": {\r\n"+
"\"methods\": [\"password\"],\r\n"+
"\"password\": {\r\n"+
"\"user\": {\r\n"+
"\"name\": \""+USER_NAME+"\",\r\n"+
"\"password\": \""+USER_PASSWORD+"\",\r\n"+
"\"domain\": {\r\n"+
"\"name\": \""+DOMAIN_NAME+"\"\r\n"+
"}\r\n"+
"}\r\n"+
"}\r\n"+
"},\r\n"+
"\"scope\": {\r\n"+
"\"project\": {\r\n"+
"\"name\": \"cn-north-1\"\r\n"+
"}\r\n"+
"}\r\n"+
"}\r\n"+
"}\r\n";
try {
StringEntity s = new StringEntity(json);
s.setContentEncoding("UTF-8");
s.setContentType("application/json");
httpPost.setEntity(s);
CloseableHttpResponse response = httpclient.execute(httpPost);
Header tokenHeader = response.getFirstHeader("X-Subject-Token");
String token = tokenHeader.getValue();
System.out.println("Login successful");
return token;
} catch (Exception e) {
throw new RuntimeException("login failed.", e);
}
}

/*Create a job.*/
public void createJob(){
HttpPost httpPost = new HttpPost("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job");
/**
* The JSON information here is complex. You can create a job on the job
* management page, click Job JSON Definition next to the job, copy the
* JSON content and convert it into a Java character string, and paste it here.
* In the JSON message body, you only need to replace the link name, data
* import and export table names, field list of the tables, and fields used
* for partitioning in the source table.
**/
String json =
"{\r\n"+
"\"jobs\": [\r\n"+
"{\r\n"+
"\"from-connector-name\": \"generic-jdbc-connector\",\r\n"+
"\"name\": \""+JOB_NAME+"\",\r\n"+
"\"to-connector-name\": \"generic-jdbc-connector\",\r\n"+
"\"driver-config-values\": {\r\n"+
"\"configs\": [\r\n"+
"{\r\n"+
"\"inputs\": [\r\n"+
"{\r\n"+
"\"name\": \"throttlingConfig.numExtractors\",\r\n"+
"\"value\": \"1\"\r\n"+
"}\r\n"+
"],\r\n"+
"\"validators\": [],\r\n"+
"\"type\": \"JOB\",\r\n"+
"\"id\": 30,\r\n"+
"\"name\": \"throttlingConfig\"\r\n"+
"}\r\n"+
"]\r\n"+
"},\r\n"+
"\"from-link-name\": \""+FROM_LINKNAME+"\",\r\n"+
"\"from-config-values\": {\r\n"+
"\"configs\": [\r\n"+
"{\r\n"+
"\"inputs\": [\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.schemaName\",\r\n"+
"\"value\": \"sqoop\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.tableName\",\r\n"+
"\"value\": \"city1\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.columnList\",\r\n"+
"\"value\": \"code&name\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.partitionColumn\",\r\n"+
"\"value\": \"code\"\r\n"+
"}\r\n"+
"],\r\n"+
"\"validators\": [],\r\n"+
"\"type\": \"JOB\",\r\n"+
"\"id\": 7,\r\n"+
"\"name\": \"fromJobConfig\"\r\n"+
"}\r\n"+
"]\r\n"+
"},\r\n"+
"\"to-link-name\": \""+TO_LINKNAME+"\",\r\n"+
"\"to-config-values\": {\r\n"+
"\"configs\": [\r\n"+
"{\r\n"+
"\"inputs\": [\r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.schemaName\",\r\n"+
"\"value\": \"sqoop\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.tableName\",\r\n"+
"\"value\": \"city2\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.columnList\",\r\n"+
"\"value\": \"code&name\"\r\n"+
"}, \r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.shouldClearTable\",\r\n"+
"\"value\": \"true\"\r\n"+
"}\r\n"+
"],\r\n"+
"\"validators\": [],\r\n"+
"\"type\": \"JOB\",\r\n"+
"\"id\": 9,\r\n"+
"\"name\": \"toJobConfig\"\r\n"+
"}\r\n"+
"]\r\n"+
"}\r\n"+
"}\r\n"+
"]\r\n"+
"}\r\n";
try {
StringEntity s = new StringEntity(json);
s.setContentEncoding("UTF-8");
s.setContentType("application/json");
httpPost.setEntity(s);
httpPost.addHeader("X-Auth-Token", this.token);
httpPost.addHeader("X-Language", "zh-cn");
CloseableHttpResponse response = httpclient.execute(httpPost);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
System.out.println("Create job successful.");
}else{
System.out.println("Create job failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Create job failed.", e);
}
}

/*Start the job.*/
public void startJob(){
HttpPut httpPut = new HttpPut("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job/"+JOB_NAME+"/start");
String json = "";
try {
StringEntity s = new StringEntity(json);
s.setContentEncoding("UTF-8");
s.setContentType("application/json");
httpPut.setEntity(s);
httpPut.addHeader("X-Auth-Token", this.token);
httpPut.addHeader("X-Language", "zh-cn");
CloseableHttpResponse response = httpclient.execute(httpPut);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
System.out.println("Start job successful.");
}else{
System.out.println("Start job failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Start job failed.", e);
}
}

/*Query the job running status cyclically until the job is complete.*/
public void getJobStatus(){
HttpGet httpGet = new HttpGet("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job/"+JOB_NAME+"/status");
try {
httpGet.addHeader("X-Auth-Token", this.token);
httpGet.addHeader("X-Language", "zh-cn");
boolean flag = true;
while(flag){
CloseableHttpResponse response = httpclient.execute(httpGet);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
HttpEntity entity = response.getEntity();
String msg = EntityUtils.toString(entity);
if(msg.contains("\"status\":\"SUCCEEDED\"")){
System.out.println("Job succeeded");
break;
}else if (msg.contains("\"status\":\"FAILED\"")){
System.out.println("Job failed.");
break;
}else{
Thread.sleep(1000);
}
}else{
System.out.println("Get job status failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
break;
}
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Get job status failed.", e);
}
}

/*Delete the job.*/
public void deleteJob(){
HttpDelete httpDelete = new HttpDelete("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job/"+JOB_NAME);
try {
httpDelete.addHeader("X-Auth-Token", this.token);
httpDelete.addHeader("X-Language", "zh-cn");
CloseableHttpResponse response = httpclient.execute(httpDelete);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
System.out.println("Delete job successful.");
}else{
System.out.println("Delete job failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Delete job failed.", e);
}
}

/*Close the process.*/
public void close(){
try {
httpclient.close();
} catch (IOException e) {
throw new RuntimeException("Close failed.", e);
}
}

public static void main(String[] args){
CdmClient cdmClient = new CdmClient();
cdmClient.createJob();
cdmClient.startJob();
cdmClient.getJobStatus();
cdmClient.deleteJob();
cdmClient.close();
}
}
8.13 How Do I Connect an On-Premises Intranet or Third-Party Private Network to CDM?
Many enterprises deploy key data sources, such as databases and file servers, on the intranet. CDM runs on HUAWEI CLOUD. To migrate intranet data to HUAWEI CLOUD using CDM, use any of the following methods to connect the intranet to HUAWEI CLOUD:
1. Bind Internet IP addresses to the intranet data source nodes so that CDM can access the data directly from the Internet.
2. Establish a VPN between the on-premises data center and the VPC where the service resides on HUAWEI CLOUD. For details about VPN on HUAWEI CLOUD, see http://www.huaweicloud.com/en-us/product/vpn.html.
3. Use Direct Connect to connect the data center to HUAWEI CLOUD. For details about Direct Connect on HUAWEI CLOUD, see http://www.huaweicloud.com/en-us/product/dc.html.
4. Leverage Network Address Translation (NAT) or port forwarding to access the network in proxy mode.
The following describes how to use a port forwarding tool to access intranet data. The process is as follows:
1. Use a Windows computer as the gateway. The computer must be able to access both the Internet and the intranet.
2. Install the port mapping tool IPOP on the computer.
3. Configure port mapping using the tool.

NOTICE
If the intranet database is exposed to the public network for a long time, security risks exist. Therefore, after data migration is complete, stop port mapping.
Scenario
Suppose that the MySQL database on the intranet is migrated to DWS on HUAWEI CLOUD. Figure 8-10 shows a network topology example.
In the figure, the intranet can be either an enterprise's data center or the intranet of a virtual data center on a third-party cloud.
Figure 8-10 Network topology example
Procedure
Step 1 Use a Windows computer as the gateway. Configure both the intranet and Internet IP addresses on the computer. Conduct the following test to check whether the gateway computer can fulfill service needs.
1. Run the ping command on the computer to check whether the intranet address of the MySQL database is pingable. For example, run ping 192.168.1.8.
2. Run the ping command on another computer that can access the Internet to check whether the public network address of the gateway computer is pingable. For example, run ping 202.xxx.xxx.10.
Step 2 Download the port mapping tool IPOP and install it on the gateway computer.
Step 3 Run the port mapping tool and select PORT Map. See Figure 8-11.
- Local IP and Local Port: Set these two parameters to the public network address and port number of the gateway computer. They must be entered when creating MySQL links on CDM.
- Mapping IP and Map Port: Set these two parameters to the IP address and port number of the MySQL database on the intranet.
Figure 8-11 Configuring port mapping
Step 4 Click ADD to add a port mapping relationship.
Step 5 Click START to start mapping and receive data packets.
Then, on CDM, you can use the EIP to read data from the MySQL database on the intranet and import the data to DWS on HUAWEI CLOUD.
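Before creating the MySQL link on CDM, you can verify from any Internet host that the mapped port on the gateway is reachable. A minimal sketch; the gateway address and port below are placeholders, not values from this guide:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: the mapping is not reachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder: the gateway's public address and the mapped MySQL port.
        System.out.println(isReachable("203.0.113.10", 3306, 3000));
    }
}
```

If the check returns false, recheck the PORT Map configuration in Step 3 and any firewall rules on the gateway computer.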
NOTE
1. To access the on-premises data source, you must also bind an EIP to the CDM cluster.
2. Generally, DWS on HUAWEI CLOUD can be accessed only within a VPC. When creating a CDM cluster, ensure that the VPC of the CDM cluster is the same as that of DWS. In addition, it is recommended that CDM and DWS be in the same intranet and security group. If their security groups are different, you also need to enable data access between the security groups.
3. Port mapping can also be used to migrate data between intranet databases or SFTP servers.
4. For Linux computers, port mapping can also be implemented using iptables.
5. When an FTP server on the intranet is mapped to the public network using port mapping, check whether the PASV mode is enabled. In PASV mode, the client and server are connected through a random port. Therefore, in addition to mapping port 21, you also need to map the port range used in PASV mode. For example, you can specify the vsftp port range by configuring pasv_min_port and pasv_max_port.
----End
8.14 What Do I Do If the System Displays a Message Indicating that the Date Format Fails to Be Parsed When Data Is Imported to Cloud Search Service?
Symptom
When CDM is used to migrate data from other data sources to Cloud Search Service, the job fails to be executed and the error message "Unparseable date" is displayed in the log. See Figure 8-12.
Figure 8-12 Log output
Possible Cause
Cloud Search Service has a special processing mechanism for the time field. If the stored time data does not contain time zone information, Kibana considers the time as GMT and automatically converts it to the local time.
In the China time zone, the displayed time is therefore eight hours later than the actual time. When CDM migrates data to Cloud Search Service, if the index and type are automatically created by CDM (for example, if date_test and test1 of the migration destination highlighted in Figure 8-13 do not exist in Cloud Search Service, CDM automatically creates the index and type in Cloud Search Service), CDM, by default, sets the format of the time field to the standard format yyyy-MM-dd HH:mm:ss.SSS Z, for example, 2018-01-08 08:08:08.666 +0800.
Figure 8-13 Job configuration
When data is imported from another data source to Cloud Search Service, if the date format in the source data is not the standard format, for example, 2018/01/05 15:15:46, the CDM job fails to be executed, and the log shows that the date format cannot be parsed. You need to configure a field converter on CDM to convert the date field to the format required by Cloud Search Service.
Solution
1. Edit the job and go to the Map Field tab page. Click the icon for creating a converter in the row of the source field. See Figure 8-14.
Figure 8-14 Creating a converter
2. Select Expression conversion as the converter. Currently, expression conversion supports functions of the character string and date types. The syntax is similar to that of Java character string and time functions. For details about how to compile the expression, see Field Conversion During Migration.
3. In this example, the source time format is yyyy/MM/dd HH:mm:ss. To convert it to yyyy-MM-dd HH:mm:ss.SSS Z, perform the following operations:
a. Add the time zone information +0800 to the end of the original date character string. The corresponding expression is value+" +0800".
b. Use the original date format to parse the string into a date object. You can use the DateUtils.parseDate function for parsing. The syntax is DateUtils.parseDate(String value, String format).
c. Format the date object into a character string in the target format by using the DateUtils.format function. The syntax is DateUtils.format(Date date, String format).
In this example, the complete expression is DateUtils.format(DateUtils.parseDate(value+" +0800","yyyy/MM/dd HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss.SSS Z"). See Figure 8-15.
Figure 8-15 Configuring the expression
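If you want to verify the conversion locally before editing the job, the chained expression can be reproduced with the JDK. This is a sketch using plain SimpleDateFormat rather than CDM's DateUtils helpers; the class name is ours:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateConvertSketch {
    // Mimics the CDM expression DateUtils.format(DateUtils.parseDate(
    // value + " +0800", "yyyy/MM/dd HH:mm:ss Z"), "yyyy-MM-dd HH:mm:ss.SSS Z").
    public static String convert(String value) {
        try {
            // Step a and b: append the time zone and parse with the source format.
            Date date = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss Z")
                    .parse(value + " +0800");
            // Step c: format into the target pattern, keeping the +0800 zone.
            SimpleDateFormat target = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS Z");
            target.setTimeZone(TimeZone.getTimeZone("GMT+08:00"));
            return target.format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("2018/01/05 15:15:46"));
        // prints 2018-01-05 15:15:46.000 +0800
    }
}
```

The output matches the standard format that Cloud Search Service expects for the automatically created time field.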
4. Save the converter configuration, then save and run the job to solve the problem that Cloud Search Service fails to parse the date format.
8.15 What Do I Do If the Map Field Tab Page Cannot Display All Columns When Data Is Exported from HBase/CloudTable?
Symptom
When data is exported from HBase/CloudTable using CDM, fields in the HBase/CloudTable table on the Map Field tab page occasionally cannot be displayed completely and cannot match the fields on the migration destination. As a result, the data imported to the migration destination is incomplete.
Possible Cause
HBase/CloudTable are schema-less, and the number of columns in each data is not fixed. Onthe Map Field page, there is a high probability that all columns cannot be obtained byobtaining example values. In this case, the data on the migration destination is incompleteafter the job is executed.
To solve this problem, use any of the following methods:
1. Add fields on the Map Field tab page.
2. Edit the JSON file of the job on the Job Management page (modify the fromJobConfig.columns and toJobConfig.columnList parameters).
3. Export the JSON file of the job to the local PC, modify the parameters in the JSON file (the principle is the same as that in 2), and then import the JSON file back to CDM.
You are advised to use method 1. The following uses data migration from HBase to DWS as an example.
Solution 1: Adding Fields on the Map Field Tab Page
1. Obtain all fields in the tables to be migrated from the source HBase. Use colons (:) to separate column families and columns. The following gives an example:
rowkey:rowkey
g:DAY_COUNT
g:CATEGORY_ID
g:CATEGORY_NAME
g:FIND_TIME
g:UPLOAD_PEOPLE
g:ID
g:INFOMATION_ID
g:TITLE
g:COORDINATE_X
g:COORDINATE_Y
g:COORDINATE_Z
g:CONTENT
g:IMAGES
g:STATE
2. On the Job Management page, locate the job for exporting data from HBase to DWS, click Edit in the row where the job resides, and go to the Map Field tab page. See Figure 8-16.
Figure 8-16 Field mapping
3. Click . In the dialog box that is displayed, select Add a new field. See Figure 8-17.
Figure 8-17 Adding a field
NOTE
After a field is added, the example value of the new field is not displayed on the console. This does not affect the transmission of field values. CDM directly writes the field values to the migration destination.
4. After all fields are added, check whether the mapping between the migration source and destination is correct. If the mapping is incorrect, drag the fields to adjust the field mapping.
5. Click Next and Save.
Solution 2: Modifying a JSON File
1. Obtain all fields in the tables to be migrated from the source HBase. Use colons (:) to separate column families and columns. The following gives an example:
rowkey:rowkey
g:DAY_COUNT
g:CATEGORY_ID
g:CATEGORY_NAME
g:FIND_TIME
g:UPLOAD_PEOPLE
g:ID
g:INFOMATION_ID
g:TITLE
g:COORDINATE_X
g:COORDINATE_Y
g:COORDINATE_Z
g:CONTENT
g:IMAGES
g:STATE
2. In the DWS destination table, obtain the fields corresponding to the HBase table fields. If any field name corresponding to an HBase field does not exist in the DWS destination table, add it to the DWS table schema. Suppose that the fields in the DWS table are complete and are displayed as follows:
rowkey
day_count
category
category_name
find_time
upload_people
id
infomation_id
title
coordinate_x
coordinate_y
coordinate_z
content
images
state
3. On the Job Management page, locate the job for exporting data from HBase to DWS, and choose More > Edit Job JSON in the row where the job resides.
4. On the page that is displayed, edit the JSON file of the job.
a. Modify the fromJobConfig.columns parameter of the migration source to the HBase fields obtained in 1. Use ampersands (&) to separate columns and colons (:) to separate column families and columns. The following gives an example:
"from-config-values": {
  "configs": [
    {
      "inputs": [
        {
          "name": "fromJobConfig.table",
          "value": "HBase"
        },
        {
          "name": "fromJobConfig.columns",
          "value": "rowkey:rowkey&g:DAY_COUNT&g:CATEGORY_ID&g:CATEGORY_NAME&g:FIND_TIME&g:UPLOAD_PEOPLE&g:ID&g:INFOMATION_ID&g:TITLE&g:COORDINATE_X&g:COORDINATE_Y&g:COORDINATE_Z&g:CONTENT&g:IMAGES&g:STATE"
        },
        {
          "name": "fromJobConfig.formats",
          "value": {
            "2": "yyyy-MM-dd",
            "undefined": "yyyy-MM-dd"
          }
        }
      ],
      "name": "fromJobConfig"
    }
  ]
}
b. Modify the toJobConfig.columnList parameter of the migration destination to the field list of DWS obtained in 2. The sequence must be the same as that of HBase to ensure correct field mapping. Use ampersands (&) to separate field names. The following gives an example:
"to-config-values": {
  "configs": [
    {
      "inputs": [
        {
          "name": "toJobConfig.schemaName",
          "value": "dbadmin"
        },
        {
          "name": "toJobConfig.tablePreparation",
          "value": "DO_NOTHING"
        },
        {
          "name": "toJobConfig.tableName",
          "value": "DWS"
        },
        {
          "name": "toJobConfig.columnList",
          "value": "rowkey&day_count&category&category_name&find_time&upload_people&id&infomation_id&title&coordinate_x&coordinate_y&coordinate_z&content&images&state"
        },
        {
          "name": "toJobConfig.shouldClearTable",
          "value": "true"
        }
      ],
      "name": "toJobConfig"
    }
  ]
}
c. Retain the settings of other parameters, and then click Save and Run.
5. After the job is complete, check whether the data in the DWS table matches the data in HBase. If the mapping is incorrect, check whether the sequences of the HBase and DWS fields in the JSON file are the same.
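Both parameter values edited in Solution 2 are plain &-joined lists, so they can be generated and cross-checked with a few lines. The field names below are the ones from the example above; the helper itself is a convenience sketch, not a CDM tool:

```python
# Source HBase fields (column family:column) and destination DWS fields,
# in matching order -- the order determines the field mapping.
hbase_fields = [
    "rowkey:rowkey", "g:DAY_COUNT", "g:CATEGORY_ID", "g:CATEGORY_NAME",
    "g:FIND_TIME", "g:UPLOAD_PEOPLE", "g:ID", "g:INFOMATION_ID", "g:TITLE",
    "g:COORDINATE_X", "g:COORDINATE_Y", "g:COORDINATE_Z", "g:CONTENT",
    "g:IMAGES", "g:STATE",
]
dws_fields = [
    "rowkey", "day_count", "category", "category_name", "find_time",
    "upload_people", "id", "infomation_id", "title", "coordinate_x",
    "coordinate_y", "coordinate_z", "content", "images", "state",
]

# A length mismatch would silently misalign the mapping, so fail early.
assert len(hbase_fields) == len(dws_fields)

from_columns = "&".join(hbase_fields)   # value for fromJobConfig.columns
to_column_list = "&".join(dws_fields)   # value for toJobConfig.columnList
print(from_columns)
print(to_column_list)
```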
8.16 How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
When using CDM to migrate data to DWS/FusionInsight LibrA and creating a table on DWS, select the distribution columns during job configuration. See Figure 8-18.
Figure 8-18 Selecting distribution columns
Selecting the distribution column is very important for the performance of DWS/FusionInsight LibrA. When migrating data to DWS/FusionInsight LibrA, you are advised to specify the distribution column according to the following principles:
1. Use the primary key as the distribution column.
2. If multiple data segments are combined as primary keys, specify all primary keys as the distribution columns.
3. In the scenario where no primary key is available, if no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
Therefore, when a single table or entire database is imported to DWS/FusionInsight LibrA, you are advised to manually select a distribution column; otherwise, CDM automatically selects one. For more information about the distribution column, see Selecting a Distribution Column in the DWS documentation.
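The data-skew risk in principle 3 can be illustrated with a toy hash distribution (made-up data and a hypothetical 4-node layout; DWS internals differ): hashing a low-cardinality column sends almost all rows to one or two nodes, while hashing a unique key spreads them evenly.

```python
from collections import Counter

# 10,000 hypothetical rows: "status" has only two distinct values,
# while "id" is unique per row.
rows = [{"status": "OK" if i % 100 else "FAIL", "id": i} for i in range(10_000)]

def distribution(column: str) -> Counter:
    # Assign each row to one of 4 nodes by hashing the distribution column.
    return Counter(hash(row[column]) % 4 for row in rows)

print(distribution("status"))  # at most 2 of the 4 nodes receive any data
print(distribution("id"))      # even spread: 2,500 rows per node
```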
If the DWS primary key or table contains only one field, the field type must be a common character string, numeric, or date type. When data is migrated from another database to DWS with automatic table creation selected, the primary key must be of one of the following types. If no primary key is set, at least one field of the following types must exist. Otherwise, the table cannot be created and the CDM job fails.
- INTEGER TYPES: TINYINT, SMALLINT, INT, BIGINT, NUMERIC/DECIMAL
- CHARACTER TYPES: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, TEXT
- DATE/TIME TYPES: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, SMALLDATETIME
8.17 What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?
Symptom
When you use CDM to migrate data to DWS/FusionInsight LibrA, the migration fails and the error message "value too long for type character varying" is displayed in the log. See Figure 8-19.
Figure 8-19 Log output
Possible Cause
The data migrated to DWS contains Chinese characters, and the table is automatically created at the migration destination. The length of a varchar field in DWS is calculated in bytes, and a Chinese character may occupy three bytes in UTF-8 encoding. If the byte length of the data exceeds the length of the varchar field in DWS, an error occurs and the error message "value too long for type character varying" is displayed.
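The byte-count behavior is easy to verify directly: in UTF-8, a typical Chinese character encodes to three bytes, so text that fits a character-counted varchar at the source can overflow a byte-counted varchar at the destination. A quick check in Python:

```python
text = "数据迁移"  # 4 Chinese characters

print(len(text))                   # 4 characters
print(len(text.encode("utf-8")))   # 12 bytes: 3 bytes per character in UTF-8
```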
Solution
To solve this problem, set Extend Field Length to Yes, so that the length of the varchar field is automatically tripled when the destination table is created.
Edit the table/file migration job on CDM. In Destination Job Configuration, set Auto Table Creation to Auto creation. Extend Field Length is then displayed in Show Advanced Attributes; set it to Yes. See Figure 8-20.
Figure 8-20 Extending field length
A Version Updates
2018.8.3 1.5.0 Version
- New Functions
a. Support for the cdm.xlarge cluster requiring 10GE bandwidth
b. Support for streaming JSON parsing to reduce resource usage
c. Support for region switching on the CDM service purchase page to improve usability
d. Support for link connectivity tests to improve usability
e. Support for source and destination table comparison after the migration is complete
f. Support for incremental data migration in MySQL Binlog mode (trial use)
- Fixed Bugs
a. Failures of data migration from DIS to Cloud Search Service if Offset is set to Last stop
b. Failures of job saving when data is migrated from the Oracle database to DWS and the source table contains more than 800 columns
c. Failures of setting the field delimiter to \001 when exporting a CSV file
d. Failures of job execution when data is migrated from MySQL to DWS with auto table creation enabled and the source field is configured with the NOT NULL constraint
2018.7.5 1.3.0 Version
- New Functions
a. Support for HDFS data migration between multiple MRS clusters
b. Support for data partitioning by size during data export to OBS
c. Support for data filtering using filter conditions during data migration from Elasticsearch/Cloud Search Service. This function can be used in incremental data migration scenarios.
d. Support for object migration from Qiniu Cloud Object Storage to OBS
e. Support for the job statistics of running clusters being displayed on the CDM console
- Fixed Bugs
a. Changed the maximum length of a job name from 32 characters to 256 characters.
b. Empty directories are not migrated to the migration destination.
c. Special characters cannot be used as field delimiters.
d. The default value of the time field at the migration source is not used when the MySQL database automatically creates a table.
2018.6.2 1.2.0 Version
- New Functions
a. Support for data export from HTTP/HTTPS data sources to HUAWEI CLOUD
b. Support for data to be exported in CarbonData format and stored in OBS
c. Support for scheduled start and stop of clusters and automatic shutdown, helping you reduce costs
d. Support for automatic mapping of fields with the same name
e. Support for the process wizard
f. Support for the retry policy of migration jobs
g. During MySQL link creation, local APIs can be automatically detected and enabled.
h. Support for the pipeline and authentication parameters being configured for the Elasticsearch data source
i. When a job displayed on the Job Management page fails, you can hover the cursor over the job to see the failure cause.
- Fixed Bugs
a. The monitoring data cannot be correctly displayed when a cluster is created for the first time.
b. Data cannot be imported to the Hive partitioned table.
c. Buckets cannot be listed when OBS buckets of various regions exist.
d. An expression carrying % cannot be written in the WHERE clause in the database.
2018.5.4 1.1.0 Version
- New Functions
a. Support for data export and import of DDM
b. Support for data export and import of Hadoop HBase and FusionInsight HBase
c. Support for data export and import of FusionInsight LibrA
d. Support for data migration from the MongoDB database to DDS
e. Support for entire database migration to OBS
f. Optimized the method of adjusting field mapping, making it easier to use.
g. The JSON definition of a job can be edited, which is suitable for advanced users.
h. Data can be imported to DWS in GDS mode, which greatly improves the performance of importing data to DWS.
i. Support for column- or row-based storage, as well as compressed storage, in DWS table creation
j. Support for the advanced attribute of deleting data after successful import. This attribute is designed for massive one-time jobs.
k. Support for the encircling symbol being configured for the CSV files in file migration
l. Support for migration of files in ZIP format
m. Support for field converter tests. That is, the conversion effect is displayed immediately.
n. Optimized the performance of executing a large number of migration jobs.
o. Support for numeric fields being used as incremental fields in database migrations, making incremental migration more convenient
p. Support for databases being migrated in transaction mode by specifying a staging table
- Fixed Bugs
a. Common error messages
b. NoSQL example values cannot contain all fields; added the function of manually adding new fields.
c. SocketTimeout may occur when data is migrated from MongoDB.
d. Poor performance of writing small files to OBS
e. Poor performance of executing multiple concurrent jobs
2018.3.28 1.0.T11 Version
- New Functions
a. Support for data import to DLI
b. Support for object migration from Alibaba Cloud OSS
c. Support for KMS encryption when data is written to OBS
d. Support for MD5 verification to ensure data consistency when data is written to OBS
e. Support for obtaining the MRS, DWS, and RDS instance lists during link creation
f. Support for HUAWEI CLOUD second-generation VMs, speeding up network access
g. Support for automatic schema creation during entire database migration
h. Accelerated the speed of creating a cluster for the first time. The creation is complete within one minute.
i. Added the expression converter to support more string, date, and numeric processing functions.
j. Support for the cdm.small clusters to reduce costs
k. The cluster can be started or stopped based on service requirements.
l. In file migrations, the total number of files and total data volume are displayed.
m. You can view the monitoring metrics of the CDM cluster on the Cloud Eye console, for example, data traffic.
n. In CloudTable data migration, the time range and column families of data can be specified.
- Fixed Bugs
a. Inaccurate statistics about data written to DWS
b. DCS link failure
c. JSON files cannot be exported to CSV files.
d. Data import fails because some database field names contain spaces.
2018.1.31 1.0.T10 Version
- New Functions
a. Support for wizard-based link creation
b. Support for specifying the tables to be migrated during entire database migration
c. Support for data migration from Cloud Search Service
d. Support for data export and import of FusionInsight HDFS
e. Support for regular expressions being used to parse logs
f. Support for the time format conversion function and the random number function of the field expression converter
g. Support for reading files in GZIP format
h. Support for reading files in Parquet format on HDFS
i. Support for splitting rowkeys when data is migrated from HBase/CloudTable
j. Support for determining whether to compress an HBase/CloudTable table during job creation
k. The new endpoint cdm.cn-north-1.myhuaweicloud.com is used.
l. Support for data export from Derecho (GaussDB)
m. Support for deleting the header row of a CSV file
- Fixed Bugs
a. Low performance of writing data to OBS
b. Inconsistency between the entire database migration job status and the sub-job status
c. Insufficient fields during table migration
d. EIPs cannot be deleted from CDM after being released in the VPC.
e. Cloud Search Service does not support date fields.
2018.1.9 1.0.T9 Version
- New Functions
a. Support for entire homogeneous relational database migration. You can migrate on-premises MySQL, PostgreSQL, and Microsoft SQL Server databases to RDS for MySQL, RDS for PostgreSQL, and RDS for SQL Server on HUAWEI CLOUD. This function is applicable to database migration to RDS on HUAWEI CLOUD. It supports entire database migration but does not support real-time incremental synchronization.
b. Support for entire heterogeneous relational database migration. You can migrate an on-premises Oracle, Db2, MySQL, PostgreSQL, or Microsoft SQL Server database to any database of RDS for MySQL, RDS for PostgreSQL, RDS for SQL Server, DWS, and MRS Hive.
c. Support for entire NoSQL database migration. You can migrate on-premises Redis and Elasticsearch to DCS and Cloud Search Service on HUAWEI CLOUD.
d. Support for automatic creation of the destination table during data import to a database
e. Support for migration from open source Hadoop to MRS on HUAWEI CLOUD and non-security mode
f. Support for interconnection with CloudTable on HUAWEI CLOUD and open source Kafka
g. Support for parsing source data files in JSON format
h. Support for connecting to RDS databases in SSL mode
i. Support for filtering jobs by status and scheduled execution
j. Support for writing a temporary name during file migration
k. Support for setting a specific file as the boot condition for file migration jobs, for example, OK.txt
l. Support for batch link deletion
- Fixed Bugs
a. Changed the storage duration of historical records to 90 days.
b. Long database link timeout period
c. Changed names of the enumerated values to ones that are easier to understand.
d. Incorrect sorting of historical operation records displayed on multiple pages
e. Failures of creating jobs when a table contains a large number of fields
2017.11.30 1.0.T8 Version
- New Functions
a. Support for the Elasticsearch/Cloud Search Service links. Data in the database can be imported to the Elasticsearch server and Cloud Search Service.
b. Support for the DIS links. Data can be obtained from DIS.
c. Support for the NAS links, the CIFS/SMB protocol, interconnection with professional file servers, Windows system file sharing, Linux Samba servers, and file system cloud services that provide the CIFS/SMB protocol
d. Support for binding or unbinding an EIP after a cluster is created
e. Support for configuring field conversion and processing field values during migration
f. Optimized the Job Management page so that the job progress can be displayed in a more timely and accurate manner and jobs can be sorted by a specific field.
g. Support for displaying historical records and links on multiple pages
h. Support for detecting duplicate files based on the file size during incremental file synchronization
i. Support for scheduled job execution (weekly)
- Fixed Bugs
a. Incorrect database passwords due to special characters
b. Incorrect default date format of job mapping
c. Invalid advanced link parameters
d. Batch job import timeout
2017.10.31 1.0.T5 Version
- New Functions
a. Support for directory browsing when selecting FTP, SFTP, HDFS, and OBS paths
b. Support for overwriting or skipping files with duplicate names during file data import. By combining this function with scheduled job execution, incremental file migration can be implemented.
c. Support for the date and time variable functions dataformat and timestamp, which can be used in table names, Where clauses, and file paths. By combining this function with scheduled job execution, incremental file migration can be implemented.
d. Support for common date formats during field mapping configuration
e. Optimized error codes and messages.
f. Support for cluster VM restart and graceful restart modes
g. Support for copying field names from the migration source if an HBase table is created during data import to HBase
h. Support for MongoDB
- Fixed Bugs
a. Job mapping pages of HBase and Redis jobs
b. Failures of batch job startup and occasional startup failures of some jobs in batch startup
c. Occasional generation of empty directories when data is migrated from OBS
d. Handle leakage of MySQL and FTP links
2017.9.30 Launched for Open Beta Test
1. Launched CDM, which supports table data import and export among data sources such as FTP, SFTP, HDFS, OBS, HBase, Hive, DWS, MySQL, Oracle, Db2, PostgreSQL, Microsoft SQL Server, Redis, and VoltDB.
2. Support for wizard-based configuration of import and export jobs and concurrency policies
3. Support for using a VM as a service unit to implement security isolation
4. Support for setting row and column separators in file data export and configuring the regular expression for filtering and encoding types
5. Support for scheduled job execution
6. Optimized the performance of importing data to MySQL, DWS, HBase, and Hive.
B Change History
Release Date What's New
2018-08-03 This is the tenth official release.
- Added the following sections:
  – Migrating Data from OSS to OBS
  – Migrating Data from OBS to Cloud Search Service
  – Migrating the Entire Elasticsearch Database to Cloud Search Service
  – File Formats
- Updated the screenshots.
- Updated the operation procedures in Typical Scenarios.
- Updated the description of most job parameters in Job Management and added multiple job parameters.
2018-07-05 This is the ninth official release.
- Added the following sections:
  – CTS
  – Link to Qiniu Cloud Object Storage
  – Migrating Data from the MySQL Database to DDM
- Updated the screenshots.
- Updated the parameter description in the following sections:
  – Data Sources Supported by CDM
  – Creating a Link
  – Link to HDFS
  – Link to HBase
  – From Elasticsearch/Cloud Search Service
  – To OBS
  – To FTP/SFTP/NAS
2018-06-02 This is the eighth official release.
- Added the following sections:
  – From HTTP/HTTPS
  – Migrating Data from the MySQL Database to the MRS Hive Partition Table
  – HBase/CloudTable Incremental Migration
  – GDS Import Mode
  – What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?
- Updated the screenshots.
- Updated the following sections because HTTP/HTTPS can be used as the migration source:
  – Data Sources Supported by CDM
  – Creating a Link
  – Table/File Migration
- Updated the following sections because the automatic shutdown and scheduled power-on/off are supported:
  – Purchasing CDM
  – Creating a Cluster
  – Stopping, Starting, or Deleting a Cluster
- Updated the parameter description in the following sections:
  – Link to Elasticsearch
  – From OBS/OSS
  – From a Relational Database
  – To OBS
2018-05-04 This is the seventh official release.
- Added the following sections:
  – Monitoring
  – Link to HBase
  – Link to Hive
  – To DDS
  – Advanced Operations
  – What Is the Most Economical Way to Migrate Data from the Public Network Using CDM?
  – How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
- Updated the following sections:
  – Data Sources Supported by CDM
  – Related Services
  – Constraints
  – Purchasing CDM
  – Creating and Executing a Job
  – Creating a Cluster
  – Creating a Link
  – Table/File Migration
  – Entire DB Migration
  – From a Relational Database
  – Managing a Single Job
- Updated the screenshots.
- Changed Elasticsearch Service (ES) to Cloud Search Service.
- Changed Unlimited Query Service (UQuery) to Data Lake Insight (DLI).
- Changed Data Pipeline Service (DPS) to Data Lake Factory (DLF).
2018-04-09 This is the sixth official release.
- Added the following sections:
  – Link to OSS on Alibaba Cloud
  – Link to DLI
  – To DLI
  – Migrating Data from OBS to DLI
  – What Do I Do If the System Displays a Message Indicating that the Date Format Fails to Be Parsed When Data Is Imported to Cloud Search Service?
  – What Do I Do If the Map Field Tab Page Cannot Display All Columns When Data Is Exported from HBase/CloudTable?
- Updated the following sections:
  – Data Sources Supported by CDM
  – Related Services
  – Constraints
  – CDM Billing
  – Purchasing CDM
  – Creating a Cluster
  – Stopping, Starting, or Deleting a Cluster
  – From OBS/OSS
  – From HBase/CloudTable
  – Field Conversion During Migration
- Updated the procedure for binding EIPs because the EIPs are not automatically bound.
- Updated the screenshots.
2018-01-31 This is the fifth official release.
- Added the following sections:
  – From Elasticsearch/Cloud Search Service
  – Using Regular Expressions to Separate Semi-structured Text
  – Migrating the Entire MySQL Database to RDS
- Updated the data sources supported in table/file migration in Data Sources Supported by CDM.
- Added the JS expression example in Field Conversion During Migration.
- Updated job parameters, and modified Source Job Parameters and Destination Job Parameters.
- Added the description of selecting a connector in the first step in the procedure for creating a link.
- Deleted the following sections:
  – From VoltDB
  – To VoltDB
  – Using CDM to Archive MySQL Data to OBS
  – Creating the PostgreSQL Link on RDS on HUAWEI CLOUD
2018-01-11 This is the fourth official release.
- Added the following sections:
  – Data Sources Supported by CDM
  – Link to HDFS
  – Link to CloudTable
  – Link to Kafka
  – Entire DB Migration
  – From Apache Kafka
  – Migrating Data from Oracle to Cloud Search Service
  – Version Updates
- Modified several connector parameters, job parameters, and corresponding parameter descriptions.
- Modified "Procedure" in Creating and Executing a Job.
2017-11-30 This is the third official release.
- Added the following sections:
  – Binding or Unbinding an EIP
  – Link to a NAS Server
  – Link to DIS
  – Link to Elasticsearch
  – From DIS
  – To Elasticsearch/Cloud Search Service
  – Field Conversion During Migration
  – Typical Scenarios
- Changed all connector names by deleting "connector" from the names in the document.
- Modified content in Scheduling Job Execution.
2017-10-31 This is the second official release.
- Added Link to MongoDB/DDS.
- Added Scheduling Job Execution.
- Added Incremental Synchronization Using the Macro Variables of Date and Time.
- Modified the parameter description of the source job configuration and destination job configuration, and enabled the directory, table name, and Where clause to be configured as time macro variables.
- Modified the data source list supported by CDM, added the MongoDB data source, and added several data migration scenarios.
2017-09-30 This is the first official release.