User Guide · 2019-07-18 · Cloud Data Migration (CDM) enables data migration among various data...
Cloud Data Migration
User Guide
Issue 10
Date 2018-08-03
HUAWEI TECHNOLOGIES CO., LTD.
Copyright © Huawei Technologies Co., Ltd. 2018. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.
Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, People's Republic of China
Website: http://e.huawei.com
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. i
Contents
1 Introduction
1.1 CDM
1.2 Data Sources Supported by CDM
1.3 Application Scenarios
1.4 Related Services
1.5 Basic Concepts
1.6 Accessing and Using CDM
1.6.1 How to Access CDM
1.6.2 How to Use CDM
1.6.3 CDM Billing
1.6.4 User Permissions
1.7 Constraints

2 Getting Started
2.1 Overview
2.2 Purchasing CDM
2.3 Creating Links
2.4 Creating and Executing a Job
2.5 Querying Job Execution Results

3 Cluster Management
3.1 Creating a Cluster
3.2 Binding or Unbinding an EIP
3.3 Restarting a Cluster
3.4 Stopping, Starting, or Deleting a Cluster
3.5 Viewing Cluster Configurations, Logs, and Monitoring Data
3.6 Monitoring
3.6.1 CDM Metrics
3.6.2 Configuring Alarm Rules
3.6.3 Querying Metrics
3.7 CTS
3.7.1 Key CDM Operations Recorded by CTS
3.7.2 Viewing Traces

4 Link Management
4.1 Creating a Link
4.2 Link Parameter Description
4.2.1 Link to Relational Databases
4.2.2 Link to OBS
4.2.3 Link to OSS on Alibaba Cloud
4.2.4 Link to Qiniu Cloud Object Storage
4.2.5 Link to HDFS
4.2.6 Link to HBase
4.2.7 Link to Hive
4.2.8 Link to CloudTable
4.2.9 Link to an FTP or SFTP Server
4.2.10 Link to a NAS Server
4.2.11 Link to MongoDB/DDS
4.2.12 Link to Redis/DCS
4.2.13 Link to Kafka
4.2.14 Link to DIS
4.2.15 Link to Elasticsearch
4.2.16 Link to DLI
4.3 Editing/Deleting a Link

5 Job Management
5.1 Creating a Job
5.1.1 Table/File Migration
5.1.2 Entire DB Migration
5.2 Source Job Parameters
5.2.1 From OBS/OSS
5.2.2 From HDFS
5.2.3 From HBase/CloudTable
5.2.4 From Hive
5.2.5 From FTP/SFTP/NAS
5.2.6 From HTTP/HTTPS
5.2.7 From a Relational Database
5.2.8 From MongoDB/DDS
5.2.9 From Redis
5.2.10 From DIS
5.2.11 From Apache Kafka
5.2.12 From Elasticsearch/Cloud Search Service
5.3 Destination Job Parameters
5.3.1 To OBS
5.3.2 To HDFS
5.3.3 To HBase/CloudTable
5.3.4 To Hive
5.3.5 To FTP/SFTP/NAS
5.3.6 To a Relational Database
5.3.7 To DDS
5.3.8 To DCS
5.3.9 To Elasticsearch/Cloud Search Service
5.3.10 To DLI
5.4 Scheduling Job Execution
5.5 Managing a Single Job
5.6 Batch Managing Jobs

6 Typical Scenarios
6.1 Migrating Data from DDS to DWS
6.2 Periodically Backing Up FTP/SFTP Files to HUAWEI CLOUD OBS
6.3 Migrating Data from OSS to OBS
6.4 Migrating Data from On-premises Redis to DCS
6.5 Migrating Data from Oracle to Cloud Search Service
6.6 Migrating Data from OBS to Cloud Search Service
6.7 Migrating Data from OBS to DLI
6.8 Migrating Data from the MySQL Database to the MRS Hive Partition Table
6.9 Migrating Data from the MySQL Database to DDM
6.10 Migrating the Entire MySQL Database to RDS
6.11 Migrating the Entire Elasticsearch Database to Cloud Search Service

7 Advanced Operations
7.1 Incremental File Migration
7.2 Incremental Migration of Relational Databases
7.3 HBase/CloudTable Incremental Migration
7.4 Incremental Synchronization Using the Macro Variables of Date and Time
7.5 Migration in Transaction Mode
7.6 Data Encryption During the Migration to OBS
7.7 MD5 Verification for Files in Migration
7.8 Field Conversion During Migration
7.9 Migration of a List of Files
7.10 Using Regular Expressions to Separate Semi-structured Text
7.11 GDS Import Mode
7.12 File Formats

8 FAQs
8.1 What Are the Advantages of CDM?
8.2 What Service Data Can Be Migrated by CDM?
8.3 What Security Protection Measures Are Used in CDM?
8.4 What Is the Performance of Using CDM to Migrate Data?
8.5 What Is the Most Economical Way to Migrate Data from the Public Network Using CDM?
8.6 Does CDM Support Incremental Data Migration?
8.7 Can Fields Be Converted During Data Migration?
8.8 What Data Formats Are Supported When the Data Source Is Hive?
8.9 Does CDM Support Job Synchronization Between Different Clusters?
8.10 Can I Create Jobs in Batches on CDM?
8.11 Can I Back Up Jobs When the CDM Cluster Is Not Used for a Long Time?
8.12 How Do I Use Java to Invoke CDM RESTful APIs to Create Data Migration Jobs?
8.13 How Do I Connect an On-premises Intranet or Third-Party Private Network to CDM?
8.14 What Do I Do If the System Displays a Message Indicating that the Date Format Fails to Be Parsed When Data Is Imported to Cloud Search Service?
8.15 What Do I Do If the Map Field Tab Page Cannot Display All Columns When Data Is Exported from HBase/CloudTable?
8.16 How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
8.17 What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?

A Version Updates

B Change History
1 Introduction
1.1 CDM

Cloud Data Migration (CDM) enables data migration among various data sources. It allows you to migrate data among public cloud services or between the public cloud and on-premises service systems.
Based on a distributed computing framework and concurrent processing technology, CDM helps you migrate massive sets of data stably and efficiently. You can migrate data online and construct a desired data structure.
CDM provides the following features:
- Ease of use: You can migrate data by configuring data sources and migration jobs on the graphical user interface (GUI), and CDM will manage and maintain the data sources and migration tasks. In other words, you only need to focus on the data migration logic without worrying about the environment, which greatly reduces development and maintenance costs.
- High efficiency: Based on the distributed computing framework, CDM jobs are split into independent sub-jobs and executed concurrently, which drastically improves data migration efficiency. In addition, efficient data import application programming interfaces (APIs) are used to import data into Hive, HBase, Data Warehouse Service (DWS), and MySQL databases.
- Support for various data sources: Various data sources such as databases, Hadoop, NoSQL, data warehouses, and files are supported.
- Support for multiple network environments: CDM helps you easily cope with various data migration scenarios, including data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems, regardless of whether the data is stored in an on-premises Internet Data Center (IDC), on third-party clouds (public or private), in HUAWEI CLOUD services, or in self-built databases or file systems running on Elastic Cloud Servers (ECSs) on HUAWEI CLOUD.
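The "High efficiency" point above can be illustrated with a small sketch. This is not CDM's actual implementation; the row-range partitioning scheme and worker count are assumptions used only to show the idea of splitting one job into independent, concurrently executed sub-jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubJobSplitDemo {
    // Split a table of totalRows rows into numSubJobs contiguous row ranges,
    // then process the ranges concurrently. Each sub-job here merely reports
    // its range size, standing in for copying rows [start, end).
    static long migrate(long totalRows, int numSubJobs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numSubJobs);
        List<Future<Long>> results = new ArrayList<>();
        long chunk = (totalRows + numSubJobs - 1) / numSubJobs; // ceiling division
        for (int i = 0; i < numSubJobs; i++) {
            long start = i * chunk;
            long end = Math.min(start + chunk, totalRows);
            results.add(pool.submit(() -> end - start));
        }
        long migrated = 0;
        for (Future<Long> f : results) migrated += f.get(); // wait for all sub-jobs
        pool.shutdown();
        return migrated;
    }

    public static void main(String[] args) throws Exception {
        // Every row is covered by exactly one sub-job, so the total matches.
        System.out.println(migrate(1_000_000, 4)); // prints 1000000
    }
}
```

The key property the sketch demonstrates is that the sub-ranges partition the input exactly, so concurrent execution changes throughput but not the result.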
1.2 Data Sources Supported by CDM

CDM supports table/file migration and entire DB migration:
- Table/file migration: applicable to data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems.
- Entire DB migration: applicable to database migration to the cloud.
Table/File Migration
Table 1-1 describes the supported data sources.
Table 1-1 Supported data sources during table/file migration

| Data Source Type | Data Source | Used as a Source | Used as a Destination |
|---|---|---|---|
| Data warehouse | Data Warehouse Service (DWS) | Supported | Supported |
| Data warehouse | Data Lake Insight (DLI) | Not supported | Supported |
| Data warehouse | FusionInsight LibrA | Supported | Supported |
| Hadoop | MRS HDFS | Supported | Supported |
| Hadoop | MRS HBase | Supported | Supported |
| Hadoop | MRS Hive | Supported | Supported |
| Hadoop | FusionInsight HDFS | Supported | Supported |
| Hadoop | Apache HDFS | Supported | Supported |
| Hadoop | Hadoop HBase | Supported | Supported |
| Hadoop | FusionInsight HBase | Supported | Supported |
| Object storage | Object Storage Service (OBS) | Supported | Supported |
| Object storage | Alibaba Cloud Object Storage Service (OSS) | Supported | Not supported |
| Object storage | Qiniu Cloud Object Storage | Supported | Not supported |
| File system | FTP | Supported | Supported |
| File system | SFTP | Supported | Supported |
| File system | HTTP | Supported | Not supported |
| File system | Network Attached Storage (NAS) | Supported | Supported |
| Relational database | RDS for MySQL | Supported | Supported |
| Relational database | RDS for PostgreSQL | Supported | Supported |
| Relational database | RDS for SQL Server | Supported | Supported |
| Relational database | Distributed Database Middleware (DDM) | Supported | Supported |
| Relational database | MySQL | Supported | Supported |
| Relational database | PostgreSQL | Supported | Not supported |
| Relational database | Microsoft SQL Server | Supported | Not supported |
| Relational database | Oracle | Supported | Not supported |
| Relational database | IBM Db2 | Supported | Not supported |
| Relational database | Derecho (GaussDB) | Supported | Not supported |
| NoSQL | Distributed Cache Service (DCS) | Not supported | Supported |
| NoSQL | Document Database Service (DDS) | Supported | Supported |
| NoSQL | CloudTable Service (CloudTable) | Supported | Supported |
| NoSQL | Redis | Supported | Not supported |
| NoSQL | MongoDB | Supported | Not supported |
| Search | Cloud Search Service | Supported | Supported |
| Search | Elasticsearch | Supported | Supported |
| Message system | Data Ingestion Service (DIS) | Supported (migrated to Cloud Search Service only) | Not supported |
| Message system | Apache Kafka | Supported (migrated to Cloud Search Service only) | Not supported |
NOTE
In the preceding table, the non-HUAWEI CLOUD data sources, such as MySQL, can be a MySQL database built in a local data center, created by users on an Elastic Cloud Server (ECS), or hosted on a third-party cloud.
Entire DB Migration

Entire database migration is applicable to the scenario where an on-premises data center or a database created on a HUAWEI CLOUD ECS is synchronized to HUAWEI CLOUD database services or big data services. It is suitable for offline database migration but not online real-time migration. Figure 1-1 lists the data sources that support entire database migration using CDM.
Figure 1-1 Supported data sources in entire DB migration
Field Mapping in Automatic Table Creation

CDM automatically creates tables at the destination during database migration. Figure 1-2 describes the field mapping between DWS tables created by CDM and source tables. For example, if you use CDM to migrate an Oracle database to DWS, CDM automatically creates tables on DWS and maps the NUMBER(3,0) field of the Oracle database to the SMALLINT field of DWS.
Figure 1-2 Field mapping in automatic table creation on DWS
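The kind of source-to-destination type mapping described above can be sketched as a simple lookup table. Only the NUMBER(3,0) → SMALLINT pair is stated in this guide; the other pairs and the TEXT fallback below are assumptions added purely for illustration, not CDM's actual mapping rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeMappingDemo {
    // Illustrative Oracle-to-DWS field type mapping of the kind CDM applies
    // when it auto-creates destination tables. Only NUMBER(3,0) -> SMALLINT
    // comes from the guide; the remaining entries are assumed examples.
    static final Map<String, String> ORACLE_TO_DWS = new LinkedHashMap<>();
    static {
        ORACLE_TO_DWS.put("NUMBER(3,0)", "SMALLINT");       // from the guide
        ORACLE_TO_DWS.put("VARCHAR2(255)", "VARCHAR(255)"); // assumed
        ORACLE_TO_DWS.put("DATE", "TIMESTAMP");             // assumed
    }

    // Return the destination type for a source type, with an assumed fallback.
    static String mapType(String oracleType) {
        return ORACLE_TO_DWS.getOrDefault(oracleType, "TEXT");
    }

    public static void main(String[] args) {
        System.out.println(mapType("NUMBER(3,0)")); // prints SMALLINT
    }
}
```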
1.3 Application Scenarios
Migrating Local Data to the Public Cloud
Local data is stored in an IDC that you have built or rented, or on a private cloud, including data stored in relational databases, NoSQL databases, OLAP databases, and file systems.

In this scenario, if you want to use the computing and storage resources of the public cloud, you must migrate local data to the public cloud in advance and ensure that the local network can communicate with the public cloud network.
Figure 1-3 Migrating local data to the public cloud
Migrating Data Between Public Cloud Services
In this scenario, you can exchange data between the following public cloud services:
- OBS
- Relational Database Service (RDS)
- MapReduce Service (MRS)
- DWS
- DDS
- DCS
- Cloud Search Service
- DIS
- CloudTable
- DLI
- DDM
- Databases or file systems deployed on ECSs
Migrating Public Cloud Data to On-Premises Environments
A local environment is a data storage system in an IDC that you have built or rented, or on a private cloud, including relational databases and file systems.

In this scenario, after data is processed using the computing and storage resources of the public cloud, the processed data can be returned to on-premises service systems, specifically relational databases and file systems. Additionally, ensure that the local network can communicate with the public cloud network.
Figure 1-4 Migrating public cloud data to on-premises environments
1.4 Related Services
IAM
CDM uses Identity and Access Management (IAM) for authentication and authorization.
VPC
CDM clusters are created in the subnets of a Virtual Private Cloud (VPC). VPCs provide a secure, isolated, and logical network environment for CDM clusters.
MRS
CDM supports data import and export using MRS.
OBS
CDM supports data import and export using OBS, which also stores backup files and logs of CDM clusters.
Cloud Eye
CDM uses Cloud Eye to monitor cluster performance metrics, delivering status information in a concise and efficient manner, as shown in Table 1-2. For more information about Cloud Eye, see the Cloud Eye User Guide.
Table 1-2 CDM performance metrics

| Metric | Description | Value Range | Monitored Object |
|---|---|---|---|
| Bytes In | Measures the network inbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration |
| Bytes Out | Measures the network outbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration |
| CPU Usage | Measures the CPU usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration |
| Memory Usage | Measures the memory usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration |
CTS
CDM uses Cloud Trace Service (CTS) to record operations for later query, audit, and backtracking. Table 1-3 lists the CDM operations recorded by CTS. For more information about CTS, see the Cloud Trace Service User Guide.
Table 1-3 CDM operations recorded by CTS

| Operation | Resource Type | Trace Name |
|---|---|---|
| Creating a cluster | cluster | createCluster |
| Deleting a cluster | cluster | deleteCluster |
| Modifying cluster configuration | cluster | modifyCluster |
| Starting a cluster | cluster | startCluster |
| Stopping a cluster | cluster | stopCluster |
| Restarting a cluster | cluster | restartCluster |
| Importing a job | cluster | clusterImportJob |
| Binding an EIP | cluster | bindEip |
| Unbinding an EIP | cluster | unbindEip |
| Creating a link | link | createLink |
| Modifying a link | link | modifyLink |
| Deleting a link | link | deleteLink |
| Creating a job | job | createJob |
| Modifying a job | job | modifyJob |
| Deleting a job | job | deleteJob |
| Starting a job | job | startJob |
| Stopping a job | job | stopJob |
DWS
CDM allows you to import data to and export data from DWS.
RDS
CDM allows you to import data to and export data from RDS, including RDS for MySQL, RDS for PostgreSQL, and RDS for SQL Server.
DDS
CDM allows you to export data from DDS, but it does not allow you to import data to DDS.
DCS
CDM allows you to import data to DCS, but it does not allow you to export data from DCS.
Cloud Search Service
CDM allows you to import data to and export data from Cloud Search Service.
DIS
CDM allows you to export data from DIS to Cloud Search Service, but it does not allow you to import data to DIS.
CloudTable
CDM allows you to import data to and export data from CloudTable.
DLI
CDM allows you to import data to DLI, but it does not allow you to export data from DLI.
DDM
CDM allows you to import data to and export data from DDM.
Data Lake Factory (DLF)
CDM can be orchestrated and scheduled as a node task of DLF.
1.5 Basic Concepts
CDM Cluster

A CDM cluster is a CDM instance that you have purchased. It consists of one or more VMs. You can purchase multiple CDM clusters for different purposes. For example, you can purchase one CDM cluster for the financial department and another for the procurement department to isolate data access permissions.
Local Environment

A local environment is a data storage system in an IDC that you have built or rented, or on a private cloud, including relational databases and file systems.
Local Data

Local data is data stored in an IDC that you have built or rented, or on a private cloud, including data stored in relational databases, NoSQL databases, OLAP databases, and file systems.
Connector

A connector is a built-in object template used for connecting to a data source. Currently, CDM uses connectors to connect to OBS, MRS, and databases. New connectors can be added to CDM as well.
Link

A link is an object set up based on a connector and used to connect to a specific data source. To create a link, you must specify the link name, connector, data source address, and authentication information. For example, to connect to a MySQL database, you must set the host IP address, port number, username, and password.

After a link is set up, it can be used by multiple jobs as either a source or a destination link.
Job

A job is a data migration task that you create to migrate data from one data source to another. To create a job, you must specify a source link, a destination link, and data mapping rules.
Source Job Configuration

During job creation, the source link specifies the data source from which data is extracted. The job parameters vary with the source link type. For example, the table or directory from which data is exported is specified in the job configuration at the source end.
Destination Job Configuration

During job creation, the destination link specifies the data source to which data is loaded. The job parameters vary with the destination link type. For example, the table or directory to which data is imported is specified in the job configuration at the destination end.
Field Mapping

During job creation, especially for jobs that migrate data between heterogeneous data sources, you must configure the mapping between the source and destination data sources, such as field mapping and field type mapping.
1.6 Accessing and Using CDM
1.6.1 How to Access CDM

CDM provides a web-based service management platform, that is, the management console. You can access CDM using HTTPS-compliant application programming interfaces (APIs) or the management console.

- Management console

  After registering with the public cloud, log in to the management console to access CDM.

- API

  If you want to integrate CDM with third-party systems for secondary development, access CDM using APIs. For details, see the Cloud Data Migration API Reference.
1.6.2 How to Use CDM

The procedure of applying for and using CDM is as follows:

1. Apply for CDM. Applying for CDM means building a CDM cluster. For details about how to create a CDM cluster, see Creating a Cluster.
2. Create links. A source link and a destination link are required for a data migration task. Select a proper connector according to the data source type. For details, see Creating a Link.
3. Create and execute jobs. Select the source and destination links and configure job and task parameters according to the types of the source and destination data sources. For details, see Creating a Job.
4. Query job execution results. After a job is executed, you can query its execution logs, data statistics, and historical execution status. For details about how to query historical job information, see Managing a Single Job.
1.6.3 CDM Billing

CDM adopts the pay-per-use billing mode on an hourly basis, which means that you are charged by the hour. This mode is flexible: you can start or stop the CDM cluster as you like. For details about the billing items, see the Cloud Data Migration Price Description.
1.6.4 User Permissions

CDM uses IAM to isolate links and jobs created by different accounts in a CDM cluster.

Currently, CDM does not support user group permissions. In other words, users cannot be assigned to the same user group to share information about their links and jobs.
1.7 Constraints

Due to various factors such as technology and cost, CDM has the following constraints on data migration.
CDM System Constraints

1. Currently, the CN North-Beijing1, CN East-Shanghai2, and CN South-Guangzhou regions are supported.
2. You cannot modify the flavor of an existing cluster. If you require a higher flavor, create a new cluster.
3. CDM does not support throttling of the data migration speed. Therefore, do not perform data migration during peak hours.
4. Currently, the network bandwidth of all CDM instances is 1 Gbit/s. Theoretically, the maximum volume of data transmitted per instance per day is 10 TB. If you have specific requirements on the transmission speed, use multiple CDM instances.
   The preceding data volume is a theoretical value. The actual volume is restricted by the data source type, the read and write performance of the source and destination data sources, and the available bandwidth; in practice it can reach about 8 TB per day (for large-file migration to OBS). It is recommended that you test the speed with a small amount of data before migration.
5. CDM supports incremental file migration (by skipping files that have already been migrated), but does not support resumable transfer.
   For example, if three files are to be migrated and the second file fails due to a network fault, the first file is skipped when the migration task is restarted. The second file, however, cannot resume from the point where the fault occurred; it can only be migrated again from the beginning.
6. During file migration, a single task supports a maximum of 100,000 files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
7. The number of tasks executed concurrently by a single CDM instance is 30 (cdm.large), 20 (cdm.medium), or 10 (cdm.small). The number of queued jobs (in the pending state) is 10,000, 5,000, and 2,000, respectively.
   In database migration, one job migrates one table. In file migration, multiple files can be migrated in one job.
8. When custom links and jobs are exported, CDM does not export the access password of the corresponding data source. Before importing the job configuration back into CDM, you need to manually edit the JSON file to supply the password.
9. A cluster cannot be automatically upgraded to a new version. You need to use the export and import functions to migrate your configuration to a cluster of the new version.
10. CDM does not automatically back up user job configurations. You need to export and back up the configuration data using the export function.
11. If a VPC peering connection is configured, the peer VPC subnet may overlap with the CDM management network. As a result, data sources in the peer VPC cannot be accessed. You are advised to use the public network for cross-VPC data migration, or contact customer service to add specific routes to the VPC peering connection in the CDM background.
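As a rough sanity check of the bandwidth figures in constraint 4 above, the following sketch converts a 1 Gbit/s link into a theoretical daily transfer volume (decimal SI units are assumed; the guide's "10 TB" is an approximation):

```python
# Theoretical daily transfer volume of a single CDM instance on a 1 Gbit/s link.
# Decimal (SI) units are assumed here.

BITS_PER_SECOND = 1_000_000_000        # 1 Gbit/s
SECONDS_PER_DAY = 24 * 60 * 60         # 86,400 s

bytes_per_day = BITS_PER_SECOND / 8 * SECONDS_PER_DAY
terabytes_per_day = bytes_per_day / 1_000_000_000_000

print(f"{terabytes_per_day:.1f} TB/day")  # about 10.8 TB, close to the documented 10 TB
```

In practice, the guide notes that about 8 TB per day is achievable, so the theoretical number is an upper bound.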
General Constraints on Database Migration

1. CDM is mainly used for batch migration. It supports only limited incremental migration and does not support real-time incremental migration. You are advised to use Data Replication Service (DRS) to migrate incremental database data to RDS.
2. Entire-DB migration in CDM migrates only data tables; it does not migrate database objects such as stored procedures, triggers, functions, and views. Views are migrated as tables. CDM applies only to scenarios where databases are migrated to HUAWEI CLOUD in one pass, including homogeneous and heterogeneous database migrations. CDM is not applicable to data synchronization scenarios such as disaster recovery and real-time synchronization.
3. If CDM fails to migrate an entire database or a data table, the data that has already been imported to the target table is not rolled back automatically. If you want to perform the migration in transaction mode, configure the Import to Staging Table parameter so that data is rolled back when the migration fails. In extreme cases, the created staging or temporary table cannot be deleted automatically, and you need to clear it manually (the name of the staging table ends with _cdm_stage, for example, cdmtet_cdm_stage).
4. If CDM needs to access data sources in a local data center (for example, an on-premises MySQL database), the data sources must support Internet access and the CDM instances must be bound to elastic IP addresses (EIPs). In this case, the best security practice is to configure the firewall or security policies to allow only the EIPs of the CDM instances to access the local data sources.
5. Only common data types are supported, including character strings, numbers, and dates. Support for object types is limited. If objects are too large, they cannot be migrated.
6. Only the GBK and UTF-8 character sets are supported.
Constraints on MRS Data Sources

Each CDM cluster supports data import and export for only one MRS data source. To import and export data of different MRS data sources, create multiple CDM clusters.
Constraints on FusionInsight HD and Apache Hadoop Data Sources

If a FusionInsight HD or Apache Hadoop data source is deployed in a local data center, CDM must access all nodes in the cluster to read and write Hadoop files. Therefore, network access must be enabled for each node.

You are advised to use Direct Connect to improve the migration speed while ensuring network access.
Constraints on DWS and FusionInsight LibrA Data Sources

1. If the DWS primary key or the table contains only one field, the field type must be a common character string, numeric, or date type. When data is migrated from another database to DWS with automatic table creation selected, the primary key must be one of the following types. If no primary key is set, at least one field of the following types must be set; otherwise, the table cannot be created and the CDM job fails.
   – INTEGER TYPES: TINYINT, SMALLINT, INT, BIGINT, NUMERIC/DECIMAL
   – CHARACTER TYPES: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, TEXT
   – DATE/TIME TYPES: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, SMALLDATETIME
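The type lists above can be captured in a small helper for pre-checking a schema before enabling automatic table creation. This is an illustrative sketch only: the type names come from the lists above, while the function name and the length-suffix handling are assumptions, not CDM behavior.

```python
# Hypothetical pre-check: can this DWS field type act as the primary key
# (or fallback field) when CDM auto-creates the destination table?
ALLOWED_KEY_TYPES = {
    # Integer types
    "TINYINT", "SMALLINT", "INT", "BIGINT", "NUMERIC", "DECIMAL",
    # Character types
    "CHAR", "BPCHAR", "VARCHAR", "VARCHAR2", "NVARCHAR2", "TEXT",
    # Date/time types
    "DATE", "TIME", "TIMETZ", "TIMESTAMP", "TIMESTAMPTZ",
    "INTERVAL", "SMALLDATETIME",
}

def usable_as_auto_table_key(field_type: str) -> bool:
    """Return True if the field type appears in the allowed lists above."""
    # Strip a length suffix such as VARCHAR(100) before checking.
    base = field_type.split("(", 1)[0].strip().upper()
    return base in ALLOWED_KEY_TYPES

print(usable_as_auto_table_key("varchar(100)"))  # True
print(usable_as_auto_table_key("bytea"))         # False
```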
2. In DWS, the character string '' is null, and a null character string cannot be inserted into a field with a non-null constraint. This is inconsistent with MySQL behavior: MySQL does not treat '' as null. Migration from MySQL to DWS may therefore fail for this reason.
3. When the Gauss Data Service (GDS) mode is used to quickly import data to DWS, you need to configure a security group or firewall policy to allow the DataNodes of DWS or FusionInsight LibrA to access port 25000 of the CDM IP address.
4. When data is imported to DWS in GDS mode, CDM automatically creates a foreign table for the import. The table name ends with a universally unique identifier (UUID), for example, cdmtest_aecf3f8n0z73dsl72d0d1dk4lcir8cd. If a job fails, the table is automatically deleted; in extreme cases, you may need to delete it manually.
Constraints on OBS Data Sources

1. During file migration, the system automatically transfers the files concurrently. In this case, the Concurrent Extractors setting in the task configuration is invalid.
2. Resumable transfer is not supported. If CDM fails to transfer files, OBS fragments are generated. Clear the fragments on the OBS console to prevent them from occupying space.
3. CDM does not support the versioning control function of OBS.
4. During incremental migration, the number of files or objects in the source directory of a single job depends on the CDM cluster flavor: a cdm.large cluster supports a maximum of 300,000 files, a cdm.medium cluster 200,000 files, and a cdm.small cluster 100,000 files.
   If the number of files or objects in a single directory exceeds the upper limit, split the files or objects into multiple migration jobs based on subdirectories.
5. The key used to encrypt data migrated to OBS is created in Key Management Service (KMS). This function is available only in CN North-Beijing1.
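The subdirectory-based splitting suggested in item 4 above can be sketched as follows. The flavor limits come from item 4; the grouping logic itself is illustrative, not a CDM feature:

```python
from collections import defaultdict

# Maximum number of source files per incremental-migration job, by flavor (item 4).
FLAVOR_LIMITS = {"cdm.small": 100_000, "cdm.medium": 200_000, "cdm.large": 300_000}

def plan_jobs(paths, flavor):
    """Group object paths by top-level subdirectory, then chunk each group
    so that no planned job exceeds the flavor's file limit."""
    limit = FLAVOR_LIMITS[flavor]
    groups = defaultdict(list)
    for p in paths:
        top = p.split("/", 1)[0]          # first path component
        groups[top].append(p)
    jobs = []
    for top, files in groups.items():
        for i in range(0, len(files), limit):
            jobs.append(files[i:i + limit])
    return jobs

# Tiny demonstration: two top-level subdirectories become two jobs.
paths = [f"dir{d}/file{i}" for d in range(2) for i in range(3)]
jobs = plan_jobs(paths, "cdm.large")
print(len(jobs))  # 2
```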
Constraints on Oracle Data Sources
Real-time incremental data synchronization is not supported for Oracle databases.
Constraints on DCS and Redis Data Sources

1. Because DCS restricts the commands for obtaining keys, it cannot serve as a migration source, but it can be a migration destination. Likewise, the Redis service of a third-party cloud cannot serve as a migration source. However, a Redis instance set up in an on-premises data center or on an ECS can be both a migration source and a destination.
2. Only the hash and string data formats are supported.
Constraints on DDS and MongoDB Data Sources

When you migrate data from MongoDB to a relational database, CDM reads the first row of the collection as a sample of the field list. If the first row does not contain all fields of the collection, you need to add the missing fields manually.
Constraints on Cloud Search Service and Elasticsearch Data Sources

1. CDM supports automatic creation of indexes and field types. The index and field type names can contain only lowercase letters.
2. You cannot modify the type of a field under an index after the index is created; you can only create another field. If you need to modify a field type, create a new index, or run an Elasticsearch command on Kibana to delete the existing index and create another one (the data is deleted as well).
3. When the type of a field in an index created by CDM is date, the data format must be yyyy-MM-dd HH:mm:ss.SSS Z, for example, 2018-08-08 08:08:08.888 +08:00. During data migration to Cloud Search Service, if the original data of the date field does not meet this format requirement, you can use the expression conversion function of CDM to convert the data to the preceding format.
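The required date format above (yyyy-MM-dd HH:mm:ss.SSS Z, with a colon inside the UTC offset) can be produced as follows. This is a generic Python sketch for preparing data outside CDM, not CDM's built-in expression syntax:

```python
from datetime import datetime, timezone, timedelta

def to_cdm_es_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as 'yyyy-MM-dd HH:mm:ss.SSS Z',
    e.g. '2018-08-08 08:08:08.888 +08:00'."""
    millis = dt.microsecond // 1000
    offset = dt.strftime("%z")              # e.g. '+0800'
    offset = offset[:3] + ":" + offset[3:]  # insert the colon -> '+08:00'
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d} " + offset

tz = timezone(timedelta(hours=8))
print(to_cdm_es_date(datetime(2018, 8, 8, 8, 8, 8, 888000, tzinfo=tz)))
# 2018-08-08 08:08:08.888 +08:00
```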
Constraints on DIS and Kafka Data Sources

1. The data in the message body is a record in CSV format that supports multiple delimiters. Messages in binary or other formats cannot be parsed.
2. If a job is set to run for a long time, the job will fail if the DIS system is interrupted.
Constraints on CloudTable and HBase Data Sources

1. When you migrate data from CloudTable or HBase, CDM reads the first row of the table as a sample of the field list. If the first row does not contain all fields of the table, you need to add the missing fields manually.
2. Because HBase is schema-less, CDM cannot obtain the data types. If the data is stored in binary format, CDM cannot parse it.
Constraints on Hive Data Sources

When Hive serves as the migration destination and the storage format is TEXTFILE, delimiters must be explicitly specified in the statement for creating Hive tables. The following is an example:
CREATE TABLE csv_tbl(
  smallint_value smallint,
  tinyint_value tinyint,
  int_value int,
  bigint_value bigint,
  float_value float,
  double_value double,
  decimal_value decimal(9, 7),
  timestamp_value timestamp,
  date_value date,
  varchar_value varchar(100),
  string_value string,
  char_value char(20),
  boolean_value boolean,
  binary_value binary,
  varchar_null varchar(100),
  string_null string,
  char_null char(20),
  int_null int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar" = "'",
  "escapeChar" = "\\"
)
STORED AS TEXTFILE;
Constraints on Incremental Data Migration in MySQL Binlog Mode

- Currently, this mode can be used only to migrate data from MySQL to DWS.
- In the migration from MySQL to DWS, the constraints on the incremental data migration function in MySQL Binlog mode are as follows:

  a. In the current version, a single cluster supports only one incremental migration job in MySQL Binlog mode.
  b. In the current version, you are not allowed to delete or update 10,000 data records at a time.
  c. Entire database migration is not supported.
  d. Data Definition Language (DDL) operations are not supported.
  e. Event migration is not supported.
  f. If you set Migrate Incremental Data to Yes, binlog_format in the source MySQL database must be set to ROW.
  g. If you set Migrate Incremental Data to Yes and binlog file ID disorder occurs on the source MySQL instance due to cross-machine migration or rebuilding during incremental data migration, incremental data may be lost.
  h. If a primary key exists in the destination table and incremental data is generated during a restart of the CDM cluster or during full migration, duplicate primary key values may occur. As a result, the migration fails.
  i. If the destination DWS database is restarted, the migration will fail. In this case, restart the CDM cluster and then the migration job.

The recommended MySQL configuration is as follows:

# Enable the bin-log function.
log-bin=mysql-bin
# ROW mode
binlog-format=ROW
# GTID mode. The recommended MySQL version is 5.6.10 or later.
gtid-mode=ON
enforce_gtid_consistency=ON
2 Getting Started
2.1 Overview

This section describes how to use CDM to migrate tables from an on-premises MySQL database to DWS, helping you get familiar with CDM. Figure 2-1 shows the scenario.
Figure 2-1 Migrating data from a local MySQL database to DWS
The procedure of using CDM is as follows:
1. Purchasing CDM
2. Creating Links
3. Creating and Executing a Job
4. Querying Job Execution Results
2.2 Purchasing CDM
Scenario
This section describes how to purchase CDM, that is, create a CDM cluster, to perform data migration between an on-premises MySQL database and DWS.
Prerequisites

- Your on-premises MySQL database can be accessed using a public IP address.
- You have created a VPC.
Procedure
Step 1 Log in to the CDM management console.
Step 2 Click Buy CDM. The page for creating a CDM cluster is displayed. The following is a cluster configuration example:
- Current Region: Actual working region of the cluster. Currently, CN North-Beijing1, CN East-Shanghai2, and CN South-Guangzhou are supported.
- AZ: Different AZs are physically isolated but interconnected through the internal network. In this example, select AZ2.
- Cluster Name: The cluster name must start with a letter and contain 4 to 64 characters consisting of letters, digits, hyphens (-), and underscores (_). It cannot contain other special characters. For example, cdm-aff1.
- Version: Retain the default value.
- Instance Type: Select an instance flavor as required; for example, select cdm.medium, which can be used in most migration scenarios.
  – cdm.small: 2 vCPUs with 4 GB memory, applicable to Proof of Concept (PoC) verification and development tests
  – cdm.medium: 4 vCPUs with 8 GB memory, applicable to migration of a single database table with fewer than 10 million records
  – cdm.large: 8 vCPUs with 16 GB memory, applicable to migration of a single database table with 10 million records or more
  – cdm.xlarge: 16 vCPUs with 32 GB memory, applicable to TB-level data migration requiring 10GE high-speed bandwidth
- VPC: Select the VPC where DWS resides.
- Subnet: You are advised to use the same subnet as that of DWS.
- Security Group: You are advised to use the same security group as that of DWS.
  You can select a subnet and security group that are different from those of DWS. In this case, configure the security group rules to allow the CDM cluster to access DWS properly.
- Retain the default values of the other parameters.
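The cluster-name rule above (starts with a letter; 4 to 64 characters; letters, digits, hyphens, and underscores) can be expressed as a regular expression. This sketch is for illustration only; the console performs the authoritative validation:

```python
import re

# 1 leading letter + 3 to 63 further letters/digits/hyphens/underscores = 4-64 chars total.
CLUSTER_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_-]{3,63}$")

print(bool(CLUSTER_NAME.match("cdm-aff1")))   # True: starts with a letter, 8 characters
print(bool(CLUSTER_NAME.match("1cluster")))   # False: starts with a digit
print(bool(CLUSTER_NAME.match("ab")))         # False: shorter than 4 characters
```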
Step 3 Check the current configuration and click Buy Now to go to the page for confirming the order.
NOTE
You cannot modify the flavor of an existing cluster. If you require a higher flavor, create a new cluster.
Step 4 Click Submit. The system starts to create the CDM cluster. You can view the creation progress on the Cluster Management page.
----End
2.3 Creating Links
Description
Before migrating the local MySQL database to DWS, create two links:

1. MySQL link: used to connect to the on-premises MySQL database.
2. DWS link: used to connect to the DWS database.

CDM needs to access the on-premises data source. Therefore, before creating a link, bind an EIP to the CDM cluster.
Prerequisites

- You have sufficient EIP quota. If the quota is insufficient, apply for a higher quota. For details about how to apply for EIPs, see the Virtual Private Cloud User Guide.
- You have obtained the IP address, port number, database name, username, and password for connecting to the MySQL database. In addition, the user must have the read, write, and delete permissions on the MySQL database.
- You have purchased a DWS instance and obtained the IP address, port number, database name, username, and password for connecting to DWS. Additionally, the user must have the read, write, and delete permissions on the DWS database.
Creating a MySQL Link
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the cdm-aff1 cluster created in Purchasing CDM.
Step 3 In the Operation column, click Bind Elastic IP, and select and bind an EIP to the cluster.
Step 4 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 2-2.
Figure 2-2 Selecting a connector
Cloud Data MigrationUser Guide 2 Getting Started
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 18
Step 5 Select MySQL and click Next. On the page that is displayed, configure the MySQL link parameters, as shown in Figure 2-3.
Figure 2-3 Creating a MySQL link
Click Show Advanced Attributes to display the optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 2-1.
Table 2-1 MySQL link parameters

Parameter        Description                                                                  Example Value
Name             Unique link name                                                             mysqllink
Database Server  IP address or domain name of the MySQL database server                       192.168.0.1
Port             MySQL database port                                                          3306
Database Name    Name of the MySQL database                                                   sqoop
Username         User who has the read, write, and delete permissions on the MySQL database   admin
Password         Password of the user                                                         -
Step 6 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during saving, the security settings of the MySQL database may be incorrect. In this case, allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating a DWS Link
Step 1 On the Link Management tab page, click Create Link and select Data Warehouse Service to create a DWS link.
Step 2 Click Next. The page for configuring the DWS link parameters is displayed. Configure the mandatory parameters according to Table 2-2 and retain the default values of the optional parameters.
Table 2-2 DWS link parameters

Parameter        Description                                                                Example Value
Name             Unique link name                                                           dwslink
Database Server  IP address or domain name of the DWS database server                       192.168.0.3
Port             DWS database port                                                          8000
Database Name    Name of the DWS database                                                   db_demo
Username         User who has the read, write, and delete permissions on the DWS database   dbadmin
Password         Password of the user                                                       -
Import Mode      Data import mode; see the description below this table                     Copy

Import Mode specifies how data is imported when a DWS link is created:

- Copy: CDM migrates the source data to the DWS management node, which then copies the data to the DataNodes. To access DWS through the Internet, select Copy.
- GDS: The DataNodes of DWS concurrently request data from the GDS component of CDM and then write the data to DWS. The GDS mode cannot be used for data export from DWS.

Theoretically, the GDS mode is more efficient than the Copy mode. However, the GDS mode requires the following configurations:

1. Configure DWS to allow users of the DWS link to create and delete foreign tables.
2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster.
Step 3 Click Save. The link is successfully created.
----End
2.4 Creating and Executing a Job
Scenario
This section describes how to create a table migration job to migrate data tables from an on-premises MySQL database to DWS.
Procedure
Step 1 On the Cluster Management page, locate the cdm-aff1 cluster created in Purchasing CDM.
Step 2 Click Job Management in the Operation column of the CDM cluster.
Step 3 Choose Table/File Migration > Create Job, and configure the required job information. See Figure 2-4.
Figure 2-4 Creating a job
- Job Name: Enter a unique job name, for example, mysql2dws.
- Source Job Configuration
  – Source Link Name: Select the mysqllink link created in Creating Links.
  – Schema/Tablespace: Select the MySQL database from which data is to be exported.
  – Table Name: Select the table from which data is to be exported.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From a Relational Database.
- Destination Job Configuration
  – Destination Link Name: Select the dwslink link created in Creating Links.
  – Schema/Tablespace: Select the database to which data is to be imported.
  – Auto Table Creation: Select Auto creation. If the table specified by Table Name does not exist, CDM automatically creates it in the DWS database.
  – Table Name: Select the table to which data is to be imported.
  – Retain the default values of the other optional parameters. For details, see To a Relational Database.
Step 4 Click Next. The Map Field page is displayed. See Figure 2-5. CDM automatically maps the table fields at the migration source and destination. Check whether the field mapping is correct.

- If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.
- You need to manually select the distribution columns of DWS. You are advised to select the distribution columns according to the following principles:
  a. Use the primary key as the distribution column.
  b. If multiple data segments are combined as the primary key, specify all of them as the distribution column.
  c. If no primary key is available and no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
- If you need to convert the content of the source fields, perform the operations described in Field Conversion During Migration. In this example, field conversion is not required.
Figure 2-5 Field mapping
Step 5 Click Next to set the task parameters. Generally, retain the default values of all parameters.

In this step, you can configure the following optional functions:

- Retry upon Failure: Determines whether the job automatically retries if it fails to be executed. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Number of extractors to be executed concurrently. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed, or is filtered out, during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 6 Click Save and Run. CDM starts to execute the job immediately.
NOTE
If the job fails to be executed, the following error message may be displayed: SQL statements cannot be executed. ERROR: value too long for type character varying (7) Where: COPY dws_city, line 1, column name: 'Chinese characters'

Cause: The length of the character field in the DWS table is insufficient. Chinese characters are stored with different encodings in MySQL and DWS, so the required lengths differ; a Chinese character may occupy three bytes in UTF-8 encoding.

Solution: When creating the job in Step 3, enable automatic table creation and set the Extend Field Length advanced attribute to Yes, and then execute the job again. In this way, when CDM automatically creates the table in DWS, the length of the character fields is set to three times that of the original table.
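The cause described above, namely that a Chinese character may occupy three bytes in UTF-8 while the destination column was sized by character count, is easy to verify:

```python
# A CJK character takes 3 bytes in UTF-8, so a varchar sized in bytes
# needs roughly three times the character count of the source column.
city = "北京市"                      # 3 Chinese characters
print(len(city))                     # 3 characters
print(len(city.encode("utf-8")))     # 9 bytes in UTF-8
```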
----End
2.5 Querying Job Execution Results
Scenario

This section describes how to view a job's execution results and its historical information over the last 90 days, including the number of written rows, read rows, written bytes, and written files, as well as log information.
Procedure
Step 1 On the Cluster Management page, locate the cdm-aff1 cluster created in Purchasing CDM.
Step 2 Click Job Management in the Operation column of the CDM cluster.
Step 3 Locate the mysql2dws job created in Creating and Executing a Job and view the running status of the job.
Step 4 Click Historical Record in the Operation column of the job. See Figure 2-6.
On the page that is displayed, you can view the number of written rows, read rows, written bytes, and written files.
Figure 2-6 Viewing historical records
Step 5 Click Log to view the job execution logs. See Figure 2-7.
Figure 2-7 Viewing job logs
----End
3 Cluster Management
3.1 Creating a Cluster
Scenario
Currently, CDM uses independent clusters to provide secure and reliable data migration services. Clusters are isolated from each other and cannot access each other. A CDM cluster is created when you purchase CDM.
Currently, one cluster supports only one server; automatic capacity expansion is in planning.
The network bandwidth for CDM clusters of all flavors is 1 Gbit/s. Currently, a server can migrate 1 TB to 8 TB of data every day. If a larger amount of data needs to be migrated or the migration needs to be accelerated, you can create multiple CDM clusters and multiple migration jobs.
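As a rough cross-check of these figures, 1 Gbit/s of sustained bandwidth corresponds to about 10.8 TB per day in theory; real jobs land lower because of source and destination bottlenecks, which is consistent with the 1-8 TB/day range above. Back-of-the-envelope arithmetic (illustrative only; the 5 TB/day effective rate is an assumption, not a CDM specification):

```python
import math

# Theoretical upper bound for a 1 Gbit/s link, ignoring protocol and
# source/destination overhead (decimal units: 1 TB = 10**12 bytes).
bandwidth_bps = 1_000_000_000            # 1 Gbit/s
bytes_per_day = bandwidth_bps / 8 * 86_400
tb_per_day = bytes_per_day / 10**12
print(round(tb_per_day, 1))              # 10.8

# Days to move 50 TB at an assumed effective rate of 5 TB/day per
# cluster, and the cluster count needed to finish in about 2 days.
effective_rate = 5                       # TB/day, assumption
print(50 / effective_rate)               # 10.0 days with one cluster
print(math.ceil(50 / effective_rate / 2))  # 5 clusters for ~2 days
```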
Prerequisites
- You have sufficient EIP quota if the data source is a local one. For details about how to apply for EIPs, see the Virtual Private Cloud User Guide. The CDM cluster uses the public IP address to access the local data source.
- You have applied for a VPC, subnet, and security group. If the CDM cluster needs to connect to another cloud service, ensure that the cluster and the cloud service are in the same VPC. Otherwise, EIPs are required.
NOTE
If a VPC peering connection is configured, the peer VPC subnet may overlap with the CDM management network. As a result, data sources in the peer VPC cannot be accessed. You are advised to use the public network for cross-VPC data migration, or contact customer service to add specific routes to the VPC peering connection in the CDM background.
Procedure
Step 1 Log in to the CDM management console.
Step 2 Click Buy CDM. The page for creating a CDM cluster is displayed. See Figure 3-1.
Figure 3-1 Creating a cluster
Step 3 Create a CDM cluster. Table 3-1 describes the required parameters.
Table 3-1 Parameter description
Parameter | Example Value | Description
Current Region | CN North-Beijing1 | Actual working region of the cluster. Currently, CN North-Beijing1, CN East-Shanghai2, and CN South-Guangzhou are supported.
AZ | AZ1 | Physical region where resources use independent power supplies and networks. Different AZs are physically isolated but interconnected through an internal network.
Cluster Name | cdm-aff1 | Custom CDM cluster name
Version | 1.5.0 | CDM version. Retain the default value.
Instance Type | cdm.medium | Currently, the following flavors are available:
- cdm.small: 2 vCPUs and 4 GB memory, suitable for proof-of-concept (PoC) verification and development tests
- cdm.medium: 4 vCPUs and 8 GB memory, suitable for migrating a single database table with fewer than 10 million rows
- cdm.large: 8 vCPUs and 16 GB memory, suitable for migrating a single database table with 10 million rows or more
- cdm.xlarge: 16 vCPUs and 32 GB memory, suitable for TB-scale data migration requiring 10GE high-speed bandwidth
VPC | vpc1 | VPC, subnet, and security group where the CDM cluster resides, used to communicate with the desired data source. Select them according to the networks of the migration source and destination.
- If the CDM cluster and the data source to be connected belong to different VPCs, or the data source is an on-premises one, the CDM cluster must be bound to an elastic IP address (EIP).
- If the data source is a cloud service, you are advised to configure the network of the CDM cluster to be the same as that of the cloud service; the CDM cluster then does not need to be bound to an EIP.
- If the data source is a cloud service, and CDM and the cloud service are in the same VPC but in different subnets, configure security group rules to interconnect the CDM cluster with the cloud service.
For more information, see the Virtual Private Cloud User Guide.
Subnet | subnet-1 | See the VPC description above.
Security Group | sg-1 | See the VPC description above.
Auto Shutdown | No | After Auto Shutdown is enabled, if no job is running in the cluster and no scheduled job has been created, the cluster automatically shuts down 15 minutes later to reduce costs. After a cluster is created, to modify automatic shutdown or scheduled startup and shutdown, click the cluster name in the cluster list and then click the Cluster Configuration tab.
Scheduled Startup | No | The CDM cluster supports scheduled startup. If this parameter is enabled, set the daily scheduled startup time.
Scheduled Shutdown | No | During a scheduled shutdown, the system does not wait for unfinished jobs to complete.
Step 4 Check the current configuration and click Buy Now to go to the order confirmation page.
NOTE
You cannot modify the flavor of an existing cluster. If you require a higher flavor, create a new cluster.
Step 5 Click Submit. The system starts to create the CDM cluster. You can view the creation progress on the Cluster Management page.
----End
3.2 Binding or Unbinding an EIP
Scenario
Bind an EIP to or unbind an EIP from a CDM cluster. If CDM needs to access a local or Internet data source, bind an EIP to the CDM cluster, or use a NAT gateway so that the CDM cluster can share an EIP with ECSs to access the Internet. For details, see What Is the Most Economical Way to Migrate Data from the Public Network Using CDM.
The EIPs you use are billed by the VPC service. The default EIP bandwidth is 5 Mbit/s. To adjust the EIP bandwidth, log in to the VPC console and select Elastic IP; in the Operation column, choose More > Modify Bandwidth.
Prerequisites
- You have created a CDM cluster.
- You have sufficient EIP quota. If the quota is insufficient, apply for a higher quota. For details about how to apply for EIPs, see the Virtual Private Cloud User Guide.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. The Cluster Management page is displayed.
- Binding an EIP: In the Operation column, click Bind Elastic IP, as shown in Figure 3-2. The Bind Elastic IP dialog box is displayed.
Figure 3-2 Binding an EIP
- Unbinding an EIP: In the Operation column, choose More > Unbind Elastic IP.
Step 3 Click OK.
----End
3.3 Restarting a Cluster
Scenario
If a service exception occurs, restart the service process or the VMs in the cluster.
Prerequisites
The target cluster is running properly and no services will be interrupted if the cluster is restarted.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. The Cluster Management page is displayed.
Step 3 In the row of the target cluster, click Restart.
Step 4 Select the restart method, as shown in Figure 3-3.
- Graceful: Only the CDM service process is restarted. The cluster VMs will not be restarted.
- Restart cluster VM: The service process will be interrupted and the VMs in the cluster will be restarted.
Figure 3-3 Restarting a cluster
Step 5 Click OK.
----End
3.4 Stopping, Starting, or Deleting a Cluster
Scenario
When creating a CDM cluster, you can enable automatic shutdown or scheduled startup and shutdown for the cluster. After the cluster is created, click the name of the cluster on the Cluster Management page and then click the Cluster Configuration tab to modify these settings.
You can also manually shut down or delete clusters to reduce costs.
NOTE
Before the deletion, you can use the batch export function by referring to Batch Managing Jobs to save all job JSON files to a local PC. Then, you can create a new cluster and import the jobs again when necessary.
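Because the exported job definitions are plain JSON files, the local backup step can be scripted. The sketch below is hypothetical: it assumes the batch export produces a single JSON file whose top-level "jobs" key holds a list of job objects, which is an assumption made for illustration rather than a documented export schema:

```python
import json
from pathlib import Path

def backup_jobs(export_file: str, backup_dir: str) -> int:
    """Copy an exported CDM job file into a backup directory and return
    the number of job entries it contains. The 'jobs' key is an assumed
    export schema, used only for illustration."""
    data = json.loads(Path(export_file).read_text(encoding="utf-8"))
    jobs = data.get("jobs", [])
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    (dest / Path(export_file).name).write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return len(jobs)
```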
Prerequisites
The target cluster is running properly and no services will be interrupted if the cluster is deleted.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. The Cluster Management page is displayed.
Step 3 In the Operation column, click More and select Start, Delete, or Stop to start, delete, or stop a cluster.
Step 4 Click the name of a cluster and click the Cluster Configuration tab to modify automatic shutdown or scheduled startup and shutdown.
Figure 3-4 Modifying cluster configuration
Step 5 Click Save.
----End
3.5 Viewing Cluster Configurations, Logs, and Monitoring Data
Scenario
View cluster configurations, obtain cluster logs, and view monitoring data on Cloud Eye.
Prerequisites
You have created a CDM cluster.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management to display the cluster list. See Figure 3-5.
Figure 3-5 Cluster list
Step 3 Click the expand icon in front of the cluster name to view the configurations of the cluster, including the cluster flavor, creation time, node quantity, node configurations, network configurations, project ID, cluster ID, and instance ID.
Figure 3-6 Viewing cluster configurations
Step 4 In the row of the cluster, choose More > Download Log to obtain cluster logs.
Step 5 In the row of the cluster, choose More > View Monitoring Data. The Cloud Eye management console is displayed, on which you can view the inbound and outbound rates as well as CPU and memory usage. For details about the monitoring metrics, see Monitoring.
----End
3.6 Monitoring
Monitoring is key to ensuring CDM cluster performance, reliability, and availability. Using the monitored data, you can determine CDM cluster resource usage. Cloud Eye on HUAWEI CLOUD helps you better understand the running status of your CDM clusters. You can use Cloud Eye to automatically monitor CDM clusters in real time and manage alarms and notifications, so that you can keep track of CDM cluster performance metrics.
This section describes the following:
- CDM Metrics
- Configuring Alarm Rules
- Viewing CDM Metrics
3.6.1 CDM Metrics
Table 3-2 lists the CDM metrics.
Table 3-2 CDM performance metrics
Metric | Description | Value Range | Monitored Object
Bytes In | Measures the network inbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration
Bytes Out | Measures the network outbound rate of the monitored object. Unit: byte/s | ≥ 0 bytes/s | Cloud Data Migration
CPU Usage | Measures the CPU usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration
Memory Usage | Measures the memory usage of the monitored object. Unit: % | 0% to 100% | Cloud Data Migration
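As an illustration of how a threshold alarm over these metrics behaves, the sketch below checks whether the average of the most recent samples meets a threshold. Cloud Eye evaluates alarm rules server-side; this code is purely illustrative of the "average over N periods exceeds threshold" idea:

```python
def alarm_triggered(samples, threshold, periods):
    """Return True if the average of the last `periods` samples meets
    or exceeds `threshold` (values in percent, 0-100)."""
    if len(samples) < periods:
        return False                 # not enough data yet
    window = samples[-periods:]
    return sum(window) / periods >= threshold

# CPU Usage samples (%); an alarm on avg >= 90% over 3 periods fires,
# while the same threshold over 5 periods does not.
cpu = [42.0, 55.5, 91.2, 95.0, 93.4]
print(alarm_triggered(cpu, threshold=90.0, periods=3))  # True
print(alarm_triggered(cpu, threshold=90.0, periods=5))  # False
```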
3.6.2 Configuring Alarm Rules
Scenario
Set alarm rules to customize the monitored objects and notification policies, so that you can learn the CDM running status in a timely manner.
A CDM alarm rule includes the alarm rule name, monitored object, metric, threshold, monitoring interval, and whether to send a notification. This section describes how to set CDM alarm rules.
Procedure
1. Log in to the CDM management console.
2. Choose Cluster Management. Choose More > View Monitoring Data. The Cloud Eye management console is displayed.
3. In the left navigation pane, choose Alarm Management > Alarm Rules.
4. On the Alarm Rules page, click Create Alarm Rule to create an alarm rule, or modify an existing alarm rule. The following operations use the modification of an existing alarm rule as an example.
   a. Click the name of the target alarm rule.
   b. Click Modify in the upper right corner of the page.
   c. On the Modify Alarm Rule page shown in Figure 3-7, set parameters as prompted.
Figure 3-7 Modifying an alarm rule
   d. Click OK. After the alarm rule is set, the system automatically notifies you when an alarm is triggered.
NOTE
For more information about CDM alarm rules, see the Cloud Eye User Guide.
3.6.3 Querying Metrics
Scenario
Cloud Eye on HUAWEI CLOUD monitors the running status of CDM clusters. You can obtain the monitoring metrics of CDM on the Cloud Eye management console.
Monitored data requires a period of time for transmission and display. The status of CDM displayed on the Cloud Eye page is the status obtained 5 to 10 minutes earlier. You can view the monitored data of a newly created CDM cluster 5 to 10 minutes after its creation.
Prerequisites
- The CDM cluster is running properly. If a cluster fails to shut down or restart, or is unavailable, its monitoring metrics cannot be viewed on Cloud Eye. You can view the monitored data only after the cluster is restarted or recovered.
- Alarm rules have been configured on the Cloud Eye page. For details, see Configuring Alarm Rules.
- The cluster has been running properly for about 10 minutes. The monitored data and graphs are available for a newly created cluster only after the cluster has run for at least 10 minutes.
Procedure
1. Log in to the CDM management console.
2. Choose Cluster Management. Choose More > View Monitoring Data. The Cloud Eye management console is displayed.
3. On the CDM monitoring page, you can view the graphs of all monitoring metrics, as shown in Figure 3-8.
Figure 3-8 Viewing metrics
4. Click the zoom icon in the upper right corner of a graph to enlarge it, as shown in Figure 3-9. The system allows you to select a fixed time range or customize the time range.
   a. Fixed time ranges include Last 1 hour, Last 3 hours, Last 12 hours, Last 24 hours, Last 7 days, and Last 30 days.
   b. A customized time range can be specified within the latest seven days.
Figure 3-9 Zoomed out monitoring graph
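Because a customized range must fall within the latest seven days, any client-side tooling that requests monitoring data would typically clamp the requested window first. An illustrative sketch (generic date arithmetic, not a Cloud Eye API):

```python
from datetime import datetime, timedelta

def clamp_range(start: datetime, end: datetime, now: datetime):
    """Clamp [start, end] to the last 7 days relative to `now`."""
    earliest = now - timedelta(days=7)
    start = max(start, earliest)
    end = min(end, now)
    if start >= end:
        raise ValueError("empty range after clamping to the last 7 days")
    return start, end

now = datetime(2018, 8, 3, 12, 0)
# A request starting on July 20 is pulled forward to now - 7 days.
s, e = clamp_range(datetime(2018, 7, 20), datetime(2018, 8, 2), now)
print(s)  # 2018-07-27 12:00:00
```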
3.7 CTS
3.7.1 Key CDM Operations Recorded by CTS
Scenario
CTS provides records of operations on cloud service resources. With CTS, you can query,audit, and backtrack these operations.
Prerequisites
CTS has been enabled.
Key CDM Operations Recorded by CTS
Table 3-3 CDM operations recorded by CTS
Operation | Resource Type | Trace Name
Creating a cluster | cluster | createCluster
Deleting a cluster | cluster | deleteCluster
Modifying cluster configuration | cluster | modifyCluster
Starting a cluster | cluster | startCluster
Stopping a cluster | cluster | stopCluster
Restarting a cluster | cluster | restartCluster
Importing a job | cluster | clusterImportJob
Binding an EIP | cluster | bindEip
Unbinding an EIP | cluster | unbindEip
Creating a link | link | createLink
Modifying a link | link | modifyLink
Deleting a link | link | deleteLink
Creating a job | job | createJob
Modifying a job | job | modifyJob
Deleting a job | job | deleteJob
Starting a job | job | startJob
Stopping a job | job | stopJob
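When auditing exported trace records against the operations above, filtering by resource type and trace name is straightforward. The sketch below assumes each trace is a dict with `resource_type` and `trace_name` keys; the actual CTS export format may differ, so treat this as a pattern rather than a CTS API:

```python
def cluster_lifecycle_traces(traces):
    """Keep only cluster start/stop/restart/delete operations."""
    wanted = {"startCluster", "stopCluster", "restartCluster", "deleteCluster"}
    return [t for t in traces
            if t.get("resource_type") == "cluster"
            and t.get("trace_name") in wanted]

traces = [
    {"resource_type": "cluster", "trace_name": "createCluster"},
    {"resource_type": "cluster", "trace_name": "restartCluster"},
    {"resource_type": "job", "trace_name": "startJob"},
]
print([t["trace_name"] for t in cluster_lifecycle_traces(traces)])
# ['restartCluster']
```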
3.7.2 Viewing Traces
Scenario
After you enable CTS, the system starts to record CDM operations. The CTS management console stores the traces of the latest seven days.
This section describes how to query these traces.
Procedure
1. Log in to the management console.
2. Click the region selector in the upper left corner and select the desired region and project.
3. Click Service List and choose Management & Deployment > Cloud Trace Service.
4. In the left navigation pane, click Trace List.
5. Click Filter and specify filters as required. Figure 3-10 shows the recorded CDM traces.
Figure 3-10 CDM traces
6. Locate a trace and click the expand icon to unfold the trace details.
7. Locate a trace and click View Trace in the Operation column.
For more information about CTS, see the Cloud Trace Service User Guide.
4 Link Management
4.1 Creating a Link
Scenario
Before creating a data migration task, create a link so that the CDM cluster can read data from and write data to the data source. The same link can serve as the source link (for CDM to export data) or the destination link (for CDM to import data).
The link configurations vary with the data source type. This section describes how to create a link based on the data source type.
Prerequisites
- You have created a CDM cluster.
- The CDM cluster can communicate with the data source. To connect an internal network to the HUAWEI CLOUD network, see How Do I Connect On-Premises Intranet or Third-Party Private Network to CDM?.
- You have obtained the URL and the account for accessing the data source. The account must have read and write permissions on the data source.
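A quick way to verify the connectivity prerequisite from a host on the same network as the CDM cluster is a plain TCP connect to the data source's address and port. This is a generic network check (standard library only), not a CDM feature:

```python
import socket

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check a MySQL source before creating the link
# (address and port are placeholders).
# print(reachable("192.168.0.1", 3306))
```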
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster, choose Job Management > Link Management > Create Link, and select a connector. See Figure 4-1.
NOTE
The connectors are classified based on the type of the data source to be connected. All types of data sources that support data import or export using CDM are displayed.
Figure 4-1 Selecting a connector
Step 3 Select a data source and click Next.
On the page that is displayed, configure the required parameters based on Table 4-1.
Table 4-1 Link parameters
Connector | Description
Data Warehouse Service, RDS (MySQL), RDS (PostgreSQL), RDS (SQL Server), DDM, MySQL, PostgreSQL, Microsoft SQL Server, Oracle, IBM Db2, FusionInsight LibrA, Derecho (GaussDB) | Because the JDBC drivers used to connect to these relational databases are the same, the parameters to be configured are also the same; they are described in Link to Relational Databases.
- When importing data to DWS, specify the Copy or GDS import mode to improve import performance. You can specify the Import Mode parameter when creating a DWS link.
- When importing data to RDS for MySQL, enable the LOAD DATA function of MySQL to accelerate data import and improve import performance. You can set Use Local API to enable the function when you create a MySQL link.
HUAWEI CLOUD OBS | If the data source is OBS on HUAWEI CLOUD, see Link to OBS.
Alibaba Cloud OSS | If the data source is OSS on Alibaba Cloud, see Link to OSS on Alibaba Cloud. Currently, data can only be exported from OSS to OBS.
Qiniu Cloud Object Storage | If the data source is Qiniu Cloud Object Storage (KODO), see Link to Qiniu Cloud Object Storage. Currently, data can only be exported from Qiniu Cloud Object Storage to OBS.
MRS HDFS, FusionInsight HDFS, Apache HDFS | If the data source is HDFS of MRS, Apache Hadoop, or FusionInsight HD, see Link to HDFS. NOTE: If Running Mode is set to Standalone, CDM can migrate data between the HDFS services of multiple MRS clusters.
MRS HBase, FusionInsight HBase, Apache HBase | If the data source is HBase of MRS, Apache Hadoop, or FusionInsight HD, see Link to HBase.
MRS Hive | If the data source is Hive of MRS, see Link to Hive.
CloudTable Service | If the data source is CloudTable, see Link to CloudTable.
FTP, SFTP | If the data source is an FTP or SFTP server, see Link to an FTP or SFTP Server.
HTTP, HTTPS | These connectors are used to read files with an HTTP/HTTPS URL, such as public files on third-party object storage systems and web disks. When creating an HTTP link, you only need to configure the link name; the URL is configured during job creation.
Network Attached Storage | If the data source is a local NAS server, see Link to a NAS Server. CIFS and SMB are supported. CDM can connect to dedicated file servers, Windows file sharing servers, Linux Samba servers, and cloud services that provide CIFS/SMB file systems.
MongoDB, Document Database Service | If the data source is a local MongoDB or DDS, see Link to MongoDB/DDS. Currently, data can be exported from but cannot be imported to MongoDB or DDS.
Redis, Distributed Cache Service | If the data source is a local Redis database or DCS, see Link to Redis/DCS. Currently, data can be imported to but cannot be exported from DCS. Data can be imported to and exported from open-source Redis.
Apache Kafka | If the data source is open-source Kafka, see Link to Kafka. Currently, data can only be exported from Kafka to Cloud Search Service.
Data Ingestion Service | If the data source is DIS, see Link to DIS. Currently, data can only be exported from DIS to Cloud Search Service.
Cloud Search Service, Elasticsearch | If the data source is Cloud Search Service or Elasticsearch, see Link to Elasticsearch.
Data Lake Insight | If the data source is DLI, see Link to DLI. Currently, data can be imported to but cannot be exported from DLI.
Step 4 After configuring the link parameters, click Test to check whether the link is available. Alternatively, click Save; the system then automatically checks whether the link is available.
If the network is poor or the data source is too large, the link test may take 30 to 60 seconds.
----End
4.2 Link Parameter Description
4.2.1 Link to Relational Databases
Because the JDBC drivers used by CDM to connect to relational databases are the same, the parameters to be configured are also the same. Currently, CDM supports the following relational databases:
- Data Warehouse Service
- RDS (MySQL)
- RDS (PostgreSQL)
- RDS (SQL Server)
- DDM
- MySQL
- PostgreSQL
- Microsoft SQL Server
- Oracle
- IBM Db2
- FusionInsight LibrA
- Derecho (GaussDB)
Compatible Databases and Versions
Table 4-2 lists the relational databases that have been verified to be accessible to CDM.
- CDM performance has been optimized for migration to MySQL and DWS and exceeds that of the native JDBC API.
- The following table lists the database types and versions that have been verified to be accessible (for both read and write) to CDM. Other database versions not included in the table may still be accessible but have not been tested. If the database version you use is inaccessible, contact customer service.
Table 4-2 Compatible databases and versions
Database | Verified Version
Oracle | Oracle 11g 11.2.0.3.0
MySQL | MySQL 5.5.43-log
Microsoft SQL Server | SQL Server 2012
IBM Db2 | Db2 v9.7.0.0
PostgreSQL | PostgreSQL 9.1 (x86)
Derecho (GaussDB) | GaussDB V100R003C10SPC115
Link Parameters
Table 4-3 describes the required parameters of the link to DWS, RDS for MySQL, RDS for PostgreSQL, RDS for SQL Server, DDM, MySQL, PostgreSQL, Microsoft SQL Server, Oracle, IBM Db2, or Derecho (GaussDB).
Table 4-3 Parameter description
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | mysql_link
Database Server | IP address or domain name of the database to be connected. Click Select next to the text box to obtain the list of DWS and RDS instances. | 192.168.0.1
Port | Port number of the database to be connected | 3306
Database Name | Name of the database to be connected | dbname
Username | Username of the account for accessing the database. This account must be able to read and write data tables and read database metadata. | cdm
Password | Password of the account | -
Import Mode | When creating a DWS link, select the data import mode.
- Copy: Migrate the source data to the DWS management node and then copy the data to DataNodes. To access DWS through the Internet, select Copy.
- GDS: DataNodes of DWS concurrently request data from the GDS component of CDM and then write the data to DWS. The GDS mode cannot be used for data export from DWS.
Theoretically, the GDS mode is more efficient than the Copy mode. However, when the GDS mode is used, the following configurations are required:
1. Configure DWS to allow users of the DWS link to create and delete foreign tables.
2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster.
For details, see GDS Import Mode. | GDS
Fetch Size | (Optional) This parameter is displayed only after you click Show Advanced Attributes. Number of rows obtained by each request. Set this parameter based on the data source and the job's data size. If the value is either too large or too small, the job may run for a long time. | 1000
Use Local API | (Optional) Whether to use the local API of the database for acceleration. When you create a MySQL link, CDM automatically enables the local_infile system variable of the MySQL database to enable the LOAD DATA function, which accelerates data import to the MySQL database. If CDM fails to enable the function, contact the database administrator to enable the local_infile system variable. Alternatively, set Use Local API to No to disable API acceleration. If data is imported to RDS for MySQL, the LOAD DATA function is disabled by default; in this case, modify the parameter group of the MySQL instance and set local_infile to ON to enable the LOAD DATA function.
NOTE: If local_infile on RDS is uneditable, it belongs to the default parameter group. You need to create a parameter group, modify its values, and apply it to the RDS for MySQL instance. For details, see the Relational Database Service User Guide. | Yes
SSL Encryption | (Optional) If you set this parameter to Yes, CDM can connect to RDS (on-premises databases excluded) in SSL encryption mode. Security hardening has been performed on RDS for PostgreSQL; for this reason, when creating a link to RDS for PostgreSQL, set this parameter to Yes. | Yes
Link Properties | (Optional) Click Add to add the JDBC connector attributes of multiple specified data sources. For details, see the JDBC connector document of the corresponding database. | sslmode=require
Reference Sign | (Optional) Delimiter between the names of referenced tables or columns. For details, see the product documentation of the corresponding database. | '
Oracle Version | This parameter is displayed only for Oracle links. When the error message "java.sql.SQLException: Protocol violation" is displayed, select another version. | 12.1.0.1
Oracle SID | Oracle instance ID, which is used to differentiate databases by instance | dbname
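The Database Server, Port, and Database Name parameters map onto a standard JDBC URL, which CDM assembles internally. The sketch below shows the common URL shapes used by the stock JDBC drivers for a few of the databases above, with the table's example values; it illustrates the mapping and is not CDM code:

```python
def jdbc_url(db_type: str, host: str, port: int, database: str) -> str:
    """Build a standard JDBC URL from the link parameters.
    Formats follow the common JDBC drivers; exact options vary by
    driver version."""
    if db_type == "mysql":
        return f"jdbc:mysql://{host}:{port}/{database}"
    if db_type == "postgresql":
        return f"jdbc:postgresql://{host}:{port}/{database}"
    if db_type == "sqlserver":
        return f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    if db_type == "oracle":  # SID-style URL, matching the Oracle SID parameter
        return f"jdbc:oracle:thin:@{host}:{port}:{database}"
    raise ValueError(f"unsupported database type: {db_type}")

print(jdbc_url("mysql", "192.168.0.1", 3306, "dbname"))
# jdbc:mysql://192.168.0.1:3306/dbname
```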
4.2.2 Link to OBS
When connecting CDM to OBS, configure parameters according to Table 4-4.
Table 4-4 Parameter description
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | obs_link
OBS Server | IP address or domain name of the OBS server | 192.168.0.1
Port | Port number of the OBS server, which is 5443 by default | 5443
AK | AK used to log in to the OBS server | HCXUET8G37MWF
SK | SK used to log in to the OBS server | -
4.2.3 Link to OSS on Alibaba Cloud
When connecting CDM to OSS on Alibaba Cloud, configure parameters according to Table 4-5.
Table 4-5 Parameter description
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | oss_link
OSS Endpoint | Endpoint of OSS on Alibaba Cloud | oss-cn-hangzhou.aliyuncs.com
Authentication Method | Available identity authentication methods:
- Access key: Use the access key to access OSS.
- Temporary access credential: Use the temporary key and security token to access OSS. | Access key
AK | AK used to log in to the OSS server | 0DCPPWWA4VKXCKHIX
SK | SK used to log in to the OSS server | -
Security Token | If you set Authentication Method to Temporary access credential, enter the temporary token provided by Security Token Service (STS). | -
4.2.4 Link to Qiniu Cloud Object Storage
When connecting CDM to Qiniu Cloud Object Storage (KODO), configure parameters according to Table 4-6. Currently, data can only be exported from Qiniu Cloud Object Storage to OBS.
Table 4-6 KODO link parameters
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | kodo_link
Region | Region where the data center of KODO is located | z0
AK | AK used to log in to the KODO server | 0DCPPWWA4VKXCKHIX
SK | SK used to log in to the KODO server | -
Use Custom Domain Name to Download Objects | (Optional) Whether to preferentially use the custom domain name to download objects from the bucket if the object storage bucket has a CDN or other custom domain names | Yes
4.2.5 Link to HDFS
Currently, CDM supports the following HDFS data sources:
- MRS HDFS
- FusionInsight HDFS
- Apache HDFS
MRS HDFS
When connecting CDM to HDFS of MRS, configure parameters according to Table 4-7.
Table 4-7 MRS HDFS link parameters
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | mrs_hdfs_link
Manager IP | IP address of MRS Manager. Click Select next to the Manager IP text box to select a created MRS cluster. CDM automatically fills in the authentication information. | 127.0.0.1
Authentication Method | Authentication method used for accessing MRS:
- Simple: Select this if MRS is in non-security mode.
- Kerberos: Select this if MRS is in security mode. | Simple
Username | When Authentication Method is set to Kerberos, set the username and password for logging in to MRS Manager. | cdm
Password | Password for logging in to MRS Manager | -
Running Mode | Running mode of the HDFS link. The options are as follows:
- Embedded: The link instance runs with CDM. This mode delivers better performance.
- Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, use Standalone.
If Standalone is selected, CDM can migrate data between the HDFS services of multiple MRS clusters. | Standalone
FusionInsight HDFS
When connecting CDM to HDFS of FusionInsight HD, configure parameters according to Table 4-8.
Table 4-8 FusionInsight HDFS link parameters
Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | FI_hdfs_link
Manager IP | IP address of FusionInsight Manager | 127.0.0.1
Manager Port | Port number of FusionInsight Manager | 28443
CAS Server Port | Port number of the CAS server used to connect to FusionInsight | 20009
Username | Username for logging in to FusionInsight Manager | cdm
Password | Password for logging in to FusionInsight Manager | -
Authentication Method | Authentication method used for accessing FusionInsight HD:
- Simple: Select this if FusionInsight HD is in non-security mode.
- Kerberos: Select this if FusionInsight HD is in security mode. | Kerberos
Running Mode | Running mode of the HDFS link. The options are as follows:
- Embedded: The link instance runs with CDM. This mode delivers better performance.
- Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, use Standalone. | Standalone
Apache HDFS

When connecting CDM to HDFS of Apache Hadoop, configure the parameters according to Table 4-9.

Table 4-9 Apache HDFS link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: hadoop_hdfs_link
- URI: NameNode URI. Example value: hdfs://nn1.example.com/
- Authentication Method: Authentication method used for accessing Hadoop. Example value: Kerberos
  - Simple: Select this if Hadoop is in non-security mode.
  - Kerberos: Select this if Hadoop is in security mode. Obtain the principal account and the keytab file from the client for authentication.
- Principal: When Authentication Method is set to Kerberos, the principal account is used for authentication. You can contact the Hadoop administrator to obtain the account.
- Keytab File: When Authentication Method is set to Kerberos, the keytab file is used for authentication. You can contact the Hadoop administrator to obtain the file. Example value: /opt/user.keytab
- Mapping Between IP and Host Name: If the HDFS configuration file uses host names, configure the mapping between IP addresses and host names. Separate an IP address from its host name with a space, and separate mappings with semicolons (;) or line breaks. Example value: 10.1.6.9 hostname01;10.2.7.9 hostname02
- Running Mode: Running mode of the HDFS link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
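To make the mapping format concrete, the following Python sketch parses a mapping string of the form described above (IP and host name separated by a space; mappings separated by semicolons or line breaks). It is a hypothetical illustration for understanding the format, not part of CDM:

```python
def parse_ip_host_mapping(raw: str) -> dict:
    """Parse an 'IP hostname' mapping string into a dict.

    Entries are separated by semicolons or line breaks; within an
    entry, the IP address and host name are separated by spaces.
    """
    mapping = {}
    # Normalize line breaks to semicolons, then split into entries.
    normalized = raw.replace("\r\n", ";").replace("\n", ";")
    for entry in normalized.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        ip, hostname = entry.split()
        mapping[ip] = hostname
    return mapping

print(parse_ip_host_mapping("10.1.6.9 hostname01;10.2.7.9 hostname02"))
# {'10.1.6.9': 'hostname01', '10.2.7.9': 'hostname02'}
```

Either separator style produces the same mapping, which is why the guide allows semicolons and line breaks interchangeably.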
Cloud Data MigrationUser Guide 4 Link Management
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 50
4.2.6 Link to HBase

Currently, CDM supports the following HBase data sources:
- MRS HBase
- FusionInsight HBase
- Apache HBase

MRS HBase

When connecting CDM to HBase of MRS, configure the parameters according to Table 4-10.

Table 4-10 MRS HBase link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: mrs_hbase_link
- Manager IP: IP address of MRS Manager. Click Select next to the Manager IP text box to select a created MRS cluster. CDM automatically fills in the authentication information. Example value: 127.0.0.1
- Authentication Method: Authentication method used for accessing MRS. Example value: Simple
  - Simple: Select this if MRS is in non-security mode.
  - Kerberos: Select this if MRS is in security mode.
- Username: When Authentication Method is set to Kerberos, set the username and password for logging in to MRS Manager. Example value: admin
- Password: Password for logging in to MRS Manager. Example value: -
- Running Mode: Running mode of the HBase link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
FusionInsight HBase

When connecting CDM to HBase of FusionInsight HD, configure the parameters according to Table 4-11.

Table 4-11 FusionInsight HBase link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: FI_hbase_link
- Manager IP: IP address of FusionInsight Manager. Example value: 127.0.0.1
- Manager Port: Port number of FusionInsight Manager. Example value: 28443
- CAS Server Port: Port number of the CAS server used to connect to FusionInsight. Example value: 20009
- Username: Username for logging in to FusionInsight Manager. Example value: cdm
- Password: Password for logging in to FusionInsight Manager. Example value: -
- Authentication Method: Authentication method used for accessing FusionInsight HD. Example value: Kerberos
  - Simple: Select this if FusionInsight HD is in non-security mode.
  - Kerberos: Select this if FusionInsight HD is in security mode.
- Running Mode: Running mode of the HBase link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
Apache HBase

When connecting CDM to HBase of Apache Hadoop, configure the parameters according to Table 4-12.

Table 4-12 Apache HBase link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: hadoop_hbase_link
- URI: NameNode URI. Example value: hdfs://nn1.example.com/
- Authentication Method: Authentication method used for accessing Hadoop. Example value: Kerberos
  - Simple: Select this if Hadoop is in non-security mode.
  - Kerberos: Select this if Hadoop is in security mode. Obtain the principal account and the keytab file from the client for authentication.
- Principal: When Authentication Method is set to Kerberos, the principal account is used for authentication. You can contact the Hadoop administrator to obtain the account.
- Keytab File: When Authentication Method is set to Kerberos, the keytab file is used for authentication. You can contact the Hadoop administrator to obtain the file. Example value: /opt/user.keytab
- Mapping Between IP and Host Name: If the configuration file uses host names, configure the mapping between IP addresses and host names. Separate an IP address from its host name with a space, and separate mappings with semicolons (;) or line breaks. Example value: 10.3.6.9 hostname01;10.4.7.9 hostname02
- Running Mode: Running mode of the HBase link. Example value: Standalone
  - Embedded: The link instance runs with CDM. This mode has better performance.
  - Standalone: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, Standalone prevails.
4.2.7 Link to Hive

Currently, CDM supports Hive of MRS. Table 4-13 describes the related parameters.

Table 4-13 Hive link parameters

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: hivelink
- Manager IP: IP address of MRS Manager. Click Select next to the Manager IP text box to select a created MRS cluster. CDM automatically fills in the authentication information. Example value: 127.0.0.1
- Authentication Method: Authentication method used for accessing MRS. Example value: Simple
  - Simple: Select this if MRS is in non-security mode.
  - Kerberos: Select this if MRS is in security mode.
- Username: When Authentication Method is set to Kerberos, set the username and password for logging in to MRS Manager. Example value: cdm
- Password: Password for logging in to MRS Manager. Example value: -
4.2.8 Link to CloudTable

When connecting CDM to CloudTable, configure the parameters according to Table 4-14.

Table 4-14 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: cloudtable_link
- ZK Link: Obtain this parameter value from the cluster management page of CloudTable. Example value: cloudtable-cdm-zk1.cloudtable.com:2181,cloudtable-cdm-zk2.cloudtable.com:2181
4.2.9 Link to an FTP or SFTP Server

When connecting CDM to an FTP or SFTP server, configure the parameters according to Table 4-15.

Table 4-15 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: ftp_link
- Host Name/IP Address: Host name or IP address of the FTP or SFTP server. Example value: ftp.apache.org
- Port: Port number of the FTP or SFTP server. The default port is 21 for FTP and 22 for SFTP. Example value: 21
- Username: Username for logging in to the FTP or SFTP server. Example value: cdm
- Password: Password for logging in to the FTP or SFTP server. Example value: -
4.2.10 Link to a NAS Server

When connecting CDM to a NAS server, configure the parameters according to Table 4-16.

CIFS and SMB are supported. CDM can connect to dedicated file servers, Windows file sharing servers, Linux Samba servers, and cloud services that provide CIFS/SMB file systems.

Table 4-16 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: nas_link
- Protocol: NAS file protocol. Currently, only SMB and CIFS are supported. Example value: SMB
- Shared Path: Shared path of the NAS server. Example value: \\server\share
- Username: Username for logging in to the NAS server, in the domain name\username format. Example value: CHINA\user
- Password: Password for logging in to the NAS server. Example value: -
4.2.11 Link to MongoDB/DDS

When connecting CDM to an on-premises MongoDB database or DDS, configure the parameters according to Table 4-17.

Table 4-17 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: mongodb_link
- MongoDB Server List: Server address list. Enter each server address in the format of IP address (or domain name) of the database server:port number, and separate the entries with semicolons (;). Example value: 192.168.0.1:7300;192.168.0.2:7301
- Database Name: Name of the MongoDB database to be connected. Example value: DB_mongodb
- Username: Username for logging in to MongoDB. Example value: cdm
- Password: Password for logging in to MongoDB. Example value: -
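The server list format above (host:port entries separated by semicolons) can be illustrated with a small parser. This is a hypothetical sketch for understanding the format, not CDM code:

```python
def parse_server_list(raw: str) -> list:
    """Split a 'host:port' server list separated by semicolons
    into (host, port) tuples."""
    servers = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition tolerates host names that happen to contain colons.
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers

print(parse_server_list("192.168.0.1:7300;192.168.0.2:7301"))
# [('192.168.0.1', 7300), ('192.168.0.2', 7301)]
```

The same format is used by the Redis Server List parameter in the next section.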
4.2.12 Link to Redis/DCS

When connecting CDM to an on-premises Redis database or DCS, configure the parameters according to Table 4-18.

Table 4-18 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: redis_link
- Redis Deployment Method: Two deployment methods are available. Example value: Single
  - Single: installation on a single-node system
  - Cluster: installation on a cluster
- Redis Server List: Server address list. Enter each server address in the format of IP address (or domain name) of the database server:port number, and separate the entries with semicolons (;). Example value: 192.168.0.1:7300;192.168.0.2:7301
- Password: Password for logging in to Redis. Example value: -
- Redis Database Index: Redis database index, which is similar to the name of a relational database. Example value: 0
4.2.13 Link to Kafka

When connecting CDM to a local Apache Kafka, configure the parameters according to Table 4-19.

Table 4-19 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: kafka_link
- Kafka broker: IP address and port number of the Kafka broker. Example value: 192.168.1.1:9092
4.2.14 Link to DIS

When connecting CDM to DIS, configure the parameters according to Table 4-20. Currently, data can only be exported from DIS to Cloud Search Service.

Table 4-20 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: dis_link
- Region: Region where DIS resides. Example value: cn-north-1
- Endpoint: DIS endpoint to be linked. Example value: https://dis.cn-north-1.myhuaweicloud.com:20004
- AK: AK used to log in to the DIS server. Example value: 0DCPPWWA4VKXCWYWKHIX
- SK: SK used to log in to the DIS server. Example value: -
- Project ID: Project ID of DIS. Example value: c48475ce8e174a7a9f775706a3d5eb2
4.2.15 Link to Elasticsearch

When connecting CDM to Cloud Search Service or Elasticsearch, configure the parameters according to Table 4-21.

Table 4-21 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: css_link
- Elasticsearch Server: IP address or domain name of the Elasticsearch server. Example value: 192.168.0.1
- Port: Port number of the Elasticsearch server. Example value: 9200
- Username: (Optional) Username for logging in to the database to be connected. Example value: cdm
- Password: (Optional) Password for logging in to the database to be connected. Example value: -
4.2.16 Link to DLI

When connecting CDM to DLI, configure the parameters according to Table 4-22.

Table 4-22 Parameter description

- Name: Link name, which can be defined based on the data source type for easy memorization. Example value: dli_link
- AK: AK required for authentication during access to the DLI database. Example value: GRC2WR0IDC6NGROYLWU2
- SK: SK required for authentication during access to the DLI database. Example value: -
- Project ID: Project ID in the region where DLI resides. Example value: a46ed0f02bde42e7afe36777eb9d0f42
4.3 Editing/Deleting a Link
Scenario
CDM allows you to perform the following operations on created links:
- Edit: You can modify link parameters, but cannot re-select the connector. To modify a link, you need to re-enter the password for accessing the data source.
- Test Connectivity: You can directly test the connectivity of a created link.
- View Link JSON: View the link parameter settings in JSON format.
- Edit Link JSON: Edit the link parameter settings in JSON format.
- Delete: You can delete links that are not used by any jobs in batches.

Prerequisites
- You have obtained the username and password for accessing the desired data source.
- The links are not used by any jobs.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and choose Job Management > Link Management.

Step 3 On the Link Management page, locate the target link.
- Edit: Click the link name or click Edit in the Operation column to access the page for modifying the link. When modifying a link, you need to enter the password for logging in to the data source again. For details about the parameters, see Link Parameter Description.
- Test Connectivity: Click Test Connectivity in the Operation column to test the connectivity of the created link.
- View Link JSON: In the Operation column, choose More > View Link JSON to view link parameters in JSON format.
- Edit Link JSON: In the Operation column, choose More > Edit Link JSON to modify link parameters in JSON format.
- Delete: Select multiple links and click Delete Link next to Create Link to delete unused links in batches.
----End
5 Job Management
5.1 Creating a Job
5.1.1 Table/File Migration
Scenario

CDM can migrate tables or files between homogeneous and heterogeneous data sources. For details about the data sources that support table/file migration, see Data Sources Supported by CDM.

Table/file migration is applicable to migrating data to the cloud, exchanging data on the cloud, and migrating data back to on-premises service systems.

Prerequisites
- You have created a link by referring to Creating a Link.
- The CDM cluster can communicate with the data source.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.

Step 3 Choose Table/File Migration > Create Job. The page for configuring the job is displayed, as shown in Figure 5-1.
Figure 5-1 Creating a migration job
Step 4 Configure the job parameters as follows:
- Job Name: Enter a custom job name, which is a string of 1 to 256 characters consisting of letters, underscores (_), and digits, for example, oracle2obs_t.
- Source Link Name: Select the data source from which data is to be exported.
- Destination Link Name: Select the data source to which data is to be imported.

If no link is available, click + or go to the Link Management page to create one. For details about how to create a link, see Creating a Link.
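The stated job-name rule (1 to 256 characters consisting of letters, digits, and underscores) can be expressed as a simple check. The function below is a hypothetical illustration of the rule, not a CDM API:

```python
import re

# Pattern mirroring the documented rule: letters, digits, and
# underscores only, with a length of 1 to 256 characters.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,256}")

def is_valid_job_name(name: str) -> bool:
    """Return True if the name satisfies the documented job-name rule."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_job_name("oracle2obs_t"))  # True
print(is_valid_job_name("bad name!"))     # False
```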
Step 5 After selecting the source link, configure the source job parameters. The parameters vary with the data source type. For details, see Table 5-1.
Table 5-1 Source link parameter description

- OBS, Alibaba Cloud OSS, Qiniu Cloud Object Storage: Data can be extracted in CSV, JSON, CarbonData, or binary format. Data extracted in binary format is transferred without file parsing, which ensures high performance and is more suitable for file migration. Currently, data cannot be imported to Alibaba Cloud OSS or Qiniu Cloud Object Storage. For details, see From OBS/OSS.
- MRS HDFS, FusionInsight HDFS, Apache HDFS: HDFS data can be exported in CSV, Parquet, or binary format and can be compressed in multiple formats. For details, see From HDFS.
- MRS HBase, FusionInsight HBase, Apache HBase, CloudTable Service: Data can be exported from MRS, FusionInsight HD, open-source Apache Hadoop HBase, or CloudTable. You need to know all the column families and field names of the HBase tables. For details, see From HBase/CloudTable.
- MRS Hive: Data can be exported from Hive through the JDBC API. If the data source is Hive, CDM automatically partitions data using the Hive data partitioning file. For details, see From Hive.
- FTP, SFTP, Network Attached Storage: FTP, SFTP, or NAS data can be exported in CSV, JSON, or binary format. For details, see From FTP/SFTP/NAS.
- HTTP, HTTPS: These connectors are used to read files with an HTTP/HTTPS URL, such as public files on third-party object storage systems and web disks. Currently, data can only be exported from an HTTP/HTTPS URL to HUAWEI CLOUD. For details, see From HTTP/HTTPS.
- Data Warehouse Service, RDS for MySQL, RDS for SQL Server, RDS for PostgreSQL, DDM: Data can be exported from the database services of HUAWEI CLOUD. When data is exported from these data sources, CDM uses the JDBC API to extract data, and the job parameters for the migration source are the same. For details, see From a Relational Database.
- FusionInsight LibrA, Derecho (GaussDB): Data can be exported from FusionInsight LibrA and Derecho. CDM uses the JDBC API to extract data. For details, see From a Relational Database.
- MySQL, PostgreSQL, Oracle, IBM Db2, Microsoft SQL Server: Databases that are not provided by HUAWEI CLOUD, such as databases created in a local data center, deployed on ECSs, or database services on third-party clouds. CDM uses the JDBC API to extract data. For details, see From a Relational Database.
- MongoDB, Document Database Service: Data can be exported from MongoDB or DDS. For details, see From MongoDB/DDS.
- Redis: Data can be exported from open-source Redis. For details, see From Redis.
- Data Ingestion Service: Currently, data can only be exported from DIS to Cloud Search Service. For details, see From DIS.
- Apache Kafka: Currently, data can only be exported from Kafka to Cloud Search Service. For details, see From Apache Kafka.
- Cloud Search Service, Elasticsearch: Data can be exported from Cloud Search Service or Elasticsearch. For details, see From Elasticsearch/Cloud Search Service.
Step 6 Configure the job parameters for the migration destination based on Table 5-2.

Table 5-2 Parameter description

- OBS: Files (even in a large volume) can be migrated in batches to OBS in CSV, CarbonData, or binary format. For details, see To OBS.
- MRS HDFS, FusionInsight HDFS, Apache HDFS: You can select a compression format when importing data to HDFS. For details, see To HDFS.
- MRS HBase, FusionInsight HBase, Apache HBase, CloudTable Service: Data can be imported to HBase. The compression algorithm can be set when a new HBase table is created. For details, see To HBase/CloudTable.
- MRS Hive: Data can be rapidly imported to MRS Hive. For details, see To Hive.
- FTP, SFTP, Network Attached Storage: When FTP/SFTP/NAS servers function as the migration destination, CDM usually migrates cloud data analysis results back to local file systems. For details, see To FTP/SFTP/NAS.
- Data Warehouse Service, RDS for MySQL, RDS for SQL Server, RDS for PostgreSQL, DDM: Data can be imported to the database services of HUAWEI CLOUD. For details about how to use the JDBC API to import data, see To a Relational Database.
  - When importing data to DWS, specify the Copy or GDS import mode to improve the import performance. You can specify the Import Mode parameter when creating a DWS link.
  - When importing data to RDS for MySQL, enable the LOAD DATA function of MySQL to accelerate data import and improve the import performance. You can configure Use Local API to enable the function when you create a MySQL link.
- FusionInsight LibrA: Data can be imported to FusionInsight LibrA but cannot be imported to Derecho (GaussDB). For details, see To a Relational Database.
- MySQL: MySQL built in a local data center, created by users on an Elastic Cloud Server (ECS), or provided by a third-party cloud. For details, see To a Relational Database.
- Document Database Service: Data can be imported to DDS but cannot be imported to local MongoDB. For details, see To DDS.
- Distributed Cache Service: Data can be imported to DCS with the String or Hashmap value type. Data cannot be imported to local Redis. For details, see To DCS.
- Cloud Search Service, Elasticsearch: Data can be imported to Elasticsearch or Cloud Search Service. For details, see To Elasticsearch/Cloud Search Service.
- Data Lake Insight: Data can be imported to DLI. For details, see To DLI.
Step 7 After the parameters are configured, click Next. The Map Field tab page is displayed, as shown in Figure 5-2.

If files are migrated between FTP, SFTP, NAS, HDFS, and OBS, and File Format of the migration source is set to Binary, files will be transferred directly, free from field mapping.

In other scenarios, CDM automatically maps the fields of the source table to those of the destination table. You need to check whether the mapping and the time format are correct, for example, whether the source field type can be converted into the destination field type.

Figure 5-2 Field mapping

NOTE
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- On the Map Field tab page, if CDM fails to obtain all columns by obtaining sample values (for example, when data is exported from HBase, CloudTable, or MongoDB, there is a high probability that CDM fails to obtain all columns), you can click the icon and select Add a new field to add fields so that the data imported to the migration destination is complete.
- If data is imported to DWS, you need to select the distribution columns in the destination fields. You are advised to select the distribution columns according to the following principles:
  1. Use the primary key as the distribution column.
  2. If multiple data segments are combined as primary keys, specify all primary keys as the distribution columns.
  3. If no primary key is available and no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
Step 8 CDM supports field conversion. You can click the icon and then Create Converter. Figure 5-3 shows the Create Converter dialog box.

Figure 5-3 Creating a converter

CDM supports the following converters:
- Anonymization: hides key data in a character string. For example, to convert 12345678910 to 123****8910, configure the parameters as follows:
  - Set Reserve Start Length to 3.
  - Set Reserve End Length to 4.
  - Set Replace Character to *.
- Trim: automatically deletes the spaces before and after a character string.
- Reverse string: automatically reverses a character string, for example, reverses ABC into CBA.
- Replace string: replaces a specified character string.
- Expression conversion: uses the JSP Expression Language (EL) to convert the current field or a row of data. For details, see Field Conversion During Migration.
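As an illustration of the Anonymization rule above, the following Python sketch reproduces the described behavior: keep the first Reserve Start Length characters and the last Reserve End Length characters, and replace the middle with the replace character. It is a hypothetical illustration, not CDM code:

```python
def anonymize(value: str, reserve_start: int, reserve_end: int,
              replace_char: str) -> str:
    """Mask the middle of a string, keeping the first reserve_start
    and last reserve_end characters."""
    if reserve_start + reserve_end >= len(value):
        return value  # nothing left to mask
    middle_len = len(value) - reserve_start - reserve_end
    return (value[:reserve_start]
            + replace_char * middle_len
            + value[len(value) - reserve_end:])

print(anonymize("12345678910", 3, 4, "*"))  # 123****8910
```

With Reserve Start Length 3 and Reserve End Length 4, the 11-character input leaves 4 middle characters to mask, which matches the 123****8910 example in the text.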
Step 9 Click Next, set the job parameters, and click Show Advanced Attributes to display and configure the optional parameters, as shown in Figure 5-4.
Figure 5-4 Task parameters
Table 5-3 describes related parameters.
Table 5-3 Parameter description

- Retry upon Failure: You can select Retry 3 times or Never. You are advised to configure automatic retry only for file migration jobs or database migration jobs with Import to Staging Table enabled, to avoid data inconsistency caused by repeated data writes. Example value: Never
- Schedule Execution: If you select Yes, you can set the start time, cycle, and validity period of the job. For details, see Scheduling Job Execution. Example value: No
- Concurrent Extractors: Number of extractors to be concurrently executed. Generally, retain the default value. Example value: 1
- Concurrent Loaders: Number of loaders to be concurrently executed. This parameter is displayed only when HBase or Hive serves as the destination data source. Example value: 3
- Write Dirty Data: Whether to record dirty data. By default, this parameter is set to No. Example value: Yes
- Write Dirty Data Link: This parameter is displayed only when Write Dirty Data is set to Yes. Only links to OBS and HDFS support dirty data writes. Example value: obs_link
- OBS Bucket: This parameter is displayed only when Write Dirty Data Link is a link to OBS. Name of the OBS bucket to which the dirty data is to be written. Example value: dirtydata
- Dirty Data Directory: This parameter is displayed only when Write Dirty Data is set to Yes. Directory storing dirty data on HDFS or OBS. Dirty data will be saved only when this parameter is configured. You can go to this directory to query data that fails to be processed or is filtered out during job execution, and check which source data does not meet the transformation or cleaning rules. Example value: /user/dirtydir
- Max. Error Records in a Single Shard: This parameter is displayed only when Write Dirty Data is set to Yes. When the number of error records of a single map exceeds the upper limit, the job automatically terminates and the imported data cannot be rolled back. You are advised to use a temporary table as the destination table. After the data is imported, rename the table or merge it into the final data table. Example value: 0
- Delete Job After Completion: Specifies what to do with the job after it is executed. Example value: Do not delete
  - Do not delete: The job is not deleted after it is executed.
  - Delete after success: The job is deleted only when it is successfully executed. This option is applicable to massive one-time jobs.
  - Delete: The job is deleted regardless of whether it succeeds or fails.
Step 10 Click Save or Save and Run.
If you click Save, start the job manually by clicking Run on the Job Management page.
----End
5.1.2 Entire DB Migration
Scenario

CDM supports entire database migration between homogeneous and heterogeneous data sources. The migration principles are the same as those in Table/File Migration: each table (or each type of an Elasticsearch index) is migrated as a subtask that can be executed concurrently.

Entire database migration is applicable to scenarios where a database in an on-premises data center or built on a HUAWEI CLOUD ECS is synchronized to HUAWEI CLOUD database services or big data services. It is suitable for offline database migration but not online real-time migration. Figure 5-5 lists the data sources that support entire database migration using CDM.
Figure 5-5 Supported data sources in entire DB migration
The source databases can be databases in on-premises data centers, databases built on HUAWEI CLOUD ECSs, or third-party database services.

Prerequisites
- You have created a link by referring to Creating a Link.
- The CDM cluster can communicate with the data source.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.

Step 3 Choose Entire DB Migration > Create Job. The page for configuring the job is displayed, as shown in Figure 5-6.
Figure 5-6 Creating an entire database migration job
Step 4 Configure the parameters of the source database according to Table 5-4.

Table 5-4 Parameter description

- Oracle, MySQL, PostgreSQL, Microsoft SQL Server — Schema/Tablespace: Name of the database from which data is to be extracted. Click the icon next to the text box to go to the page for configuring the parameter, or directly enter a schema or tablespace. If the desired schema or tablespace is not displayed, check whether the login account has the permission to query metadata. Example value: schema
- Elasticsearch — Index: Index of the data to be extracted, which is similar to the schema or name of a relational database. Example value: index
Step 5 Configure the parameters of the destination cloud service according to Table 5-5.

Table 5-5 Parameter description

- Data Warehouse Service, MRS Hive, RDS for MySQL
  - Schema/Tablespace: Database name
  - Auto Table Creation: The options are as follows:
    - Non-auto creation: CDM will not automatically create a table.
    - Auto creation: If no corresponding table exists in the destination database, CDM will automatically create one.
    - Deletion before creation: If a table with the same name exists in the destination database, CDM will delete the table first and create another one with the same name.
  - Clear Data Before Import: This parameter is not displayed if Auto Table Creation is set to Deletion before creation.
    - Yes: CDM will delete the data in the destination tables that share the same names with the tables in the source database.
    - No: Table data will not be cleared before the import. If you set this parameter to No and the tables are not empty, the imported data will be appended to the existing tables.
- Cloud Search Service
  - Index: Index to which data is to be written, which is similar to the schema or name of a relational database
  - Clear Data Before Import: Whether to clear the data of the target type before data is written
- HUAWEI CLOUD OBS: For details about the destination job parameters required for entire database migration to OBS, see To OBS.
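The Auto Table Creation options above can be summarized with a small decision sketch. The function and its action strings are hypothetical illustrations of the described semantics, not CDM APIs; `existing` stands in for the set of table names already present in the destination database:

```python
def prepare_destination_table(mode: str, table: str, existing: set) -> list:
    """Return the table-preparation actions implied by an
    Auto Table Creation mode."""
    actions = []
    if mode == "Non-auto creation":
        pass  # CDM does not create the table.
    elif mode == "Auto creation":
        if table not in existing:
            actions.append(f"CREATE {table}")
    elif mode == "Deletion before creation":
        if table in existing:
            actions.append(f"DROP {table}")
        actions.append(f"CREATE {table}")
    return actions

print(prepare_destination_table("Deletion before creation", "t1", {"t1"}))
# ['DROP t1', 'CREATE t1']
```

Note that Clear Data Before Import only matters in the first two modes; Deletion before creation always starts from an empty, freshly created table, which is why the parameter is hidden in that case.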
Step 6 If the database to be migrated is a relational database, click Next after configuring the job parameters and specify whether to migrate some or all of the tables.

After selecting the desired tables, click the arrow buttons to move them to the right pane.

Step 7 Click Save or Save and Run.

When the job starts running, a sub-job is generated for each table. You can click the job name to view the sub-job list.
----End
5.2 Source Job Parameters
5.2.1 From OBS/OSS
When the source link of a job is the Link to OBS or Link to OSS on Alibaba Cloud, that is, when data is exported from HUAWEI CLOUD OBS or Alibaba Cloud OSS, configure the source job parameters based on Table 5-6.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-6 Parameter description
Category Parameter Description Example Value
Basic parameters
Bucket Name    Name of the bucket from which data is to be migrated
BUCKET_2
Source Directory/File
Path of the directory or file from which data is to be extracted. The file path can contain a maximum of 50 files, which are separated by vertical bars (|). For details, see Migration of a List of Files.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
FROM/example.csv
File Format    Format in which CDM parses data. The options are as follows:
- CSV: Source files will be migrated to tables after being parsed in CSV format.
- Binary: Files (even those not in binary format) will be directly transferred without being parsed. You are advised to select Binary when migrating files to files because the migration efficiency is higher.
- JSON: Source files will be migrated to tables after being parsed in JSON format.
- CarbonData: Source files will be migrated to tables after being parsed in CarbonData format. This option is displayed only when the migration source is OBS.
CSV
JSON Type    This parameter is displayed only when File Format is set to JSON. Type of a JSON object stored in a JSON file. The options are JSON object and JSON array.
JSON object
JSON Reference Node
This parameter is displayed only when File Format is set to JSON. CDM parses the data under the JSON node. If the node's corresponding data is a JSON array, the system will extract data from the array in the same pattern. Use periods (.) to separate multi-layer nested JSON nodes.
data.list
Advanced attributes
Line Separator    Line feed characters in a file. Supported values include \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV.
\n
Field Delimiter    Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV.
,
Use Quote Character
If you set this parameter to Yes, the field delimiters inside the enclosing symbol are regarded as part of the string value. Currently, the default enclosing symbol of CDM is ".
No
Use RE to Separate Fields
Whether to use regular expressions to separate fields. If you set this parameter to Yes, Field Delimiter becomes invalid. This parameter is displayed only when File Format is set to CSV.
Yes
Regular Expression
Regular expression used to separate fields. For details about regular expressions, see Using Regular Expressions to Separate Semi-structured Text.
^(\d.*\d)(\w*) \[(.*)\]([\w\.]*)(\w.*).*
Use First Row as Header
This parameter is displayed only when File Format is set to CSV. If you set this parameter to Yes, CDM will use the first row as the header when extracting data.
Yes
Encoding Type    Encoding type, for example, UTF-8 or GBK. You can set the encoding type for text files only. This parameter is invalid if File Format is set to Binary.
GBK
Compression Format
This parameter is displayed only when File Format is set to CSV or JSON. The options are as follows:
- None: Files in all formats can be transferred.
- gzip: Only files in gzip format can be transferred.
- Zip: Only files in Zip format can be transferred.
None
Source File Processing Method
Operation on source files after the job succeeds.
- Rename: After the job succeeds, rename the source files by adding usernames and timestamps as suffixes to the file names.
- Delete: After the job succeeds, delete the source files.
Rename
Start Job by Marker File
Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by Suspension Period.
No
Marker File    Name of the marker file for starting a job. If you specify a marker file, the migration job is executed only when the marker file exists in the source path. The marker file will not be migrated.
ok.txt
Suspension Period    Period of waiting for a marker file. If you set Start Job by Marker File to Yes but no marker file exists in the source path, the job fails upon suspension timeout. If you set this parameter to 0 and no marker file exists in the source path, the job fails immediately. Unit: second
10
Filter Type    Whether to filter the files by wildcard or regular expression. If you choose to filter files by regular expression, Java regular expressions are used. For details, see File/Path Filter.
- Wildcard: Wildcard characters are used.
- Regex: Java regular expressions are used.
Wildcard
Path Filter    Filters directories under the input path. Only directories meeting the filter conditions can be migrated. Multiple paths can be configured. Use commas (,) to separate multiple paths.
*input
File Filter    Filters files under the input path. Only files meeting the filtering rules can be migrated. Multiple files can be configured. Use commas (,) to separate multiple files.
*.csv,*.txt
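The JSON Reference Node behavior described in Table 5-6 (navigate a dotted node path such as data.list and emit each array element as one record) can be sketched in Python. This is an illustrative sketch only, not CDM's implementation; the helper name extract_by_node and the sample document are hypothetical.

```python
import json

def extract_by_node(text, node_path):
    """Return the records under a dotted JSON node path.

    If the node holds a JSON array, every element of the array is
    emitted as one record, mirroring the documented behavior.
    Periods (.) separate the nested levels of the path.
    """
    obj = json.loads(text)
    for key in node_path.split("."):
        obj = obj[key]
    return obj if isinstance(obj, list) else [obj]

raw = '{"data": {"list": [{"id": 1}, {"id": 2}]}}'
records = extract_by_node(raw, "data.list")
# records == [{"id": 1}, {"id": 2}]
```

With "data.list" (the documentation's example value), each element of the list array becomes a separate record, which is what lets CDM load a nested JSON file into rows of a table.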
NOTE
1. CDM supports incremental file migration (by skipping repeated files), but does not support resumable transfer.
For example, if three files are to be migrated and the second file fails due to a network fault, the first file is skipped when the migration task is started again. The second file, however, cannot resume from the point where the fault occurred; it can only be migrated again from the beginning.
2. During file migration, a single task supports a maximum of 100,000 files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
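The two filter modes in the table above (Wildcard and Regex) behave like the standard shell-glob and regular-expression matchers. The Python sketch below is illustrative only (CDM's Regex mode uses Java regular expressions; Python's re is close enough for this comparison), and the file list is made up.

```python
import fnmatch
import re

files = ["a.csv", "b.txt", "c.log", "d.csv"]

# Wildcard mode: comma-separated glob patterns, as in the File Filter field.
patterns = "*.csv,*.txt".split(",")
wildcard_hits = [f for f in files
                 if any(fnmatch.fnmatch(f, p) for p in patterns)]

# Regex mode: a single regular expression must match the whole file name.
regex_hits = [f for f in files if re.fullmatch(r".*\.(csv|txt)", f)]
```

Both modes select a.csv, b.txt, and d.csv here and skip c.log; the difference is only in how the pattern is written.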
5.2.2 From HDFS
When the source link of a job is the Link to HDFS, that is, when data is exported from MRS HDFS, FusionInsight HDFS, or Apache HDFS, configure the source job parameters based on Table 5-7.
Table 5-7 Parameter description
Category Parameter Description Example Value
Basic parameters
Source Directory/File
Path of the directory or file from which data is to be extracted.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/user/cdm/
File Format    File format used when transferring data. The options are as follows:
- CSV: Source files will be migrated to tables after being parsed in CSV format.
- Binary: Files (even those not in binary format) will be directly transferred without being parsed. You are advised to select Binary when migrating files to files because the migration efficiency is higher.
- Parquet: Source files will be migrated to tables after being parsed in Parquet format.
CSV
Advanced attributes
Line Separator    Line feed characters in a file. Supported values include \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV.
\n
Field Delimiter    Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV.
,
Use First Row as Header
This parameter is displayed only when File Format is set to CSV. If you set this parameter to Yes, CDM will use the first row as the header when extracting data.
No
File Split Method    Whether to split files by file quantity or by size. If HDFS files are split, each shard is regarded as one file.
- File: Split by file quantity. If there are 10 files and Concurrent Extractors is set to 5, each shard consists of two files.
- Size: Split by file size. Files will not be cut for balance. Suppose there are 10 files, among which nine are 10 MB each and one is 200 MB. If Concurrent Extractors is set to 2, two shards will be created: one processes the nine 10 MB files, and the other processes the 200 MB file.
File
Source File Processing Method
Operation on source files after the job succeeds.
- Rename: After the job succeeds, rename the source files by adding usernames and timestamps as suffixes to the file names.
- Delete: After the job succeeds, delete the source files.
Rename
Startup Job Marker File
Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by Suspension Period.
ok.txt
Filter Type    Whether to filter the files by wildcard or regular expression. If you choose to filter files by regular expression, Java regular expressions are used. For details, see File/Path Filter.
- Wildcard: Wildcard characters are used.
- Regex: Java regular expressions are used.
Wildcard
Path Filter    Filters directories under the input path. Only directories meeting the filter conditions can be migrated. Multiple paths can be configured. Use commas (,) to separate multiple paths.
*input
File Filter    Filters files under the input path. Only files meeting the filtering rules can be migrated. Multiple files can be configured. Use commas (,) to separate multiple files.
*.csv
NOTE
HDFS supports the UTF-8 encoding only. Retain the default value UTF-8.
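The Size option's balancing arithmetic from the File Split Method row above can be sketched as a greedy assignment: each whole file goes to the currently lightest shard, and files are never cut for balance. This Python sketch is illustrative only; split_by_size is a hypothetical helper, not CDM's actual scheduler.

```python
def split_by_size(sizes, extractors):
    """Assign whole files (by size, in MB) to shards, largest first,
    always placing the next file on the lightest shard so far."""
    shards = [[] for _ in range(extractors)]
    totals = [0] * extractors
    for size in sorted(sizes, reverse=True):
        i = totals.index(min(totals))  # lightest shard so far
        shards[i].append(size)
        totals[i] += size
    return shards

# The documented example: nine 10 MB files plus one 200 MB file,
# with Concurrent Extractors set to 2.
shards = split_by_size([10] * 9 + [200], 2)
```

With this policy one shard receives only the 200 MB file and the other receives the nine 10 MB files, matching the example given for the Size option.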
5.2.3 From HBase/CloudTable
When the source link of a job is the Link to HBase or Link to CloudTable, that is, when data is exported from MRS HBase, FusionInsight HBase, or Apache HBase, configure the source job parameters based on Table 5-8.
NOTE
1. When you migrate data from CloudTable or HBase, CDM reads the first row of the table as an example of the field list. If the first row of data does not contain all fields of the table, you need to manually add fields.
2. Because HBase is schema-less, CDM cannot obtain the data types. If the data is stored in binary format, CDM cannot parse it.
3. When data is exported from HBase or CloudTable, because HBase and CloudTable are schema-less storage systems, CDM requires that the source numeric fields be stored as character strings rather than in binary format. For example, the value 100 must be stored as the string 100 rather than the binary 01100100.
Table 5-8 Parameter description
Parameter Description Example Value
Table Name    Name of the HBase table from which data is to be exported.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_2
Column Families
(Optional) Column families to which the exported data belongs
CF1&CF2
Split Rowkey
(Optional) Whether to split a rowkey. The default value is No.
Yes
Rowkey Delimiter
(Optional) Delimiter used to split a rowkey. If this parameter is left empty, the rowkey will not be split.
|
Start Time    (Optional) Start time (inclusive) for extracting data, in the format yyyy-MM-dd HH:mm:ss. Only the data generated at or after the specified time is extracted.
This parameter can be set to a macro variable of date and time. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
2017-12-31 20:00:00
End Time    (Optional) End time (exclusive) for extracting data, in the format yyyy-MM-dd HH:mm:ss. Only the data generated before this time point is extracted.
This parameter can be set to a macro variable of date and time. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
2018-01-01 20:00:00
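Split Rowkey and Rowkey Delimiter from Table 5-8 amount to splitting the composite rowkey string into separate columns. A minimal Python sketch, assuming a hypothetical rowkey layout (the helper and the sample rowkey are illustrative, not CDM internals):

```python
def split_rowkey(rowkey, delimiter):
    """Split an HBase rowkey into columns by a delimiter.
    An empty delimiter leaves the rowkey unsplit, as documented."""
    return rowkey.split(delimiter) if delimiter else [rowkey]

# Example: a composite rowkey of date, user, and action, joined by "|"
parts = split_rowkey("20180101|user01|click", "|")
# parts == ["20180101", "user01", "click"]
```

Each resulting part can then be mapped to its own destination column instead of landing in a single rowkey field.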
5.2.4 From Hive
If the source link of a job is the Link to Hive, configure the source job parameters based on Table 5-9.
Table 5-9 Parameter description
Parameter Description Example Value
Database Name    Database name. Click the icon next to the text box. The dialog box for selecting the database is displayed.
default
Table Name    Hive table name. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_EXAMPLE
NOTE
If the data source is Hive, CDM will automatically partition data using the Hive data partitioning file.
5.2.5 From FTP/SFTP/NAS
If the source link of a job is the Link to an FTP or SFTP Server or Link to a NAS Server, configure the source job parameters based on Table 5-10.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-10 Parameter description
Category Parameter Description Example Value
Basic parameters
Source Directory/File
Path of the directory or file from which data is to be extracted. The file path can contain a maximum of 50 files, which are separated by vertical bars (|). For details, see Migration of a List of Files.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/ftp/a.csv|/ftp/b.txt
File Format    Format in which CDM parses data. The options are as follows:
- CSV: Source files will be migrated to tables after being parsed in CSV format.
- Binary: Files (even those not in binary format) will be directly transferred without being parsed. You are advised to select Binary when migrating files to files because the migration efficiency is higher.
- JSON: Source files will be migrated to tables after being parsed in JSON format.
CSV
JSON Type    This parameter is displayed only when File Format is set to JSON. Type of a JSON object stored in a JSON file. The options are JSON object and JSON array.
JSON object
JSON Reference Node
This parameter is displayed only when File Format is set to JSON. CDM parses the data under the JSON node. If the node's corresponding data is a JSON array, the system will extract data from the array in the same pattern. Use periods (.) to separate multi-layer nested JSON nodes.
data.list
Advanced attributes
Line Separator    Line feed characters in a file. Supported values include \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV.
\n
Field Delimiter    Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV.
,
Use Quote Character
If you set this parameter to Yes, the field delimiters inside the enclosing symbol are regarded as part of the string value. Currently, the default enclosing symbol of CDM is ".
No
Use RE to Separate Fields
Whether to use regular expressions to separate fields. If you set this parameter to Yes, Field Delimiter becomes invalid. This parameter is displayed only when File Format is set to CSV.
Yes
Regular Expression    Regular expression used to separate fields. For details about regular expressions, see Using Regular Expressions to Separate Semi-structured Text.
^(\d.*\d)(\w*) \[(.*)\]([\w\.]*)(\w.*).*
Use First Row as Header
This parameter is displayed only when File Format is set to CSV. If you set this parameter to Yes, CDM will use the first row as the header when extracting data.
Yes
Encoding Type    Encoding type, for example, UTF-8 or GBK. You can set the encoding type for text files only. This parameter is invalid if File Format is set to Binary.
UTF-8
Compression Format
This parameter is displayed only when File Format is set to CSV or JSON. The options are as follows:
- None: Files in all formats can be transferred.
- gzip: Only files in gzip format can be transferred.
- Zip: Only files in Zip format can be transferred.
None
Source File Processing Method
Operation on source files after the job succeeds.
- Rename: After the job succeeds, rename the source files by adding usernames and timestamps as suffixes to the file names.
- Delete: After the job succeeds, delete the source files.
Rename
Start Job by Marker File
Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by Suspension Period.
Yes
Marker File    Name of the marker file for starting a job. If you specify a marker file, the migration job is executed only when the marker file exists in the source path. The marker file will not be migrated.
ok.txt
Suspension Period    Period of waiting for a marker file. If you set Start Job by Marker File to Yes but no marker file exists in the source path, the job fails upon suspension timeout. If you set this parameter to 0 and no marker file exists in the source path, the job fails immediately. Unit: second
10
Filter Type    Whether to filter the files by wildcard or regular expression. If you choose to filter files by regular expression, Java regular expressions are used. For details, see File/Path Filter.
- Wildcard: Wildcard characters are used.
- Regex: Java regular expressions are used.
Wildcard
Path Filter    Filters directories under the input path. Only directories meeting the filter conditions can be migrated. Multiple paths can be configured. Use commas (,) to separate multiple paths.
*input,*out
File Filter    Filters files under the input path. Only files meeting the filtering rules can be migrated. Multiple files can be configured. Use commas (,) to separate multiple files.
*.csv
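Use RE to Separate Fields works by turning each capture group of the regular expression into one output field. The Python sketch below uses a simpler pattern of our own (not the example value from the table, which targets a different log layout) purely to show the group-to-field mapping; the log line is made up.

```python
import re

# Illustrative pattern: four capture groups become four output fields.
pattern = re.compile(r'^(\S+) (\S+) \[(.*)\] "(.*)"')
line = '127.0.0.1 admin [01/Jan/2018:00:00:01] "GET /index.html"'

m = pattern.match(line)
fields = list(m.groups())
# fields == ['127.0.0.1', 'admin', '01/Jan/2018:00:00:01', 'GET /index.html']
```

Note that CDM's Regex mode uses Java regular expressions; Python's re syntax is close enough for this illustration, but complex patterns should be verified against the Java flavor.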
5.2.6 From HTTP/HTTPS
When the source link of a job is the HTTP link, configure the source job parameters based on Table 5-11. Currently, data can only be exported from an HTTP/HTTPS URL to HUAWEI CLOUD.
Table 5-11 Parameter description
Parameter Description Example Value
File URL    Use the GET method to obtain data from the HTTP/HTTPS URL. These connectors are used to read files with an HTTP/HTTPS URL, such as public files on a third-party object storage system or web disk.
https://bucket.obs.myhwclouds.com/object-key
File Format    Currently, CDM supports Binary only, which means that files (even those not in binary format) will be directly transferred without being parsed.
Binary
5.2.7 From a Relational Database
When the source link of a job is the Link to Relational Databases, that is, when data is exported from one of the following databases, configure the source job parameters based on Table 5-12.
- Data Warehouse Service
- RDS for MySQL
- RDS for SQL Server
- RDS for PostgreSQL
- DDM
- FusionInsight LibrA
- Derecho (GaussDB)
- MySQL
- PostgreSQL
- Oracle
- IBM Db2
- Microsoft SQL Server
Table 5-12 Parameter description
Category Parameter Description Example Value
Basic parameters
Schema/Tablespace    Name of the database from which data is to be extracted. Click the icon next to the text box to go to the page for configuring the parameter, or directly enter a schema or tablespace.
If the desired schema or tablespace is not displayed, check whether the login account has the permission to query metadata.
SCHEMA_EXAMPLE
Table Name    Table from which data is to be extracted. Click the icon next to the text box to go to the page for selecting the table, or directly enter a table name.
If the desired table is not displayed, check whether the table exists or whether the login account has the permission to query metadata.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
all_type
Advanced attributes
Partition Column    Field used to partition data during data extraction. CDM splits a job into multiple tasks based on this field and executes the tasks concurrently. Fields with evenly distributed data, such as a sequential number field, work best.
Click the icon next to the text box to go to the page for selecting columns, or directly enter a partition column name.
id
Where Clause    SQL clause used to specify the data extraction range. If this parameter is not set, the entire table will be extracted.
This parameter can be configured as a macro variable of date and time to extract data generated on a specific date. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
time between '${timestamp(-1,DAY)}' and '${timestamp()}'
Regain Symbol    After Regain Symbol is set to a specified field, CDM queries the table imported to the destination database every time a scheduled task starts. If the table does not contain the specified field, CDM performs a full migration. If the table contains the specified field and the field has a value, CDM performs an incremental migration, migrating only the data whose value is greater than the value of this field.
For details about how to use this parameter, see Incremental Migration Using the Regain Symbol.
date
Migrate Incremental Data
Whether to migrate incremental data in MySQL Binlog mode. Currently, this mode can be used only for table/file migration from a MySQL database to a DWS database.
After this function is enabled, data in the source table and destination table can be synchronized in real time. One MySQL link supports only one incremental migration job, and one source table supports only one incremental migration job.
No
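The Where Clause row above relies on date/time macro variables such as ${timestamp(-1,DAY)}. A rough Python stand-in for the expansion step is shown below; the macro syntax comes from this guide, but the expand_timestamp_macro function and its formatting details are illustrative assumptions, not CDM's implementation.

```python
from datetime import datetime, timedelta

def expand_timestamp_macro(offset=0, unit="DAY", now=None):
    """Shift 'now' by the given offset and format it as a SQL
    timestamp, approximating the documented ${timestamp(...)} macro."""
    now = now or datetime.now()
    if unit == "DAY":
        now += timedelta(days=offset)
    elif unit == "HOUR":
        now += timedelta(hours=offset)
    return now.strftime("%Y-%m-%d %H:%M:%S")

# Fix "now" so the expansion is reproducible in this sketch.
base = datetime(2018, 1, 2, 20, 0, 0)
where = "time between '{}' and '{}'".format(
    expand_timestamp_macro(-1, "DAY", base),   # ${timestamp(-1,DAY)}
    expand_timestamp_macro(0, "DAY", base))    # ${timestamp()}
# where == "time between '2018-01-01 20:00:00' and '2018-01-02 20:00:00'"
```

Combined with a daily scheduled job, a clause expanded this way selects exactly the previous day's data on each run, which is how the periodic incremental synchronization described above works.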
NOTE
- If the migration source is Oracle, CDM will automatically partition data using the ROWID.
- In the migration from MySQL to DWS, the constraints on the incremental data migration function in MySQL Binlog mode are as follows:
  1. A single cluster supports only one incremental migration job in MySQL Binlog mode in the current version.
  2. In the current version, you are not allowed to delete or update 10,000 data records at a time.
  3. Entire database migration is not supported.
  4. Data Definition Language (DDL) operations are not supported.
  5. Event migration is not supported.
  6. If you set Migrate Incremental Data to Yes, binlog_format in the source MySQL database must be set to ROW.
  7. If you set Migrate Incremental Data to Yes and binlog file ID disorder occurs on the source MySQL instance due to cross-machine migration or rebuilding during incremental data migration, incremental data may be lost.
  8. If a primary key exists in the destination table and incremental data is generated during the restart of the CDM cluster or during full migration, duplicate data may exist in the primary key. As a result, the migration fails.
  9. If the destination DWS database is restarted, the migration will fail. In this case, restart the CDM cluster and then the migration job.

The recommended MySQL configuration is as follows:
# Enable the bin-log function.
log-bin=mysql-bin
# ROW mode
binlog-format=ROW
# GTID mode. The recommended version is 5.6.10 or later.
gtid-mode=ON
enforce_gtid_consistency=ON
5.2.8 From MongoDB/DDS
When you migrate data from MongoDB to a relational database, CDM reads the first row of the collection as an example of the field list. If the first row of data does not contain all fields of the collection, you need to manually add fields.
When the source link of a job is the Link to MongoDB/DDS, that is, when data is exported from an on-premises MongoDB or from DDS, configure the source job parameters based on Table 5-13.
Table 5-13 Parameter description
Parameter Description Example Value
Database Name    Name of the database from which data is to be migrated
mongodb
Collection Name    Collection name, similar to the table name of a relational database. Click the icon next to the text box to go to the page for selecting the collection, or directly enter a collection name.
If the desired collection is not displayed, check whether the collection exists or whether the login account has the permission to query metadata.
COLLECTION_NAME
5.2.9 From Redis
Because DCS restricts the commands for obtaining keys, it cannot serve as the migration source, but it can be the migration destination. The Redis service of a third-party cloud cannot serve as the migration source either. However, a Redis instance set up in an on-premises data center or on an ECS can be either the migration source or the destination.
When data is exported from an on-premises Redis, configure the source job parameters based on Table 5-14.
Table 5-14 Parameter description
Parameter Description Example Value
Redis Key Prefix
Key prefix, which is similar to the table name of a relational database
TABLENAME
Value Storage Type
The options are as follows:
- String: without column names, for example, value1,value2
- Hash: with column names, for example, column1=value1,column2=value2
String
Key Delimiter    Character used to separate the table name and column names of a relational database
_
Value Delimiter    Character used to separate columns when the storage type is String
;
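The key prefix, key delimiter, and value delimiter in Table 5-14 together define how one table row maps to a String-type Redis entry. A minimal Python sketch of that mapping, using the table's example values (the to_redis_string helper and the sample row are illustrative assumptions, not CDM internals):

```python
def to_redis_string(table, row, key_col, key_delim="_", value_delim=";"):
    """Compose a String-type Redis key/value from one table row:
    key = prefix + key delimiter + key column value,
    value = remaining column values joined by the value delimiter."""
    key = table + key_delim + str(row[key_col])
    value = value_delim.join(str(v) for k, v in row.items() if k != key_col)
    return key, value

key, value = to_redis_string("TABLENAME",
                             {"id": 1, "name": "a", "age": 20}, "id")
# key == "TABLENAME_1", value == "a;20"
```

With Hash storage the value would instead carry column names (column1=value1,...), so no value delimiter convention is needed to recover the columns.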
5.2.10 From DIS
The data in the message body is a record in CSV format that supports multiple delimiters. Messages cannot be parsed in binary or other formats.
If the source link of a job is the Link to DIS, configure the source job parameters based on Table 5-15. Currently, data can only be exported from DIS to Cloud Search Service.
Table 5-15 Parameter description
Parameter Description Example Value
DIS Stream    DIS stream name    dis
Offset    Initial offset when data is pulled from DIS
- Latest: Maximum offset, indicating that the latest data will be extracted.
- From last stop: Data reading starts from where the last read ended.
- Earliest: Minimum offset, indicating that the earliest data will be extracted.
Latest
Permanent Running    Whether the job runs permanently. If a job is set to run for a long time, the job will fail if the DIS system is interrupted.
Yes
DIS Partition ID    ID of the DIS partition. You can enter multiple partition IDs, separated by commas (,).
0,1,2
Field Delimiter    The default value is a space. To set the Tab key as the delimiter, set this parameter to \t.
,
Max. Poll Records    (Optional) Maximum number of records per poll    100
5.2.11 From Apache Kafka
If the source link of a job is a link to Kafka, configure the source job parameters according to Table 5-16.
Table 5-16 Parameter description
Parameter Description Example Value
Topics    One or more topics can be entered.    test1,test2
Offset    Initial offset parameter
- Latest: Maximum offset, indicating that the latest data will be extracted.
- Earliest: Minimum offset, indicating that the earliest data will be extracted.
Latest
Permanent Running    Whether the job runs permanently    Yes
Group ID    Group ID    -
Field Delimiter    The default value is a space. To set the Tab key as the delimiter, set this parameter to \t.
,
Max. Poll Records    (Optional) Maximum number of records per poll    100
Max. Poll Interval    (Optional) Maximum interval between polls, in seconds
100
5.2.12 From Elasticsearch/Cloud Search Service
If the source link of a job is the Link to Elasticsearch, configure the source job parameters based on Table 5-17.
Table 5-17 Parameter description
Parameter Description Example Value
Index    Elasticsearch index, which is similar to the name of a relational database. The index name can contain only lowercase letters.
index
Type    Elasticsearch type, which is similar to the table name of a relational database. The type name can contain only lowercase letters.
type
Split Nested Field    (Optional) Whether to split the JSON content of nested fields. For example, a:{ b:{ c:1, d:{ e:2, f:3 } } } can be split into a.b.c, a.b.d.e, and a.b.d.f.
No
Filter Conditions    (Optional) Whether to use a query string to filter the source data. CDM migrates only the data that meets the filter conditions.
last_name:Smith
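The Split Nested Field example from Table 5-17 is a recursive flattening of nested JSON into dotted field names. A short Python sketch of that transformation (illustrative only; the split_nested helper is hypothetical, not CDM's implementation):

```python
def split_nested(obj, prefix=""):
    """Flatten nested JSON into dotted field names, e.g.
    {"a": {"b": {"c": 1}}} becomes {"a.b.c": 1}."""
    flat = {}
    for key, value in obj.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(split_nested(value, name))  # recurse into sub-object
        else:
            flat[name] = value
    return flat

# The table's example document: a:{ b:{ c:1, d:{ e:2, f:3 } } }
doc = {"a": {"b": {"c": 1, "d": {"e": 2, "f": 3}}}}
# split_nested(doc) == {"a.b.c": 1, "a.b.d.e": 2, "a.b.d.f": 3}
```

Each dotted name then maps cleanly to a flat destination column, which is why the option matters when the destination is a relational table.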
5.3 Destination Job Parameters
This section describes how to configure destination job parameters when creating a table/file migration job.
5.3.1 To OBS
When the destination link of a job is the Link to OBS, that is, when data is imported to OBS, configure the destination job parameters based on Table 5-18.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-18 Parameter description
Category Parameter Description Example Value
Basic parameters
Bucket Name    Name of the OBS bucket to which data is to be written
BUCKET_2
Category Parameter Description ExampleValue
Write Directory OBS directory to which data is to be written. Do not add / in front of the directory name.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
DIRECTORY/
File Format Format in which data is written. The options are as follows:
- CSV: Data is written in CSV format, which is applicable to migrating data tables to files.
- Binary: Files will be directly transferred without parsing. CDM writes the files without changing the original file format, which is applicable to the migration of files to files.
- CarbonData: Data is written in CarbonData format, which is applicable to migrating data tables to files.
If data is migrated between file-related data sources, such as FTP, SFTP, NAS, HDFS, and OBS, the value of File Format must be the same as the source file format.
CSV
Duplicate File Processing Method
Files with the same name and size are identified as duplicate files. If there are duplicate files during data writing, the following methods are available:
- Replace
- Skip
- Stop job
For details, see Duplicate File Processing Method.
Skip
Advanced attributes
KMS Encryption
Whether to encrypt the uploaded data by using Key Management Service (KMS). If KMS encryption is enabled, MD5 verification cannot be performed for the data. For details, see Data Encryption During the Migration to OBS.
Yes
Key ID Key used for encryption during upload. You need to create a key in KMS in advance.
53440ccb-3e73-4700-98b5-71ff5476e621
Copy Content-Type
This parameter is displayed only when File Format is Binary and both the migration source and destination are object storage.
If you set this parameter to Yes, the Content-Type attribute of the source file is copied during object file migration. This function is mainly used for static website migration.
The Content-Type attribute cannot be written to Archive buckets. Therefore, if you set this parameter to Yes, the migration destination must be a non-Archive bucket.
No
Line Separator Line feed character in the file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is invalid when File Format is set to Binary.
\n
Field Delimiter Field delimiter in the file. This parameter is invalid when File Format is set to Binary.
,
File Size This parameter is displayed only when the migration source is a database. Data is partitioned into multiple files by size so that the files can be exported in a proper size. The unit is MB.
1024
Validate MD5 Value
The MD5 value can be verified only when files are transferred in Binary format. KMS encryption cannot be used when the MD5 value needs to be verified.
CDM calculates the MD5 value of the source files and verifies it against the MD5 value returned by OBS. If an MD5 file exists on the migration source, the system directly reads the MD5 file from the migration source and verifies it against the MD5 value returned by OBS. For details, see MD5 Verification for Files in Migration.
Yes
Record MD5 Verification Result
Whether to record the MD5 verification result when Validate MD5 Value is set to Yes
Yes
Record MD5 Link
OBS link to which the MD5 verification result is to be written
obslink
Record MD5 Bucket
OBS bucket to which the MD5 verification result is to be written
cdm05
Record MD5 Directory
Directory to which the MD5 verification result is to be written
/md5/
Encoding Type Encoding type, for example, UTF-8 or GBK. This parameter is invalid when File Format is set to Binary.
GBK
Use Quote Character
This parameter is displayed only when File Format is CSV. It is used when database tables are migrated to file systems.
If you set this parameter to Yes and a field in the source data table contains a field delimiter or line separator, CDM uses double quotation marks (") as the quote character to quote the field content as a whole to prevent a field delimiter from dividing a field into two fields, or a line separator from dividing a field into different lines. For example, if the hello,world field in the database is quoted, it will be exported to the CSV file as a whole.
No
Job Success Marker File
Whether to generate a marker file with a custom name in the destination directory after a job is executed successfully. If you do not specify a file name, this function is disabled by default.
finish.txt
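The MD5 check described under Validate MD5 Value boils down to hashing the source file and comparing the digest with the one reported by the destination. A minimal Python sketch of the local hashing step (the OBS API call is omitted; only the hashing shown here is standard):

```python
import hashlib

def file_md5(path, chunk_size=64 * 1024):
    """Compute the MD5 hex digest of a file, reading in chunks so
    large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration with a small local file; in a real migration the second
# digest would come from the destination store (the value OBS returns).
with open("sample.txt", "wb") as f:
    f.write(b"hello world")
assert file_md5("sample.txt") == hashlib.md5(b"hello world").hexdigest()
```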
5.3.2 To HDFS
When the destination link of a job is the Link to HDFS, that is, when data is imported to the following data sources, configure the destination job parameters based on Table 5-19.
- MRS HDFS
- FusionInsight HDFS
- Apache HDFS
Table 5-19 Parameter description
Parameter Description Example Value
Write Directory HDFS directory to which data is to be written. This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/user/output
File Format Format in which data is written. The options are as follows:
- CSV: Data is written in CSV format, which is applicable to migrating data tables to files.
- Binary: Files will be directly transferred without parsing. CDM writes the files without changing the original file format, which is applicable to the migration of files to files.
If data is migrated between file-related data sources, such as FTP, SFTP, NAS, HDFS, and OBS, the value of File Format must be the same as the source file format.
CSV
Duplicate File Processing Method
Files with the same name and size are identified as duplicate files. If there are duplicate files during data writing, the following methods are available:
- Replace
- Skip
- Stop job
Stop job
Compression Format File compression format after data writing. The following compression formats are supported:
- None: Do not compress the files.
- DEFLATE: Compress the files in DEFLATE format.
- gzip: Compress the files in gzip format.
- bzip2: Compress the files in bzip2 format.
- LZ4: Compress the files in LZ4 format.
- Snappy: Compress the files in Snappy format.
Snappy
Line Separator Line feed character in the file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is invalid when File Format is set to Binary.
\n
Field Delimiter Field delimiter in the file. This parameter is invalid when File Format is set to Binary.
,
NOTE
HDFS supports the UTF-8 encoding only. Retain the default value UTF-8.
5.3.3 To HBase/CloudTable
When the destination link of a job is the Link to HBase or Link to CloudTable, that is, when data is imported to the following data sources, configure the destination job parameters based on Table 5-20.
- MRS HBase
- FusionInsight HBase
- Apache HBase
- CloudTable Service
Table 5-20 Parameter description
Parameter Description Example Value
Table Name Name of the HBase table to which data is to be written. If you want to create an HBase table, you can copy the field names from the migration source. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_2
Clear Data Before Import
Operation on the tables with duplicate names before data import. The options are as follows:
- Yes: CDM will delete data in the tables that share the same names with the tables in the source database.
- No: Data is appended to the existing tables.
Yes
Rowkey Delimiter (Optional) Used to combine multiple columns into a rowkey. A space is used by default.
,
Rowkey Data Redundancy
(Optional) Whether to also write the rowkey data into HBase columns. The default value is No.
No
Compression Format (Optional) Compression format used when creating a new HBase table. The default value is None.
- None: Do not compress the files.
- Snappy: Compress the files in Snappy format.
- gzip: Compress the files in gzip format.
None
Write WAL Whether to enable the Write Ahead Log (WAL) of HBase. The options are as follows:
- Yes: If the HBase server breaks down after this function is enabled, you can replay the operations that have not been performed from the WAL.
- No: If you set this parameter to No, the write performance is improved. However, if the HBase server breaks down, data may be lost.
No
Match Data Type
- Yes: Data of the Short, Int, Long, Float, Double, and Decimal columns in the source database is converted into Byte[] arrays (binary) and written into HBase. Other types of data are written as character strings. If several types of data mentioned above are combined as rowkeys, they will be written as character strings.
This function saves storage space. In specific scenarios, the rowkey distribution is more even.
- No: All types of data in the source database are written into HBase as character strings.
No
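The storage saving behind Match Data Type comes from writing numeric columns as fixed-width byte arrays instead of decimal strings. A rough Python illustration; the big-endian struct packing here is an assumption for the sketch, not the exact encoding CDM or HBase uses:

```python
import struct

value = 1234567890123  # a Long column value

# As a character string: one byte per decimal digit.
as_string = str(value).encode("utf-8")

# As a fixed-width binary Byte[] array: 8 bytes for a 64-bit long
# (big-endian packing assumed purely for illustration).
as_binary = struct.pack(">q", value)

print(len(as_string))  # 13 bytes
print(len(as_binary))  # 8 bytes
```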
5.3.4 To Hive
When the destination link of a job is the Link to Hive, configure the destination job parameters based on Table 5-21.
Table 5-21 Parameter description
Parameter Description Example Value
Database Name Database name. Click the icon next to the text box. The dialog box for selecting the database is displayed.
default
Auto Table Creation This parameter is displayed only when both the migration source and destination are relational databases. The options are as follows:
- Non-auto creation: CDM will not automatically create a table.
- Auto creation: If the destination database does not contain the table specified by Table Name, CDM will automatically create the table. If the table specified by Table Name already exists, no table is created and data is written to the existing table.
- Deletion before creation: CDM deletes the table specified by Table Name, and then creates the table again.
Non-auto creation
Table Name Destination table name. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TBL_EXAMPLE
Clear Data Before Import
This parameter is not displayed if Auto Table Creation is set to Deletion before creation.
- Yes: CDM will delete data in the tables that share the same names with the tables in the source database.
- No: Data is appended to the existing tables.
Yes
NOTE
1. When Hive serves as the migration destination, the storage format selected during table creation, such as ORC or Parquet, will be used automatically.
2. When Hive serves as the migration destination, if the storage format is TEXTFILE, delimiters must be explicitly specified in the statement for creating Hive tables. The following gives an example.
CREATE TABLE csv_tbl(
  smallint_value smallint,
  tinyint_value tinyint,
  int_value int,
  bigint_value bigint,
  float_value float,
  double_value double,
  decimal_value decimal(9, 7),
  timestamp_value timestamp,
  date_value date,
  varchar_value varchar(100),
  string_value string,
  char_value char(20),
  boolean_value boolean,
  binary_value binary,
  varchar_null varchar(100),
  string_null string,
  char_null char(20),
  int_null int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar" = "'",
  "escapeChar" = "\\")
STORED AS TEXTFILE;
5.3.5 To FTP/SFTP/NAS
If the destination link of a job is Link to an FTP or SFTP Server or Link to a NAS Server, configure the destination job parameters based on Table 5-22.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Table 5-22 Parameter description
Category Parameter Description Example Value
Basic parameters
Write Directory Directory to which data is to be written.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
/opt/ftp/
File Format Format in which data is written. The options are as follows:
- CSV: Data is written in CSV format, which is applicable to migrating data tables to files.
- Binary: Files will be directly transferred without parsing. CDM writes the files without changing the original file format, which is applicable to the migration of files to files.
If data is migrated between file-related data sources, such as FTP, SFTP, NAS, HDFS, and OBS, the value of File Format must be the same as the source file format.
CSV
Duplicate File Processing Method
Files with the same name and size are identified as duplicate files. If there are duplicate files during data writing, the following methods are available:
- Replace
- Skip
- Stop job
Skip
Advanced attributes
Line Separator Line feed character in the file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is invalid when File Format is set to Binary.
\n
Field Delimiter Field delimiter in the file. This parameter is invalid when File Format is set to Binary.
,
File Size This parameter is displayed only when the migration source is a database. Data is partitioned into multiple files by size so that the files can be exported in a proper size. The unit is MB.
1024
Encoding Type Encoding type, for example, UTF-8 or GBK. This parameter is invalid when File Format is set to Binary.
GBK
Use Quote Character
This parameter is displayed only when File Format is CSV. It is used when database tables are migrated to file systems.
If you set this parameter to Yes and a field in the source data table contains a field delimiter or line separator, CDM uses double quotation marks (") as the quote character to quote the field content as a whole to prevent a field delimiter from dividing a field into two fields, or a line separator from dividing a field into different lines. For example, if the hello,world field in the database is quoted, it will be exported to the CSV file as a whole.
No
Write to Temporary File
This parameter is displayed only when the migration source is a file system (OBS/FTP/SFTP/NAS/HDFS), the migration destination is FTP/SFTP/NAS, and File Format is Binary.
The binary file is written to a .tmp file first. After the migration is successful, the rename or move command is run at the migration destination to restore the file.
No
Generate MD5 Hash Value
This parameter is displayed only when the migration source is a file system (OBS/FTP/SFTP/NAS/HDFS), the migration destination is FTP/SFTP/NAS, and File Format is Binary.
An MD5 hash value is generated for each transferred file, and the value is recorded in a new .md5 file. You can specify the directory where the MD5 value is generated.
No
Directory of MD5 Hash Value
Directory for storing MD5 values /md5
Job Success Marker File
Whether to generate a marker file with a custom name in the destination directory after a job is executed successfully. If you do not specify a file name, this function is disabled by default.
finish.txt
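The Use Quote Character behavior matches standard CSV quoting, which Python's csv module can demonstrate: a field containing the delimiter is wrapped in double quotation marks so it survives the round trip as a single field.

```python
import csv
import io

rows = [["1", "hello,world", "ok"]]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only fields that contain the delimiter, the
# quote character, or a line separator -- the behavior described above.
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerows(rows)
print(buf.getvalue())  # 1,"hello,world",ok

# Reading it back restores the field as a whole.
restored = next(csv.reader(io.StringIO(buf.getvalue())))
print(restored)  # ['1', 'hello,world', 'ok']
```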
5.3.6 To a Relational Database
When the destination link of a job is the Link to Relational Databases, that is, when data is imported to the following data sources, configure the destination job parameters based on Table 5-23.
- Data Warehouse Service
- RDS for MySQL
- RDS for SQL Server
- RDS for PostgreSQL
- DDM
- FusionInsight LibrA
- MySQL
Table 5-23 Parameter description
Parameter Description Example Value
Schema/Tablespace Name of the database to which data is to be written. The schema can be automatically created. Click the icon next to the text box to select a schema or tablespace.
SCHEMA_EXAMPLE
Auto Table Creation This parameter is displayed only when both the migration source and destination are relational databases. The options are as follows:
- Non-auto creation: CDM will not automatically create a table.
- Auto creation: If the destination database does not contain the table specified by Table Name, CDM will automatically create the table. If the table specified by Table Name already exists, no table is created and data is written to the existing table.
- Deletion before creation: CDM deletes the table specified by Table Name, and then creates the table again.
Non-auto creation
Table Name Name of the table to which data is to be written. Click the icon next to the text box. The dialog box for selecting the table is displayed.
This parameter can be configured as a macro variable of date and time, and a path name can contain multiple macro variables. When the macro variable of date and time works with a scheduled job, the incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
TABLE_EXAMPLE
Compress Data Whether to compress data when data is imported to DWS and Auto creation is selected
No
Storage Mode When data is imported to DWS and Auto creation is selected, you can specify the data storage mode:
- Row-based: Row-based storage. It is applicable to point queries (index-based simple queries with few returned records), or scenarios that require a large number of addition, deletion, and modification operations.
- Column-based: Column-based storage. It is applicable to statistical analysis queries (group and join scenarios) or ad hoc queries (where query condition columns and row store indexes are uncertain).
Row-based
Clear Data Before Import This parameter is not displayed if Auto Table Creation is set to Deletion before creation.
- Yes: CDM will delete data in the tables that share the same names with the tables in the source database.
- No: Data is appended to the existing tables.
Yes
Import to Staging Table If you set this parameter to Yes, the transaction mode is enabled. CDM automatically creates a temporary table and imports data to the temporary table. After the data is imported successfully, it is migrated to the destination table in transaction mode. If the import fails, the destination table is rolled back to the state before the job started. For details, see Migration in Transaction Mode.
The default value is No, indicating that CDM directly imports the data to the destination table. In this case, if the job fails to be executed, the data that has been imported to the destination table will not be rolled back automatically.
NOTE
If you set Clear Data Before Import to Yes, CDM does not roll back the deleted data even in transaction mode.
No
Extend Field Length When Auto creation is selected, the length of character fields can be extended to three times the original length before being written to the destination table. If the encoding types of the source and destination databases are different but the character fields in the source and destination tables have the same length, errors may occur during data migration due to the difference in character length.
When a character field containing Chinese characters is imported to DWS, the length of the character field must be automatically increased by three times.
If a job fails to be executed and an error message similar to value too long for type character varying appears in the log when you import Chinese characters to DWS, you can enable this function to solve the problem.
NOTE
When this function is enabled, some fields consume three times their original storage space.
No
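The tripling rule in Extend Field Length reflects the fact that common Chinese characters occupy three bytes in UTF-8, so a destination column sized in bytes needs roughly three times the character count. A quick check in Python:

```python
text = "迁移"  # two Chinese characters ("migration")
print(len(text))                  # 2 characters
print(len(text.encode("utf-8")))  # 6 bytes: 3 bytes per character in UTF-8
```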
5.3.7 To DDS
When the destination link of a job is the Link to MongoDB/DDS, that is, when data is imported to DDS, configure the destination job parameters based on Table 5-24.
Table 5-24 Parameter description
Parameter Description Example Value
Database Name Database to which data is to be imported mongodb
Collection Name Collection to which data is to be imported, which is similar to the table name of a relational database. Click the icon next to the text box to go to the page for selecting the table, or directly enter a table name.
If the desired table is not displayed, check whether the table exists or whether the login account has the permission to query metadata.
COLLECTION_NAME
5.3.8 To DCS
When the destination link of a job is the Link to Redis/DCS, that is, when data is imported to DCS, configure the destination job parameters based on Table 5-25.
Table 5-25 Parameter description
Parameter Description Example Value
Redis Key Prefix Key prefix, which is similar to the table name of a relational database
TABLENAME
Value Storage Type The options are as follows:
- String: without column names, such as value1,value2
- Hash: with column names, such as column1=value1,column2=value2
String
Key Delimiter Character used to separate the table names and column names of a relational database
_
Value Delimiter Character used to separate columns when the storage type is String
;
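How the parameters above could combine a table row into Redis keys and values can be sketched as follows. The key layout (prefix + key delimiter + primary-key value) is a hypothetical illustration of the naming scheme, not CDM's exact implementation:

```python
def to_redis_string(table, key_delimiter, value_delimiter, row_id, row):
    """Build a Redis key/value pair for Value Storage Type = String:
    the value is the column values joined by the value delimiter
    (column names are dropped)."""
    key = f"{table}{key_delimiter}{row_id}"
    value = value_delimiter.join(str(v) for v in row.values())
    return key, value

def to_redis_hash(table, key_delimiter, row_id, row):
    """For Value Storage Type = Hash, column names are kept as hash fields."""
    key = f"{table}{key_delimiter}{row_id}"
    return key, dict(row)

row = {"name": "Smith", "age": 30}
print(to_redis_string("TABLENAME", "_", ";", 1, row))  # ('TABLENAME_1', 'Smith;30')
print(to_redis_hash("TABLENAME", "_", 1, row))         # ('TABLENAME_1', {'name': 'Smith', 'age': 30})
```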
5.3.9 To Elasticsearch/Cloud Search Service
When the destination link of a job is the Link to Elasticsearch, that is, when data is imported to Elasticsearch or Cloud Search Service, configure the destination job parameters based on Table 5-26.
Table 5-26 Parameter description
Parameter Description Example Value
Index Elasticsearch index, which is similar to the name of a relational database. CDM supports automatic creation of indexes and field types. The index and field type names can contain only lowercase letters.
index
Type Elasticsearch type, which is similar to the table name of a relational database. The type name can contain only lowercase letters.
type
Pipeline ID Pipeline used to convert the data format after data is transferred to Elasticsearch. Pipeline IDs are ready for use after being created in Kibana.
my_pipeline_id
5.3.10 To DLI
When the destination link of a job is the Link to DLI, that is, when data is imported to DLI, configure the destination job parameters based on Table 5-27.
Table 5-27 Parameter description
Parameter Description Example Value
Resource Queue Resource queue to which the destination table belongs
cdm
Database Name Name of the database to which data is to be written
dli
Table Name Name of the table to which data is to be written
car_detail
Clear Data Before Import
Whether to clear data in the destination table before data import
No
5.4 Scheduling Job Execution
CDM supports scheduled execution of table/file migration jobs by minute, hour, day, week, and month. This section describes how to configure scheduled job parameters.
Scheduling Job Execution by Minute
CDM supports job execution every several minutes. See Figure 5-7.
- Start Time: indicates the time when the scheduled configuration takes effect, or the first time when the job is automatically executed.
- Cycle (minutes): indicates the interval at which the job is executed, starting from the start time.
- End Time: This parameter is optional. If it is not set, the scheduled job keeps being automatically executed. If it is set, the scheduled job will be automatically stopped at the end time.
Figure 5-7 Scheduling job execution by minute
Figure 5-7 shows that the job will be automatically executed at 15:30:30 on November 29, 2018 for the first time, at a cycle of 30 minutes, and will be automatically stopped at 15:29:00 on November 30, 2018.
Scheduling Job Execution by Hour
CDM supports job execution every several hours. See Figure 5-8.
- Cycle (hours): indicates the interval at which a job is automatically executed.
- Trigger Time (minute): indicates the exact minute in each hour when a scheduled task is triggered. The value ranges from 0 to 59. You can set a maximum of 60 values and use commas (,) to separate these values. However, the values must be unique.
If the trigger time is not within the validity period, the system selects the trigger time closest to the validity period for the first automatic execution of the scheduled job. The following gives an example:
– Start Time: 1:20:00
– Cycle (hours): 3
– Trigger Time (minute): 10
Figure 5-8 shows that the first automatic execution time is 2:10:00, and the second automatic execution time is 5:10:00.
Figure 5-8 Trigger time beyond the validity period
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-9 Scheduling job execution by hour
Figure 5-9 shows that the scheduled configuration will take effect at 15:30:00 on November 30, 2018. The job is automatically executed for the first time as soon as the scheduled configuration takes effect, at 15:50:00 for the second time, and at 17:10:00 for the third time. The job is triggered three times every 2 hours, and the configuration is always valid.
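The hourly trigger rule described above (find the first occurrence of the trigger minute at or after the start time, then repeat every cycle) can be approximated in Python; this is a sketch of the documented behavior, not CDM's scheduler:

```python
from datetime import datetime, timedelta

def trigger_times(start, cycle_hours, minute, count):
    """Approximate the hourly scheduling rule: the first run happens at
    the trigger minute at or after the start time, then every
    `cycle_hours` hours after that."""
    first = start.replace(minute=minute, second=0, microsecond=0)
    if first < start:
        first += timedelta(hours=1)
    return [first + timedelta(hours=cycle_hours * i) for i in range(count)]

# The example from the text: start 1:20:00, cycle 3 hours, trigger minute 10.
runs = trigger_times(datetime(2018, 11, 30, 1, 20), 3, 10, 2)
print([t.strftime("%H:%M:%S") for t in runs])  # ['02:10:00', '05:10:00']
```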
Scheduling Job Execution by Day
CDM supports job execution every several days. See Figure 5-10.
- Cycle (days): indicates the interval at which the job is executed, starting from the start time.
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect, or the first time when the job is automatically executed.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-10 Scheduling job execution by day
Figure 5-10 shows that the scheduled job will be automatically executed at 00:20:00 on December 1, 2018, and is executed once every three days. The configuration is always valid.
Scheduling Job Execution by Week
CDM supports job execution every several weeks, as shown in Figure 5-11.
- Cycle (weeks): indicates the interval at which a scheduled job is executed, starting from the start time.
- Trigger Time (day): You can specify the day of each week when the job is automatically executed. One or more days can be selected at a time.
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-11 Scheduling job execution by week
Figure 5-11 shows that the job will be automatically executed at 00:20:00 every Tuesday, Saturday, and Sunday every two weeks starting from 00:20:00 on December 1, 2018, and the job will be automatically stopped at 00:00:00 on June 1, 2019.
Scheduling Job Execution by Month
CDM supports job execution every several months, as shown in Figure 5-12.
- Cycle (months): indicates the interval at which a scheduled job is executed, starting from the start time.
- Trigger Time (day): indicates the day of each month when the job is executed. The value ranges from 1 to 31. You can set multiple values and use commas (,) to separate these values. However, the values must be unique.
- Validity Period: includes Start Time and End Time.
– Start Time: indicates the time when the scheduled configuration takes effect. The automatic execution time is accurate to hour, minute, and second.
– End Time: This parameter is optional; it indicates the time when the scheduled job is automatically stopped. If this parameter is not set, the scheduled job keeps being automatically executed.
Figure 5-12 Scheduling job execution by month
Figure 5-12 shows that the job will be automatically executed at 00:00:00 on the fifth and twenty-fifth day of each month starting from 00:00:00 on December 1, 2018. The configuration is always valid.
5.5 Managing a Single Job
Scenario
This section describes how to manage a single CDM table/file migration job. The following operations are involved:
- Modify the job parameters.
- Run the job.
- Stop the job.
- View historical records.
- View the job JSON.
- Edit the job JSON.
- Delete the job.
- Query the job statistics.
- Stop incremental migration.
- Continue incremental migration.
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.
Step 3 Click Table/File Migration. The job list is displayed. You can perform the following operations on a single job:
- Modify the job parameters: Click Edit in the Operation column to modify the job parameters.
- Run the job: Click Run in the Operation column to manually start the job.
- View the historical records: Click Historical Record in the Operation column. On the Historical Record page that is displayed, view the job's historical execution records and read/write statistics. Click Log to view the log information about the job.
- Delete the job: Choose More > Delete in the Operation column to delete the job.
- Stop the job: Choose More > Stop in the Operation column to stop the job.
- View the job JSON: Choose More > View Job JSON in the Operation column to view the job JSON.
- Edit the job JSON: Choose More > Edit Job JSON in the Operation column to edit the job JSON file directly, which is similar to modifying the job parameters.
- Query the job statistics: Choose More > Query Job Statistics in the Operation column to open the preview window of a configured database job. A maximum of 1,000 data records can be previewed. By comparing the number of data records at the migration source and destination, you can check whether the migration is successful and whether data is lost.
- Continue incremental migration: Choose More > Continue Incremental Migration in the Operation column. If the job is to migrate a single table from MySQL to DWS and
Cloud Data MigrationUser Guide 5 Job Management
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 114
Migrate Incremental Data is set to Yes, you can continue incremental migration byperforming the proceeding operations.
l Stop incremental migration: More > Stop Incremental Migration in the Operationcolumn. If the job is to migrate a single table from MySQL to DWS and MigrateIncremental Data is set to Yes, you can stop incremental migration by performing theproceeding operations.
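The record-count comparison behind Query Job Statistics can be expressed as a small helper. The following is only an illustration of that check; the counts themselves come from the preview window or from queries you run against the source and destination databases:

```python
def check_migration(source_count, dest_count):
    """Compare record counts at the migration source and destination,
    mirroring the manual check described for Query Job Statistics."""
    if dest_count == source_count:
        return "success: all records migrated"
    if dest_count < source_count:
        return f"possible data loss: {source_count - dest_count} records missing"
    return "destination has extra records: check for duplicates or pre-existing data"
```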
Step 4 After the modification, click Save or Save and Run.
----End
5.6 Batch Managing Jobs
Scenario
This section describes how to batch manage CDM table/file migration jobs. The following operations are involved:
- Batch run jobs.
- Batch delete jobs.
- Batch export jobs.
- Batch import jobs.
You can batch export and import jobs in the following scenarios:
- Job migration between CDM clusters: You can migrate jobs from a cluster of an earlier version to a cluster of a new version.
- Job backup: You can stop or delete CDM clusters to reduce costs. In this case, you can batch export the job scripts and save them, and then create a cluster and import the job scripts when necessary.
- Batch job creation: You can manually create a job and export the job configuration file in JSON format. Copy the content of the JSON file into the same file or new files, and then import the files to CDM to batch create jobs.
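The batch-creation pattern above (duplicate one exported job definition with small changes, then re-import) can be sketched as follows. The key names "name" and "table-name" are illustrative placeholders, not the documented CDM export schema; use the keys that actually appear in your exported JSON file:

```python
import copy
import json

def clone_jobs(template_json, table_names):
    """Given one exported CDM job definition (a JSON string), produce a
    list of job definitions differing only in job name and source table."""
    template = json.loads(template_json)
    jobs = []
    for table in table_names:
        job = copy.deepcopy(template)
        job["name"] = f"{template['name']}_{table}"   # illustrative key
        job["table-name"] = table                     # illustrative key
        jobs.append(job)
    return jobs

template = json.dumps({"name": "mysql2dws", "table-name": "t1"})
jobs = clone_jobs(template, ["t1", "t2", "t3"])
```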
Procedure
Step 1 Log in to the CDM management console.
Step 2 In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.
Step 3 Click Table/File Migration. The job list is displayed. You can perform the following batch operations:
- Batch run jobs: After selecting one or more jobs, click Run to start these jobs in a batch.
- Batch delete jobs: After selecting one or more jobs, click Delete to delete these jobs in a batch.
- Batch export jobs: Click Export All to export all jobs in JSON format. The exported files can be used as backups or imported into another cluster.
Currently, you cannot select specific jobs to export; you can only export all jobs at a time. For security purposes, link passwords are not exported when CDM exports the jobs; they are replaced with Add password here.
- Batch import jobs: Click Import and select the import format (text file or JSON).
  – By JSON string: Job files to be imported must be in JSON format. If the job files to be imported were exported from CDM, edit the JSON files before importing them to CDM: replace Add password here with the correct link passwords.
  – By text file: This mode can be used when the local JSON files cannot be uploaded properly. Paste the JSON strings of the jobs into the text box.
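Restoring the passwords before re-import can be automated. The structure below (a dict of link name to settings) is a simplified stand-in for the real exported format, which you should inspect in your own export file; only the Add password here placeholder text comes from the guide:

```python
import json

PLACEHOLDER = "Add password here"

def restore_passwords(exported_json, passwords_by_link):
    """Replace the placeholders CDM writes on export with the real
    link passwords, so the file can be imported without manual edits."""
    data = json.loads(exported_json)
    for link_name, settings in data.items():
        if settings.get("password") == PLACEHOLDER:
            settings["password"] = passwords_by_link[link_name]
    return json.dumps(data)

exported = json.dumps({"mongo_link": {"password": "Add password here"}})
fixed = restore_passwords(exported, {"mongo_link": "s3cret"})
```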
----End
6 Typical Scenarios
6.1 Migrating Data from DDS to DWS
Scenario
CDM allows you to migrate data from DDS to other data sources. This section describes how to use CDM to migrate data from DDS to DWS. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a DDS Link
3. Creating a DWS Link
4. Creating a Migration Job
Prerequisites
- You have purchased DWS and DDS.
- You have obtained the IP address, port number, database name, username, and password for connecting to the DWS and DDS databases. In addition, you must have the read, write, and delete permissions on the DDS and DWS databases.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- If DDS and DWS belong to the same VPC, create the CDM cluster in that VPC as well, without binding an EIP. The CDM cluster's subnet and security group can be the same as those of the DDS or DWS cluster. You can also configure a security group rule to enable the CDM cluster to access the cluster of the other service (DWS or DDS).
- If DDS and DWS are not in the same VPC, create the CDM cluster in the same VPC as DDS and bind an EIP to the CDM cluster so that it can access the DWS cluster.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access DWS. If DDS and DWS are in the same VPC, do not bind an EIP to the CDM cluster.
----End
Creating a DDS Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-1.
Figure 6-1 Selecting a connector
Step 2 To create a DDS link, select Document Database Service, and click Next to configure the link parameters based on Table 6-1.
Table 6-1 DDS link parameters

Parameter | Description | Example Value
Name | Link name, which can be defined based on the data source type for easy memorization | mongo_link
MongoDB Server List | Address list of the DDS cluster, in the format "IP address or domain name of the database server:port number". Separate multiple servers with semicolons (;). | 192.168.0.1:7300;192.168.0.2:7301
Database Name | Name of the DDS database to be connected | DB_mongodb
Username | Username for logging in to the DDS database | cdm
Password | Password for logging in to the DDS database | -
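The server-list format in Table 6-1 ("host:port" entries separated by semicolons) can be validated before you paste it into the link form. A minimal sketch:

```python
def parse_server_list(server_list):
    """Split a CDM-style server list ("host:port;host:port") into
    (host, port) pairs, validating that each entry has a numeric port."""
    servers = []
    for entry in server_list.split(";"):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"bad server entry: {entry!r}")
        servers.append((host, int(port)))
    return servers

servers = parse_server_list("192.168.0.1:7300;192.168.0.2:7301")
```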
Step 3 Click Save. The Link Management page is displayed.
----End
Creating a DWS Link
Step 1 On the Link Management tab page, click Create Link and select Data Warehouse Service to create a DWS link.
Step 2 Click Next. The page for configuring the DWS link parameters is displayed. Configure the mandatory parameters according to Table 6-2 and retain the default values of the optional parameters.
Table 6-2 DWS link parameters

Parameter | Description | Example Value
Name | Unique link name | dwslink
Database Server | IP address or domain name of the DWS database server | 192.168.0.3
Port | DWS database port | 8000
Database Name | Name of the DWS database | db_demo
Username | User who has the read, write, and delete permissions on the DWS database | dbadmin
Password | Password of the user | -
Import Mode | Data import mode used by the DWS link. Copy: migrate the source data to the DWS management node and then copy it to DataNodes; to access DWS through the Internet, select Copy. GDS: DataNodes of DWS concurrently request data from the GDS component of CDM and then write it to DWS; the GDS mode cannot be used for data export from DWS. Theoretically, the GDS mode is more efficient than the Copy mode, but it requires the following configurations: 1. Configure DWS to allow users of the DWS link to create and delete foreign tables. 2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster. | Copy
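The Copy-versus-GDS choice in Table 6-2 boils down to a simple rule of thumb, sketched below. The parameter names are illustrative, not CDM settings:

```python
def choose_import_mode(via_internet, gds_foreign_tables_allowed, port_25000_open):
    """Encode the documented guidance for Import Mode: Copy works
    everywhere (and is required over the Internet); GDS is faster but
    needs foreign-table permissions on DWS and port 25000 of the CDM
    cluster reachable from the DWS DataNodes."""
    if via_internet:
        return "Copy"
    if gds_foreign_tables_allowed and port_25000_open:
        return "GDS"
    return "Copy"
```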
Step 3 Click Save. The link is successfully created.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. Figure 6-2 illustrates how to create a migration job.
Figure 6-2 Creating a job
Step 2 Configure the required job information:
- Job Name: Enter a unique job name.
- Source Job Configuration
  – Source Link Name: Select the mongo_link link created in Creating a DDS Link.
  – Database Name: Select the database whose data is to be migrated.
  – Collection Name: Enter the name of the MongoDB collection on DDS, which is similar to a table name in a relational database.
- Destination Job Configuration
  – Destination Link Name: Select the dwslink link created in Creating a DWS Link.
  – Schema/Tablespace: Select the DWS database to which data is to be written.
  – Table Name: Enter the name of the table to which data is to be written. You can enter a table name that does not exist; CDM automatically creates the table on DWS.
  – Clear Data Before Import: Choose whether to clear data in the destination table before the import.
Step 3 Click Next. The Map Field page is displayed. See Figure 6-3. CDM automatically maps table fields at the migration source and destination. Check whether the field mapping is correct.
- If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.
- You need to manually select the distribution columns of DWS. You are advised to select the distribution columns according to the following principles:
  a. Use the primary key as the distribution column.
  b. If multiple columns are combined as the primary key, specify all of them as distribution columns.
  c. If no primary key is available and no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
- If you need to convert the content of the source fields, perform the operations described in Field Conversion During Migration. In this example, field conversion is not required.
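The distribution-column principles above can be sketched as a small selection helper. This is only an illustration of the decision rule, not a DWS or CDM API:

```python
def pick_distribution_columns(primary_key_columns, all_columns):
    """Apply the principles above: use the primary key column(s) when
    present; otherwise fall back to the first column, which is what DWS
    does by default, at the cost of a data-skew risk."""
    if primary_key_columns:
        return list(primary_key_columns), None
    return [all_columns[0]], "warning: no primary key; default first column may cause data skew"

cols, warn = pick_distribution_columns([], ["id", "name"])
```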
Figure 6-3 Field mapping
Step 4 Click Next to set task parameters. Generally, retain the default values of all parameters.

In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 5 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 6 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.2 Periodically Backing Up FTP/SFTP Files to HUAWEI CLOUD OBS
Scenario

CDM can automatically upload new files to OBS periodically. You do not need to compile code or manually upload the files frequently. You can also use the massive storage capabilities of OBS on HUAWEI CLOUD to back up files.
This section describes how to periodically back up FTP files to OBS with CDM.
For example, the to_obs_test directory on the FTP server contains one subdirectory another_dir and two files file1 and file2. file2 is in the another_dir directory. Figure 6-4 shows the files. Configure a scheduled CDM job to transfer these files to OBS, and then add file3 and file4 to the directory to verify that CDM can periodically transfer new files to OBS.
Figure 6-4 Files on the FTP server
Prerequisites
- You have sufficient EIP quota.
- You have created an OBS bucket and obtained the access key (AK and SK).
- You have obtained the IP address, username, and password of the FTP server.
- If the FTP server is in an on-premises environment, ensure that the FTP server can access HUAWEI CLOUD through the public network, or that a VPN or Direct Connect connection has been established between the on-premises data center and HUAWEI CLOUD.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and click Buy CDM to create a CDM cluster. The key configurations are as follows:
- Select the cdm.medium instance, which is applicable to most migration scenarios.
- If the cluster is used only to migrate data from third-party data sources to OBS, there are no special requirements on the VPC, subnet, and security group of the CDM cluster. You can specify them based on your needs.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises FTP server.
----End
Creating an OBS Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-5.
Figure 6-5 Selecting a connector
Step 2 Select Object Storage Service and click Next to configure the OBS link parameters. See Figure 6-6.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-6 Creating an OBS link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an FTP Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select FTP, click Next, and configure the FTP link parameters.
- Name: Enter a custom link name, for example, ftplink.
- Host Name/IP Address and Port: Enter the address information of the FTP server.
- Username and Password: Enter the username and password used for logging in to the FTP server.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Scheduled Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. See Figure 6-7.
Figure 6-7 Creating a migration job
- Job Name: Enter a custom job name.
- Source Link Name: Select the ftplink link created in Creating an FTP Link.
  – Source Directory/File: Select the path where to_obs_test is located.
  – File Format: Select Binary to transfer files without parsing the content. This parameter must be consistent on both the migration source and destination.
- Destination Link Name: Select the obslink link created in Creating an OBS Link.
  – Bucket Name: Select the OBS bucket for storing the FTP files.
  – Write Directory: Select an existing directory or manually enter one. If the entered directory does not exist, CDM automatically creates it, for example, /to/ftp2obs/.
  – File Format: Select Binary. The value must be the same as that on the migration source.
  – Duplicate File Processing Method: Select Skip to avoid transferring duplicate files.
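With Duplicate File Processing Method set to Skip, each scheduled run effectively transfers only the files not yet present at the destination, which is why file3 and file4 are picked up later without re-sending file1 and file2. A sketch of that selection logic (an illustration, not CDM's implementation; paths are relative to the source directory):

```python
def files_to_transfer(source_files, destination_files):
    """Return the source files that do not yet exist at the destination,
    i.e. the ones a Skip-mode run would actually transfer."""
    existing = set(destination_files)
    return sorted(f for f in source_files if f not in existing)

new = files_to_transfer(
    ["file1", "another_dir/file2", "file3", "another_dir/file4"],
    ["file1", "another_dir/file2"],
)
```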
Step 2 Click Next and configure the scheduled task. In this example, the scheduled task is executed every 10 minutes. See Figure 6-8. Retain the default values of the other parameters.
Figure 6-8 Scheduling job execution
Step 3 Click Save and Run.
----End
Verifying Backup
Step 1 After the job is executed successfully, log in to the OBS client. You can see that the corresponding files exist on OBS. Figure 6-9 shows the files on OBS.
Figure 6-9 Files on the OBS client
Step 2 In the FTP server directories, add files file3 and file4. file3 and file1 are in the same directory,and file2 and file4 are in the same directory. See Figure 6-10.
Figure 6-10 New files on the FTP server
Step 3 Wait 10 minutes and CDM automatically triggers the scheduled job. Then you can view the new files file3 and file4 after logging in to OBS. Figure 6-11 shows the new files on OBS.
Figure 6-11 New files on OBS
Step 4 On the Job Management page, click Historical Record in the Operation column to view the job's historical execution records and read/write statistics.
Step 5 Click Log to view the job log.
----End
6.3 Migrating Data from OSS to OBS
Scenario

CDM allows you to directly migrate object storage data from a third-party cloud to HUAWEI CLOUD OBS without forwarding data or writing code.
This section describes how to use CDM to migrate data from Alibaba Cloud OSS to HUAWEI CLOUD OBS. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating an OBS Link
3. Creating an OSS Link
4. Creating a Migration Job
Preparing Data
- Domain name for accessing OSS, for example, oss-cn-hangzhou.aliyuncs.com
- AK, temporary credential, or security token for accessing OSS
- Domain name, port number, AK, and SK for accessing OBS
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the cdm.medium instance, which is applicable to most migration scenarios.
- If the cluster is used only to migrate data from third-party data sources to OBS, there are no special requirements on the VPC, subnet, and security group of the CDM cluster. You can specify them based on your needs.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster accesses Alibaba Cloud OSS through the public network.

Because data is imported to HUAWEI CLOUD, a 5 Mbit/s bandwidth for the EIP is sufficient.
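To gauge whether 5 Mbit/s is acceptable for your data volume, a rough back-of-the-envelope estimate helps. The 80% efficiency factor is an assumption for illustration, not a CDM figure:

```python
def transfer_hours(data_gb, bandwidth_mbit_s, efficiency=0.8):
    """Rough time estimate for pulling data over the EIP, assuming the
    link sustains `efficiency` of its nominal bandwidth.
    1 GB = 8 * 1024 Mbit."""
    mbits = data_gb * 8 * 1024
    seconds = mbits / (bandwidth_mbit_s * efficiency)
    return seconds / 3600

# About 10 GB over a 5 Mbit/s EIP at 80% efficiency: roughly 5.7 hours.
hours = transfer_hours(10, 5)
```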
----End
Creating an OBS Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-12.
Figure 6-12 Selecting a connector
Step 2 Select Object Storage Service and click Next to configure the OBS link parameters. See Figure 6-13.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-13 Creating an OBS link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an OSS Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select Alibaba Cloud OSS, click Next, and configure the required link parameters. See Figure 6-14.
- Name: Enter a custom link name.
- OSS Endpoint: Enter the access domain name of the data to be migrated.
- Authentication Method: Select an authentication method based on your needs, for example, Access key.
- AK and SK: Enter the AK and SK used for logging in to OSS.
Figure 6-14 Creating an OSS link
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for migrating data from OSS to OBS. See Figure 6-15.
Figure 6-15 Creating a job
- Job Name: Enter a custom job name.
- Source Job Configuration
  – Source Link Name: Select the osslink link created in Creating an OSS Link.
  – Bucket Name: Select the bucket from which the data is to be migrated.
  – Source Directory/File: Set this parameter to the path of the data to be migrated. You can migrate all files in the bucket.
  – File Format: Select Binary, which delivers the optimal performance and rate for file-to-file transfers.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From OBS/OSS.
- Destination Job Configuration
  – Destination Link Name: Select the obslink link created in Creating an OBS Link.
  – Bucket Name: Select the bucket to which data is to be written.
  – Write Directory: Select the path for storing the data.
  – File Format: Select Binary. The value must be the same as that on the migration source.
  – Retain the default values of the other optional parameters. For details, see To OBS.
Step 2 Click Next to set task parameters. Generally, retain the default values of all parameters.

In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 3 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 4 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.4 Migrating Data from On-premises Redis to DCS
Scenario
CDM can migrate data from an on-premises Redis database or a third-party Redis service to DCS on HUAWEI CLOUD without programming. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating the Redis and DCS Links
3. Creating a Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to DCS and obtained the IP address, port number, and password of the DCS database.
- The on-premises Redis database can be accessed through the public network. You can configure port mapping or port forwarding to enable public network access. For details, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
- You have obtained the IP address and password of the Redis server.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and DCS clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the DCS cluster.
- If the subnets and security groups are inconsistent, configure a security group rule to enable the CDM cluster to access the DCS cluster.
Step 2 After the CDM cluster is created, click Bind Elastic IP on the Cluster Management page to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises Redis data source.
----End
Creating the Redis and DCS Links
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-16.
Figure 6-16 Selecting a connector
Step 2 Select Redis and click Next. On the page that is displayed, configure the Redis link parameters. See Figure 6-17.
Figure 6-17 Creating a Redis link
- Name: Enter a custom link name, for example, redis_link.
- Redis Deployment Method: Select the value based on the actual deployment method of the on-premises Redis database.
  – If it is deployed in single-node mode, select Single.
  – If it is deployed in cluster mode, select Cluster.
- Redis Server List: Set this parameter to the server addresses of the on-premises Redis database. Separate multiple servers with semicolons (;).
- Password and Redis Database Index: Enter the password used for logging in to the on-premises Redis database and the index of the database from which data is to be exported.
Step 3 Click Save. The Link Management page is displayed.
Step 4 On the Link Management tab page, click Create Link to create a DCS link. The procedure for creating a DCS link and the link parameters are the same as those of the Redis link.
Step 5 Select Distributed Cache Service, click Next, and configure the DCS link parameters.
- Name: Enter a custom link name, for example, dcs_link.
- Redis Deployment Method: Select the value based on the DCS cluster deployment mode.
- Redis Server List: Set this parameter to the server addresses of the DCS database. Separate multiple servers with semicolons (;).
- Password and Redis Database Index: Enter the password used for logging in to the DCS database and the index of the database to which data is to be imported.
Step 6 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. See Figure 6-18.
Figure 6-18 Creating a migration job
- Job Name: Enter a unique name, for example, redis2dcs.
- Source Job Configuration
  – Source Link Name: Select the Redis link created in Creating the Redis and DCS Links.
  – Redis Key Prefix: Select a key prefix from which data is to be exported.
  – Value Storage Type: Select a value based on your needs. The following uses Hash on both the migration source and destination as an example.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From Redis.
- Destination Job Configuration
  – Destination Link Name: Select the DCS link created in Creating the Redis and DCS Links.
  – Redis Key Prefix: Select a key prefix to which data is to be imported.
  – Value Storage Type: Select Hash, which is the same as the migration source.
  – Retain the default values of the optional parameters in Show Advanced Attributes. For details, see To DCS.
Step 2 After the basic job parameters are configured, click Next to go to the page for configuring field mapping. See Figure 6-19.

For the hash type, you can click the copy button to copy the fields on the migration source and then select the field that is used as the primary key.
Figure 6-19 Configuring field mapping
Step 3 Click Next. On the Configure Task page that is displayed, configure Schedule Execution as required. See Figure 6-20.

If Schedule Execution is enabled, CDM periodically synchronizes data. If data with duplicate primary keys exists, CDM automatically overwrites the existing data with the same primary key.
Figure 6-20 Scheduling job execution
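The overwrite behavior on duplicate primary keys can be modeled with a plain dictionary. This is an illustration of the semantics only, not CDM's implementation:

```python
def sync(destination, batch):
    """Model one scheduled synchronization run: records whose primary
    key already exists at the destination are overwritten with the
    newly migrated values; new keys are simply added."""
    for primary_key, value in batch:
        destination[primary_key] = value  # existing key -> overwritten
    return destination

dest = sync(
    {"user:1": {"age": "20"}},
    [("user:1", {"age": "21"}), ("user:2", {"age": "30"})],
)
```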
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.5 Migrating Data from Oracle to Cloud Search Service
Scenario
Cloud Search Service provides users with structured and unstructured data search, statistics, and report capabilities. This section describes how to use CDM to migrate data from the Oracle database to Cloud Search Service. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a Cloud Search Service Link
3. Creating an Oracle Link
4. Creating a Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to Cloud Search Service and obtained the IP address and port number of the Cloud Search Service cluster.
- You have obtained the IP address, name, username, and password of the Oracle database. If the Oracle server is deployed in an on-premises data center or on a third-party cloud, ensure that an IP address that can be accessed from the public network has been configured for the Oracle database, or that a VPN or Direct Connect connection has been established between the on-premises data center and HUAWEI CLOUD. To enable public network access, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and Cloud Search Service clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the Cloud Search Service cluster.
- If the same subnet and security group cannot be used for security purposes, ensure that a security group rule has been configured to allow the CDM cluster to access the Cloud Search Service cluster.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the Oracle data source.
----End
Creating a Cloud Search Service Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-21.
Figure 6-21 Selecting a connector
Step 2 Select Cloud Search Service and click Next. On the page that is displayed, configure the Cloud Search Service link parameters. See Figure 6-22.
- Name: Enter a custom link name, for example, csslink.
- Elasticsearch Server and Port: Enter the address and port number of the Cloud Search Service cluster.
- Username and Password: Enter the username and password used for logging in to the Cloud Search Service cluster. The user must have the read and write permissions on the database.
Figure 6-22 Creating a Cloud Search Service link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an Oracle Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select Oracle, click Next, and configure the Oracle link parameters.
- Name: Enter a custom link name, for example, oracle_link.
- Database Server and Port: Enter the address and port number of the Oracle server.
- Database Name: Enter the name of the Oracle database whose data is to be exported.
- Username and Password: Enter the username and password used for logging in to the Oracle database. The user must have the permission to read the Oracle metadata.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for exporting data from the Oracle database to Cloud Search Service.
Figure 6-23 Creating a migration job
- Job Name: Enter a unique name.
- Source Job Configuration
– Source Link Name: Select the oracle_link link created in Creating an Oracle Link.
– Schema/Tablespace: Enter the name of the database whose data is to be migrated.
– Table Name: Enter the name of the table to be migrated.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From a Relational Database.
- Destination Job Configuration
– Destination Link Name: Select the csslink link created in Creating a Cloud Search Service Link.
– Index: Select the Elasticsearch index of the data to be written. You can also enter a new index. CDM automatically creates the index on Cloud Search Service.
– Type: Select the Elasticsearch type of the data to be written. You can also enter a new type. CDM automatically creates the type at the migration destination.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see To Elasticsearch/Cloud Search Service.
Step 2 Click Next. The Map Field page is displayed. CDM automatically matches the source and destination fields. See Figure 6-24.
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- If the type is automatically created at the migration destination, you need to configure the type and name of each field.
- CDM supports field conversion during the migration. For details, see Field Conversion During Migration.
Figure 6-24 Field mapping
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.6 Migrating Data from OBS to Cloud Search Service
Scenario
CDM supports data migration between services on HUAWEI CLOUD. This section describes how to use CDM to migrate data from OBS to Cloud Search Service. The procedure is as follows:
1. Creating a CDM Cluster
2. Creating a Cloud Search Service Link
3. Creating an OBS Link
4. Creating a Migration Job
Prerequisites
- You have obtained the domain name, port number, AK, and SK for accessing OBS.
- You have subscribed to Cloud Search Service and obtained the IP address and port number of the Cloud Search Service cluster.
Creating a CDM Cluster
Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and Cloud Search Service clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the Cloud Search Service cluster.
- If the same subnet and security group cannot be used for security reasons, ensure that a security group rule has been configured to allow the CDM cluster to access the Cloud Search Service cluster.
Creating a Cloud Search Service Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-25.
Figure 6-25 Selecting a connector
Step 2 Select Cloud Search Service and click Next. On the page that is displayed, configure the Cloud Search Service link parameters. See Figure 6-26.
- Name: Enter a custom link name, for example, csslink.
- Elasticsearch Server and Port: Enter the address and port number of the Cloud Search Service cluster.
- Username and Password: Enter the username and password used for logging in to the Cloud Search Service cluster. The user must have the read and write permissions on the database.
Figure 6-26 Creating a Cloud Search Service link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an OBS Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select HUAWEI CLOUD OBS, click Next, and configure the required link parameters. See Figure 6-27.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-27 Creating an OBS link
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for exporting data from OBS to Cloud Search Service.
Figure 6-28 Creating a migration job
- Job Name: Enter a unique name.
- Source Job Configuration
– Source Link Name: Select the obslink link created in Creating an OBS Link.
– Bucket Name: Select the bucket from which the data is to be migrated.
– Source Directory/File: Set this parameter to the path of the data to be migrated. You can migrate all directories and files in the bucket.
– File Format: Select CSV for migrating files to a data table.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From OBS/OSS.
- Destination Job Configuration
– Destination Link Name: Select the csslink link created in Creating a Cloud Search Service Link.
– Index: Select the Elasticsearch index of the data to be written. You can also enter a new index. CDM automatically creates the index on Cloud Search Service.
– Type: Select the Elasticsearch type of the data to be written. You can also enter a new type. CDM automatically creates the type at the migration destination.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see To Elasticsearch/Cloud Search Service.
Step 2 Click Next. The Map Field page is displayed. CDM automatically matches the source and destination fields. See Figure 6-29.
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- If the type is automatically created at the migration destination, you need to configure the type and name of each field.
- CDM supports field conversion during the migration. For details, see Field Conversion During Migration.
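Conceptually, the mapping turns each CSV row read from OBS into one document keyed by the destination field names, which is then indexed into Cloud Search Service. The following Python sketch is illustrative only; the field names and sample data are invented and this is not CDM's internal code:

```python
import csv
import io

# Hypothetical destination field names configured on the Map Field page.
fields = ["id", "name", "price"]

# A sample CSV file as it might sit in the OBS bucket.
sample = "1,apple,3.5\n2,banana,1.2\n"

# Each CSV row becomes one document keyed by the mapped field names,
# roughly what an Elasticsearch bulk-index request would contain.
docs = [dict(zip(fields, row)) for row in csv.reader(io.StringIO(sample))]

print(docs[0])  # {'id': '1', 'name': 'apple', 'price': '3.5'}
```

If the type is created automatically, the per-field types you set on this page determine how Cloud Search Service interprets each of these values.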
Figure 6-29 Field mapping
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.7 Migrating Data from OBS to DLI
Scenario
DLI is a fully hosted big data query service provided by HUAWEI CLOUD. This section describes how to use CDM to migrate data from OBS to DLI. The procedure is as follows:
1. Creating a CDM Cluster
2. Creating a DLI Link
3. Creating an OBS Link
4. Creating a Migration Job
Prerequisites
- You have subscribed to OBS and DLI.
- You have created resource queues, databases, and tables on DLI.
Creating a CDM Cluster
Log in to the CDM management console and perform operations as required.
- If you already have a CDM cluster, click Job Management in the row of the cluster and create links on the page that is displayed.
- If you do not have a CDM cluster, click Buy CDM to create a cluster. For details about how to create a cluster, see Creating a Cluster.
In this scenario, if the CDM cluster is used only to migrate data from OBS to DLI and does not need to migrate data from other data sources, there are no special requirements on the VPC, subnet, and security group of the CDM cluster. You can specify them based on your needs. CDM accesses DLI and OBS through the intranet. Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
Creating a DLI Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-30.
Figure 6-30 Selecting a connector
Step 2 Select Data Lake Insight, click Next, and configure the DLI link parameters. See Figure 6-31.
- Name: Enter a custom link name, for example, dlilink.
- AK and SK: Enter the AK and SK used for accessing the DLI database. To obtain the AK and SK, hover the cursor over the username on the management console and choose My Credential > Access Keys.
- Project ID: Enter the ID of the project to which DLI belongs. Obtain the project ID on the My Credential page.
Figure 6-31 Creating a DLI link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an OBS Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select HUAWEI CLOUD OBS, click Next, and configure the required link parameters. See Figure 6-32.
- Name: Enter a custom link name, for example, obslink.
- OBS Server and Port: Enter the actual OBS address information.
- AK and SK: Enter the AK and SK used for logging in to OBS.
Figure 6-32 Creating an OBS link
Step 2 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a job for migrating data from OBS to DLI. See Figure 6-33.
Figure 6-33 Creating a job
l Job Name: Enter a custom job name.
- Source Link Name: Select the obslink link created in Creating an OBS Link.
– Bucket Name: Select the bucket from which the data is to be migrated.
– Source Directory/File: Set this parameter to the path of the data to be migrated.
– File Format: Select CSV or JSON for transferring files to a data table.
– Retain the default values of the optional parameters in Show Advanced Attributes. For details, see From OBS/OSS.
- Destination Link Name: Select the dlilink link created in Creating a DLI Link.
– Resource Queue: Enter the resource queue to which the destination table belongs.
– Database Name: Enter the name of the database to which data is to be written.
– Table Name: Enter the name of the table to which data is to be written. CDM cannot automatically create tables on DLI. The table must be created on DLI in advance, and the field types and formats of the table must be consistent with those of the data to be migrated.
– Clear Before Importing Data: Choose whether to clear data in the destination table before data import. In this example, retain the default value.
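Because CDM cannot create the DLI table and the field types must already match the data, a quick pre-flight check of the source file can save a failed job. The following Python sketch is an illustration under assumptions (the expected column types and sample rows are invented), not part of CDM:

```python
import csv
import io

# Assumed column types of the pre-created DLI table, in column order.
expected_types = [int, str, float]

# Sample CSV content; row 2 has a non-numeric value in the float column.
sample = "1,apple,3.5\n2,banana,oops\n"

bad_rows = []
for line_no, row in enumerate(csv.reader(io.StringIO(sample)), start=1):
    try:
        for cast, value in zip(expected_types, row):
            cast(value)  # raises ValueError if the value does not fit the type
    except ValueError:
        bad_rows.append(line_no)

print(bad_rows)  # [2]
```

Rows flagged this way would otherwise surface as job failures or as dirty data during the migration.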
Step 2 Click Next. The Map Field page is displayed. CDM automatically matches the source and destination fields.
- If the field mapping is incorrect, you can drag the fields to adjust the mapping.
- CDM supports field conversion during the migration. For details, see Field Conversion During Migration.
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.8 Migrating Data from the MySQL Database to the MRS Hive Partition Table
MRS provides enterprise-level big data clusters on the cloud. It contains HDFS, Hive, and Spark components and is applicable to enterprise-scale analysis of massive data.
Hive supports SQL to help users perform extraction, transformation, and loading (ETL) operations on large-scale data sets. Queries on large-scale data sets take a long time. In many scenarios, you can create Hive partitions to reduce the total amount of data to be scanned each time, which significantly improves query performance.
Hive partitions are implemented by using the HDFS subdirectory function. Each subdirectory contains the column names and values of a partition. If there are multiple partitions, many HDFS subdirectories exist, and it is not easy to load external data into each partition of a Hive table without tool support. With CDM, you can easily load data from external data sources (relational databases, object storage services, and file system services) into Hive partition tables.
This section describes how to migrate data from the MySQL database to the MRS Hive partition table.
Scenario
Suppose that there is a trip_data table in the MySQL database. The table stores cycling records such as the start time, end time, start sites, end sites, and rider IDs. For details about the fields in the trip_data table, see Figure 6-34.
Figure 6-34 MySQL table fields
The following describes how to use CDM to import the trip_data table in the MySQL database to the MRS Hive partition table. The procedure is as follows:
1. Creating a Hive Partition Table on MRS Hive
2. Creating a CDM Cluster and Binding an EIP to the Cluster
3. Creating a MySQL Link
4. Creating a Hive Link
5. Creating a Migration Job
Prerequisites
- You have subscribed to MRS.
- You have sufficient EIP quota.
- You have obtained the IP address, port number, database name, username, and password for connecting to the MySQL database. In addition, the user must have the read and write permissions on the MySQL database.
Creating a Hive Partition Table on MRS Hive
On MRS Hive, run the following SQL statement to create a Hive partition table named trip_data, with three new fields y, ym, and ymd used as partition fields:

create table trip_data(
  TripID int,
  Duration int,
  StartDate timestamp,
  StartStation varchar(64),
  StartTerminal int,
  EndDate timestamp,
  EndStation varchar(64),
  EndTerminal int,
  Bike int,
  SubscriberType varchar(32),
  ZipCodev varchar(10)
)
partitioned by (y int, ym int, ymd int);
NOTE
The trip_data partition table has three partition fields: the year, the year and month, and the year, month, and date of the start time of a ride. For example, if the start time of a ride is 2018/5/11 9:40, the record is saved in the trip_data/2018/201805/20180511 partition. When the records in the trip_data table are summarized, only part of the data needs to be scanned, greatly improving the performance.
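The partition layout described in the note can be reproduced with ordinary date formatting. This Python sketch only illustrates how a start time maps to the y/ym/ymd partition values and the corresponding subdirectory; the real directories are managed by Hive on HDFS:

```python
from datetime import datetime

# Start time of a ride, as in the example above: 2018/5/11 9:40.
start = datetime(2018, 5, 11, 9, 40)

# Partition values for the trip_data table: year, year+month, year+month+day.
y = start.strftime("%Y")
ym = start.strftime("%Y%m")
ymd = start.strftime("%Y%m%d")

# Each partition corresponds to an HDFS subdirectory, so a query that
# filters on y/ym/ymd only needs to scan the matching subdirectory.
partition_dir = f"trip_data/{y}/{ym}/{ymd}"
print(partition_dir)  # trip_data/2018/201805/20180511
```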
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- Select the flavor of the CDM cluster based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.
- The CDM and MRS clusters must be in the same VPC, subnet, and security group.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access MySQL.
Figure 6-35 Binding an EIP
----End
Creating a MySQL Link
Step 1 On the Cluster Management page, click Job Management of the cluster and choose Link Management > Create Link to enter the page for selecting the connector. See Figure 6-36.
Figure 6-36 Selecting a connector
Step 2 Select MySQL and click Next. On the page that is displayed, configure the MySQL link parameters, as shown in Figure 6-37.
Figure 6-37 Creating a MySQL link
Click Show Advanced Attributes to display optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 6-3.
Table 6-3 MySQL link parameters

Parameter        Description                                              Example Value
Name             Unique link name                                         mysqllink
Database Server  IP address or domain name of the MySQL database server   192.168.0.1
Port             MySQL database port                                      3306
Database Name    Name of the MySQL database                               sqoop
Username         User who has the read, write, and delete permissions     admin
                 on the MySQL database
Password         Password of the user                                     -
Step 3 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during the saving, the security settings of the MySQL database are probably incorrect. In this case, you need to allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating a Hive Link
Step 1 Click Create Link and select MRS Hive to create an MRS Hive link.
Step 2 Click Next and configure the MRS Hive link parameters. See Figure 6-38.
Figure 6-38 Creating a Hive link
Table 6-4 describes the parameters. You can configure the parameters according to the actual situation.
Table 6-4 Hive link parameters

Parameter              Description                                           Example Value
Name                   Link name, which can be defined based on the data     hivelink
                       source type for easy memorization
Manager IP             IP address of MRS Manager. Click Select next to the   127.0.0.1
                       Manager IP text box to select a created MRS cluster.
                       CDM automatically fills in the authentication
                       information.
Authentication Method  Authentication method used for accessing MRS:         Simple
                       Simple if MRS is in non-security mode, or Kerberos
                       if MRS is in security mode.
Username               When Authentication Method is set to Kerberos, set    cdm
                       the username and password for logging in to MRS
                       Manager.
Password               Password for logging in to MRS Manager                -
Step 3 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. Figure 6-39 illustrates how to create a migration job.
Figure 6-39 Creating a migration job
NOTE
Set Clear Data Before Import to Yes, so that the data in the Hive table will be cleared before data import.
Step 2 After the parameters are configured, click Next. The Map Field tab page is displayed. See Figure 6-40.
Map the fields of the MySQL table and the Hive table. The Hive table has three more fields than the MySQL table: y, ym, and ymd, which are the Hive partition fields. Because the fields of the source table cannot be directly mapped to the destination table, you need to configure an expression to extract data from the StartDate field in the source table.
Figure 6-40 Field mapping
Step 3 Click the icon on the left of the y, ym, and ymd fields to display the Converter List dialog box, and then choose Create Converter > Expression conversion. See Figure 6-41.
The expressions for the y, ym, and ymd fields are as follows:
DateUtils.format(DateUtils.parseDate(row[2],"yyyy-MM-dd HH:mm:ss.SSS"),"yyyy")
DateUtils.format(DateUtils.parseDate(row[2],"yyyy-MM-dd HH:mm:ss.SSS"),"yyyyMM")
DateUtils.format(DateUtils.parseDate(row[2],"yyyy-MM-dd HH:mm:ss.SSS"),"yyyyMMdd")
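Each expression parses row[2] (the StartDate column) with the pattern yyyy-MM-dd HH:mm:ss.SSS and then reformats it. As a sanity check, the equivalent logic in Python looks like this (the sample timestamp is invented for illustration):

```python
from datetime import datetime

# row[2] is the StartDate column of the source record (sample value).
row = [None, None, "2018-05-11 09:40:00.000"]

# Parse with the source pattern "yyyy-MM-dd HH:mm:ss.SSS" ...
start = datetime.strptime(row[2], "%Y-%m-%d %H:%M:%S.%f")

# ... then reformat as "yyyy", "yyyyMM", and "yyyyMMdd" for y, ym, and ymd.
y = start.strftime("%Y")
ym = start.strftime("%Y%m")
ymd = start.strftime("%Y%m%d")

print(y, ym, ymd)  # 2018 201805 20180511
```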
Figure 6-41 Configuring the expression
NOTE
The expressions in CDM support field conversion of common character strings, dates, and values. For details, see Field Conversion During Migration.
Step 4 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or is filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 5 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Step 6 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
On the Historical Record page, click Log to view the job log.
----End
6.9 Migrating Data from the MySQL Database to DDM
DDM removes the capacity and performance bottlenecks of databases and solves distributed expansion issues. DDM supports sharding, read/write isolation, and elastic scaling, enabling highly concurrent access to massive data and improving database read/write performance.
This section describes how to use CDM to migrate a table from the on-premises MySQL database to DDM and store data in a distributed manner.
Scenario
1. Suppose that there is a trip table in the sqoop MySQL database. The table stores cycling records such as the start time, end time, start sites, end sites, and rider IDs. For details about the fields in the trip table, see Table 6-5.
Table 6-5 Fields in the trip table
Field Type
tripid int
duration int
startdate timestamp
startstation varchar(64)
startterminal int
enddata timestamp
endstation varchar(64)
endterminal int
bike int
subscriberType varchar(32)
zipcode varchar(10)
2. You have created a DDM instance and created a schema. For details about the operations, see Getting Started in the Distributed Database Middleware User Guide. For example, Figure 6-42 shows the DDM instance purchased here. The name of the schema created on DDM is db_cdm.
NOTE
DDM supports multiple instances of different specifications. The parallel computing capability improves as the number of cores increases, and the larger the memory, the more complex the data that can be queried and processed in batches.
You can select proper specifications based on your service plan to reduce the cost of using DDM. For details, see Selecting Proper DDM Specifications.
Figure 6-42 Basic information about the DDM instance
The following describes how to use CDM to migrate the trip table from the sqoop database to the db_cdm schema of DDM. The procedure is as follows:
1. Creating a Sharded Table in the DDM Schema
2. Creating a CDM Cluster and Binding an EIP to the Cluster
3. Creating a MySQL Link
4. Creating a DDM Link
5. Creating a Migration Job
Prerequisites
- You have sufficient EIP quota. The on-premises MySQL database can be accessed through the EIP address.
- You have obtained the IP address, port number, username, and password for connecting to the sqoop database. In addition, the user must have the read and write permissions on the database.
- You have obtained the username and password of the db_cdm schema. In addition, the user must have the read and write permissions on the schema.
- You have associated the DDM instance with an RDS instance in the same VPC. For details about the operation, see the Distributed Database Middleware User Guide.
Creating a Sharded Table in the DDM Schema
Create a sharded table named trip_ddm in the db_cdm schema of DDM. The field names and field types are the same as those in the trip table of the on-premises MySQL database. Configure Table Type, Sharding Rule, Sharding Key, and SQL Statement Used for Table Creation. The following is the SQL statement for creating the table. For more information, see Creating a Logical Table.

create table trip_ddm(
  tripid int,
  duration int,
  startdate timestamp,
  startstation varchar(64),
  startterminal int,
  enddata timestamp,
  endstation varchar(64),
  endterminal int,
  bike int,
  subscriberType varchar(32),
  zipcode varchar(10)
);
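How rows are distributed across shards is determined by the Sharding Rule and Sharding Key you configure when creating the logical table; the actual algorithm is internal to DDM. Purely as an illustration, a hash-style rule on a tripid sharding key behaves like this sketch (the shard count is an assumption, not a DDM default):

```python
SHARD_COUNT = 4  # assumed number of physical shards behind the logical table

def shard_of(tripid: int) -> int:
    """Illustrative hash-style routing: rows with the same sharding-key
    value are always stored on the same shard."""
    return tripid % SHARD_COUNT

# Rows spread across shards, which is what enables parallel reads and writes.
print([shard_of(t) for t in (1, 2, 5, 9)])  # [1, 2, 1, 1]
```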
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and click Buy CDM. On the Buy CDM Cluster page that is displayed, configure the required parameters. Table 6-6 describes the parameters.
Table 6-6 Parameter description

Parameter           Example Value                  Description
Current Region      CN North-Beijing1              The region must be the same as the region where the DDM instance is located.
AZ                  AZ1
Cluster Name        cdm131                         Enter a custom CDM cluster name.
Version             1.3.0                          CDM version. Retain the default value.
Instance Type       cdm.medium                     Currently, the following flavors are available:
                                                   - cdm.small: 2 vCPUs with 4 GB memory, applicable to Proof of Concept (PoC) verification and development tests
                                                   - cdm.medium: 4 vCPUs with 8 GB memory, applicable to migration of a single database table with fewer than 10 million pieces of data
                                                   - cdm.large: 8 vCPUs with 16 GB memory, applicable to migration of a single database table with 10 million pieces of data or more
                                                   - cdm.xlarge: 16 vCPUs with 32 GB memory, applicable to TB-level data migration requiring 10GE high-speed bandwidth
                                                   Select cdm.medium, which is applicable to most migration scenarios.
VPC                 myvpc                          The CDM cluster and the DDM instance must be in the same VPC, subnet, and security group. For details about the DDM instance network information, see Figure 6-42. The CDM cluster accesses the DDM instance through the intranet.
Subnet              subnet-168-1 (192.168.1.0/24)
Security Group      Sys-default
Auto Shutdown       No                             Retain the default values of these parameters.
Scheduled Startup   No
Scheduled Shutdown  No
Step 2 After the CDM cluster is created, bind an EIP to the cluster on the Cluster Management page. The CDM cluster uses the EIP to access the on-premises MySQL database. See Figure 6-43.
Figure 6-43 Binding an EIP
----End
Creating a MySQL Link
Step 1 On the Cluster Management page, click Job Management of the cluster and choose Link Management > Create Link to enter the page for selecting the connector. See Figure 6-44.
Figure 6-44 Selecting a connector
Step 2 Select MySQL and click Next. On the page that is displayed, configure the MySQL link parameters, as shown in Figure 6-45.
Figure 6-45 Creating a MySQL link
Click Show Advanced Attributes to display optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 6-7.
Table 6-7 MySQL link parameters
Parameter: Description (Example Value)
Name: Unique link name (mysqllink)
Database Server: IP address or domain name of the MySQL database server (192.168.0.1)
Port: MySQL database port (3306)
Database Name: Name of the MySQL database (sqoop)
Username: User who has the read, write, and delete permissions on the MySQL database (admin)
Password: Password of the user
Step 3 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during the saving, the security settings of the MySQL database are probably incorrect. In this case, allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating a DDM Link
Step 1 On the Link Management tab page, click Create Link and select DDM in Relational Database.
Step 2 Click Next and configure the DDM link parameters. See Figure 6-46.
Figure 6-46 Creating a DDM link
NOTE
- Database Server and Port: Enter one of the access addresses of the DDM instance. For details about the access addresses of the DDM instance, see Figure 6-42.
- Database Name: Enter the name of the schema of the DDM instance, for example, db_cdm.
- Username and Password: Enter the username and password used for logging in to DDM. The user must have the permission to read and write the db_cdm schema.
Step 3 Click Save. The Link Management page is displayed.
----End
Creating a Migration Job
Step 1 Choose Table/File Migration > Create Job to create a data migration job. Figure 6-47 illustrates how to create a migration job.
Figure 6-47 Creating a migration job
NOTE
- Job Name: Enter a custom job name.
- Source Job Configuration:
  – Source Link Name: Select the mysqllink link created in Creating a MySQL Link.
  – Schema/Tablespace: Select the sqoop database where the trip table is located.
  – Table Name: Select the trip table.
- Destination Job Configuration:
  – Destination Link Name: Select the ddmlink link created in Creating a DDM Link.
  – Schema/Tablespace: Select the db_cdm schema of the DDM instance.
  – Table Name: Select the trip_ddm logical table of the DDM instance, that is, the sharded table created in Creating a Sharded Table in the DDM Schema.
  – Clear Data Before Import: Retain the default value.
Step 2 Click Next. The Map Field page is displayed. See Figure 6-48.
The fields in the trip table in the sqoop database are the same as those in the trip_ddm table in DDM. CDM automatically maps the fields with the same name. You only need to check whether the field mapping and time format are correct.
Figure 6-48 Field mapping
NOTE
- If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.
- If you need to convert the content of the source fields, perform the operations described in Field Conversion During Migration. In this example, field conversion is not required.
Step 3 Click Next to set task parameters. Generally, retain the default values of all parameters.
In this step, you can configure the following optional functions:
- Retry upon Failure: If the job fails to be executed, you can determine whether to automatically retry. Retain the default value Never.
- Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.
- Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.
- Write Dirty Data: Specify this parameter if data that fails to be processed or filtered out during job execution needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link. Retain the default value No so that dirty data is not recorded.
- Delete Job After Completion: Retain the default value Do not delete.
Step 4 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
Figure 6-49 Job execution result
Step 5 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records, read/write statistics, and job log.
Figure 6-50 Querying migration job records
----End
6.10 Migrating the Entire MySQL Database to RDS
Scenario
This section describes how to migrate the entire on-premises MySQL database to RDS on HUAWEI CLOUD using CDM's entire DB migration function.
Currently, CDM can migrate the entire on-premises MySQL database to RDS for MySQL, RDS for PostgreSQL, or RDS for SQL Server. The following describes how to migrate the entire database to RDS for MySQL. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a MySQL Link
3. Creating an RDS Link
4. Creating an Entire Database Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to an RDS database instance and the database engine of this instance is MySQL.
- The on-premises MySQL database can be accessed through the public network. If the MySQL server is deployed in a local data center or on a third-party cloud, ensure that an IP address that can be accessed from the public network has been configured for the MySQL database, or that a VPN channel or Direct Connect connection from the internal data center to HUAWEI CLOUD has been established. To enable public network access, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
- You have obtained the IP addresses, names, usernames, and passwords of the on-premises MySQL database and RDS for MySQL.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- The flavor of the CDM cluster is selected based on the amount of data to be migrated. Generally, select cdm.medium to meet the requirements of most migration scenarios.
- The CDM cluster and the RDS for MySQL instance must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the RDS for MySQL instance.
- If the same subnet and security group cannot be used for security reasons, ensure that a security group rule has been configured to allow the CDM cluster to access the RDS for MySQL instance.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises MySQL database.
Figure 6-51 Binding an EIP
----End
Creating a MySQL Link
Step 1 On the Cluster Management page, click Job Management of the cluster and choose Link Management > Create Link to enter the page for selecting the connector. See Figure 6-52.
Figure 6-52 Selecting a connector
Step 2 Select MySQL and click Next. On the page that is displayed, configure MySQL link parameters, as shown in Figure 6-53.
Figure 6-53 Creating a MySQL link
Click Show Advanced Attributes to display optional parameters. For details, see Link to Relational Databases. Retain the default values of the optional parameters and configure the mandatory parameters according to Table 6-8.
Table 6-8 MySQL link parameters
Parameter: Description (Example Value)
Name: Unique link name (mysqllink)
Database Server: IP address or domain name of the MySQL database server (192.168.0.1)
Port: MySQL database port (3306)
Database Name: Name of the MySQL database (sqoop)
Username: User who has the read, write, and delete permissions on the MySQL database (admin)
Password: Password of the user
Step 3 Click Save. The Link Management page is displayed.
NOTE
If an error occurs during the saving, the security settings of the MySQL database are probably incorrect. In this case, allow the EIP of the CDM cluster to access the MySQL database.
----End
Creating an RDS Link
Step 1 Select RDS (MySQL), click Next, and configure the RDS link parameters.
- Name: Enter a custom link name, for example, rds_link.
- Database Server and Port: Enter the address information about the RDS for MySQL database.
- Database Name: Enter the name of the RDS for MySQL database.
- Username and Password: Enter the username and password used for logging in to the database.
NOTE
- During RDS link creation, if Use Local API in Show Advanced Attributes is set to Yes, you can use the LOAD DATA function provided by MySQL to speed up data import.
- The LOAD DATA function is disabled by default on RDS for MySQL, so you need to modify the parameter group of the MySQL instance and set local_infile to ON to enable this function.
- If local_infile cannot be edited, the instance is using the default parameter group. In this case, create a new parameter group, set local_infile to ON in it, and apply it to the RDS for MySQL instance.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating an Entire Database Migration Job
Step 1 After the two links are created, choose Entire DB Migration > Create Job to create a migration job. See Figure 6-54.
Figure 6-54 Creating an entire database migration job
- Job Name: Enter a name for the entire database migration job.
- Source Job Configuration
  – Source Link Name: Select the mysqllink link created in Creating a MySQL Link.
  – Schema/Tablespace: Select the on-premises MySQL database from which data is to be exported.
- Destination Job Configuration
  – Destination Link Name: Select the rds_link link created in Creating an RDS Link.
  – Schema/Tablespace: Select the name of the RDS database to which data is to be imported.
  – Auto Table Creation: Select Auto creation, which indicates that CDM automatically creates tables in the RDS database when tables of the on-premises MySQL database do not exist in the RDS database.
  – Clear Data Before Import: Select Yes, which indicates that when a table with the same name as a table in the on-premises MySQL database exists in the RDS database, CDM clears the data in that table on RDS.
  – Retain the default values of the optional parameters in Show Advanced Attributes.
Step 2 Click Next and select the tables to be migrated. After selecting the desired tables, click the arrow buttons to move them to the right pane.
Step 3 Click Save and Run. CDM immediately starts the entire database migration job.
When the job starts running, a sub-job is generated for each table. You can click the job name to view the sub-job list.
Step 4 In the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.
There is no log for the entire database migration job itself. However, the sub-jobs have logs. On the Historical Record page of a sub-job, click Log to view the job log.
----End
6.11 Migrating the Entire Elasticsearch Database to Cloud Search Service
Scenario
Cloud Search Service provides users with structured and unstructured data search, statistics, and report capabilities. This section describes how to use CDM to migrate the entire Elasticsearch database to Cloud Search Service. The procedure is as follows:
1. Creating a CDM Cluster and Binding an EIP to the Cluster
2. Creating a Cloud Search Service Link
3. Creating an Elasticsearch Link
4. Creating an Entire Database Migration Job
Prerequisites
- You have sufficient EIP quota.
- You have subscribed to Cloud Search Service and obtained the IP address and port number of the Cloud Search Service cluster.
- You have obtained the IP address, port number, username, and password of the on-premises Elasticsearch database server. If the Elasticsearch server is deployed in an on-premises data center or on a third-party cloud, ensure that an IP address that can be accessed from the public network has been configured for the Elasticsearch database, or that a VPN or Direct Connect connection between the on-premises data center and HUAWEI CLOUD has been established. To enable public network access, see How Do I Connect On-premises Intranet or Third-Party Private Network to CDM.
Creating a CDM Cluster and Binding an EIP to the Cluster
Step 1 Log in to the CDM management console and create a CDM cluster. For details about how to create a cluster, see Creating a Cluster. The key configurations are as follows:
- The flavor of the CDM cluster is selected based on the amount of data to be migrated. Generally, select cdm.medium to meet the requirements of most migration scenarios.
- The CDM and Cloud Search Service clusters must be in the same VPC. In addition, it is recommended that the CDM cluster be in the same subnet and security group as the Cloud Search Service cluster.
- If the same subnet and security group cannot be used for security reasons, ensure that a security group rule has been configured to allow the CDM cluster to access the Cloud Search Service cluster.
Step 2 After the CDM cluster is created, on the Cluster Management page, click Bind Elastic IP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access the on-premises Elasticsearch.
----End
Creating a Cloud Search Service Link
Step 1 Click Job Management in the Operation column of the CDM cluster. On the page that is displayed, choose Link Management > Create Link. The page for selecting a connector is displayed. See Figure 6-55.
Figure 6-55 Selecting a connector
Step 2 Select Cloud Search Service and click Next. On the page that is displayed, configure the Cloud Search Service link parameters. See Figure 6-56.
- Name: Enter a custom link name, for example, csslink.
- Elasticsearch Server and Port: Enter the address and port number of the Cloud Search Service cluster.
- Username and Password: Enter the username and password used for logging in to the Cloud Search Service cluster. The user must have the read and write permissions on the database.
Figure 6-56 Creating a Cloud Search Service link
Step 3 Click Save. The Link Management page is displayed.
----End
Creating an Elasticsearch Link
Step 1 On the Link Management tab page, click Create Link. On the page that is displayed, select Elasticsearch, click Next, and configure the Elasticsearch link parameters. The Elasticsearch link parameters are the same as those of the Cloud Search Service link.
- Name: Enter a custom link name, for example, es_link.
- Elasticsearch Server and Port: Enter the IP address and port number of the on-premises Elasticsearch database.
- Username and Password: If the Elasticsearch database has user restrictions, select the user who has the read and write permissions on the Elasticsearch database. If there is no restriction, you do not need to set these parameters.
Step 2 Click Save. The Link Management page is displayed.
----End
Creating an Entire Database Migration Job
Step 1 Choose Entire DB Migration > Create Job to create an entire database migration job.
Figure 6-57 Creating an entire database migration job
- Job Name: Enter a unique name.
- Source Job Configuration
  – Source Link Name: Select the es_link link created in Creating an Elasticsearch Link.
  – Index: Click the icon next to the text box to select an index in the on-premises Elasticsearch database, or manually enter an index name. The name can contain only lowercase letters.
- Destination Job Configuration
  – Destination Link Name: Select the csslink link created in Creating a Cloud Search Service Link.
  – Index: Enter the index of the data to be written. You can select an existing index in Cloud Search Service or manually enter an index name that does not exist; in the latter case, CDM automatically creates the index in Cloud Search Service. The name can contain only lowercase letters.
  – Clear Data Before Import: If the selected index already exists in Cloud Search Service, you can choose whether to clear the data in the index before importing data. If you set this parameter to No, the data is appended to the index.
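The lowercase-only rule for index names entered here can be expressed as a one-line check. This is an illustration of the rule stated above, not part of any CDM or Elasticsearch API.

```python
import re

# Index names entered in this job configuration may contain only lowercase letters.
INDEX_NAME_RE = re.compile(r"^[a-z]+$")

def is_valid_index_name(name: str) -> bool:
    """Return True if the name satisfies the lowercase-letters-only rule."""
    return bool(INDEX_NAME_RE.fullmatch(name))

print(is_valid_index_name("trip"))      # True
print(is_valid_index_name("Trip2017"))  # False: uppercase letter and digits
```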
Step 2 Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.
A sub-job is generated for each type in the on-premises Elasticsearch index for concurrent execution. You can click the job name to view the sub-job progress.
Step 3 After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records, read/write statistics, and job log (only the sub-jobs have job logs).
Figure 6-58 Historical Record
----End
7 Advanced Operations
This section describes how to configure CDM in advanced scenarios and how to use advanced CDM parameters. It is applicable to users who are familiar with basic CDM functions.
CDM supports the following advanced scenarios:
- Incremental File Migration
- Incremental Migration of Relational Databases
- HBase/CloudTable Incremental Migration
- Incremental Synchronization Using the Macro Variables of Date and Time
- Migration in Transaction Mode
- Data Encryption During the Migration to OBS
- MD5 Verification for Files in Migration
- Field Conversion During Migration
- Migration of a List of Files
- Using Regular Expressions to Separate Semi-structured Text
- GDS Import Mode
- File Formats
7.1 Incremental File Migration
CDM supports incremental migration of file systems. After full migration is complete, either all new files, or only the files in specified directories, can be exported.
- Exporting all new files
  – Application scenarios: Both the migration source and destination are file systems (OBS/OSS/HDFS/FTP/SFTP/NAS).
  – Key configurations: Duplicate File Processing Method and Configuring Scheduled Jobs
  – Prerequisites: None
- Exporting files in a specified directory
  – Application scenarios: The source end is a file system (OBS/OSS/HDFS/FTP/SFTP/NAS). The destination end can be of any type. In incremental migration, only
the specified files are written to the destination end. The existing records are not updated or deleted.
  – Key configurations: File/Path Filter and Configuring Scheduled Jobs
  – Prerequisites: The source directory or file name contains the time field.
Duplicate File Processing Method
When creating a table/file migration job, if the source and destination ends are file systems, the Duplicate File Processing Method parameter is available in Destination Link Configuration. You can select Replace, Skip, or Stop job. When a file with the same name and size exists on both the source and destination ends, CDM determines that the file is a duplicate file.
CDM supports binary file transfer (without parsing files), which delivers the optimal transmission rate. If the path from which data is to be exported is a directory, CDM imports all files in the directory to the migration destination.
If files are added to the source directory at irregular intervals, the key configurations for job creation are as follows:
1. Set Duplicate File Processing Method of the destination link to Skip. See Figure 7-1.
Figure 7-1 Skipping duplicated files
2. Configure scheduled job execution.
In this way, you can import the newly added files to the destination directory periodically to implement incremental synchronization.
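The duplicate-file rule above (same name and same size on both ends means "skip") can be sketched as a small function. This is an illustration of the Skip semantics only, not CDM's internal implementation.

```python
def files_to_copy(source: dict, destination: dict) -> dict:
    """Return the source files to migrate under the Skip policy.

    source and destination map file name -> size in bytes. A file is treated
    as a duplicate (and skipped) when the same name and size exist on both ends.
    """
    return {
        name: size
        for name, size in source.items()
        if destination.get(name) != size   # missing or different size: migrate
    }

src = {"a.csv": 100, "b.csv": 200, "c.csv": 300}
dst = {"a.csv": 100, "b.csv": 999}          # b.csv has a different size
print(files_to_copy(src, dst))              # {'b.csv': 200, 'c.csv': 300}
```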
File/Path Filter
When creating a table/file migration job, if the source end is a file system, the Filter Type parameter is available in Source Link Configuration. You can select either Wildcard or Regex. During incremental file migration, Wildcard is selected. In this way, you can configure a wildcard to filter files or paths, and CDM migrates only the files or paths that meet the specified conditions.
If the source file name contains the date and time field, such as 2017-10-15 20:25:26, the /opt/data/file_20171015202526.data file is generated. The key configurations for job creation are as follows:
1. In source link parameters, set Filter Type to Wildcard. See Figure 7-2.
Figure 7-2 Filtering files
2. Enter *${dateformat(yyyyMMdd,-1,DAY)}* in File Filter. *${dateformat(yyyyMMdd,-1,DAY)}* is the macro variable format of date and time supported by CDM. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
3. Select Schedule Execution and set Cycle to one day.
In this way, you can import the files generated on the previous day to the destination directory every day to implement incremental synchronization.
In incremental file migration, Path Filter is used in the same way as File Filter. The path name must contain the time field. In this case, all files in the specified directory can be synchronized periodically.
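The effect of the wildcard *${dateformat(yyyyMMdd,-1,DAY)}* can be reproduced offline: expand the macro for a fixed "current" time and apply the resulting pattern with Python's `fnmatch`, which matches `*` the same way a shell-style wildcard filter does. This is an illustration, not CDM code.

```python
import fnmatch
from datetime import datetime, timedelta

# Fixed "current" time so the example is reproducible.
now = datetime(2017, 10, 16, 9, 0, 0)

# Expand *${dateformat(yyyyMMdd,-1,DAY)}* -> *20171015*
pattern = "*{}*".format((now - timedelta(days=1)).strftime("%Y%m%d"))

files = ["file_20171014202526.data", "file_20171015202526.data", "readme.txt"]
print([f for f in files if fnmatch.fnmatch(f, pattern)])
# ['file_20171015202526.data']
```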
7.2 Incremental Migration of Relational Databases
CDM supports incremental migration of relational databases. After full migration is complete, data whose field value is greater than a specified field value, or data within a specified period of time, can be incrementally migrated. For example, only data whose date value is greater than 2017-10-16 19:00:00 is exported each time a job is started, or data generated on the previous day is exported at 00:00:00 every day.
- Migrating incremental data whose field value is greater than the specified field value
  – Application scenarios: Both the migration source and destination are relational databases.
  – Key configurations: Incremental Migration Using the Regain Symbol and Schedule Execution
  – Prerequisites: The data table contains a numeric field or timestamp field that is unique and automatically increases.
- Migrating incremental data within a specified period of time
  – Application scenarios: The source end is a relational database. For details, see From a Relational Database. The destination end can be of any type.
  – Key configurations: Where Clause and Schedule Execution
  – Prerequisites: The data table contains a date and time field or timestamp field.
In incremental migration, only the specified data is written to the data table. The existing records are not updated or deleted.
Incremental Migration Using the Regain Symbol
When creating a table/file migration job, if both the migration source and destination are relational databases, the Regain Symbol parameter is available in the advanced attributes of Source Link Configuration.
After Regain Symbol is set to a specified field, CDM queries the table imported to the destination database every time a scheduled task is started. If the table does not contain the specified field, CDM performs full migration. If the table contains the specified field and the field has a value, CDM performs incremental migration to migrate only the data whose value is greater than the value of this field.
The specified field must have a unique value that automatically increases, for example, an auto_increment int, timestamp, or date.
This parameter is used together with scheduled CDM jobs configured according to Scheduling Job Execution, so that jobs are scheduled to implement incremental synchronization of relational databases.
For example, the date field in the data table records the date and time when each data record is created. When a migration job is created, this field is specified as Regain Symbol. See Figure 7-3. If a scheduled job is configured to automatically run every three hours, full migration is performed when the job runs for the first time. Incremental synchronization is performed when the job runs for the second time, and only data created in the last three hours is exported. Subsequently, the data is automatically synchronized every three hours.
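The Regain Symbol behavior described above can be sketched as a watermark comparison: the highest value of the chosen field already at the destination decides which source rows are extracted. This is an illustrative model only, not CDM internals.

```python
def incremental_extract(source_rows, destination_rows, field="date"):
    """Sketch of Regain Symbol: return only rows newer than the destination's
    maximum value of the chosen field; with an empty destination, do a full run.
    """
    if not destination_rows:                       # first run: full migration
        return list(source_rows)
    watermark = max(row[field] for row in destination_rows)
    return [row for row in source_rows if row[field] > watermark]

src = [{"id": 1, "date": "2017-10-15"}, {"id": 2, "date": "2017-10-16"}]
dst = [{"id": 1, "date": "2017-10-15"}]
print(incremental_extract(src, dst))   # [{'id': 2, 'date': '2017-10-16'}]
```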
Figure 7-3 Regain Symbol
Incremental Migration Using the Where Clause with the Time Variable
When creating a table/file migration job, if the source end is a relational database, the Where Clause parameter is available in the advanced attributes of Source Link Configuration.
If Where Clause is set to an SQL condition, for example, age > 18 and age <= 60, CDM exports only the data that meets the condition. If Where Clause is not specified, the entire table is exported.
Where Clause can be set to macro variables of date and time. When the data table contains a date or timestamp field, Where Clause and Schedule Execution can be used together to extract data of a specified date.
For example, the database table contains column DS that indicates the time, the value type of the column is varchar(30), and the inserted time format is similar to 2017-xx-xx.
Figure 7-4 Table data
Set Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}', as shown in Figure 7-5. If the scheduled job executes automatically at 00:00:00 every day, all data created on the previous day is exported at that time.
Figure 7-5 Where Clause
Where Clause can be configured with various macro variables of date and time. You can combine the macro variables of date and time with scheduled jobs whose cycle is minutes, hours, days, weeks, or months to automatically export data at a specific time.
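The expansion of the clause DS='${dateformat(yyyy-MM-dd,-1,DAY)}' for a given run day can be reproduced as follows; the column name DS comes from the example above, and the function is illustrative only.

```python
from datetime import date, timedelta

def where_clause_for(run_day: date) -> str:
    """Expand DS='${dateformat(yyyy-MM-dd,-1,DAY)}' for a given run day."""
    previous_day = run_day - timedelta(days=1)
    return "DS='{}'".format(previous_day.strftime("%Y-%m-%d"))

print(where_clause_for(date(2017, 10, 16)))   # DS='2017-10-15'
```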
7.3 HBase/CloudTable Incremental Migration
You can use CDM to export data in a specified period of time from HBase (including MRS HBase, FusionInsight HBase, and Apache HBase) and CloudTable. This can be used together with CDM scheduled jobs to implement incremental migration of HBase and CloudTable.
When creating a table/file migration job and selecting Link to HBase or Link to CloudTable as the migration source, you can set the time range in the advanced attributes. Figure 7-6 shows the configuration items in the advanced attributes.
Figure 7-6 Time range
- Minimum Timestamp: start time (inclusive) for extracting data, in yyyy-MM-dd HH:mm:ss format. Only the data generated at the specified time or later is extracted.
- Maximum Timestamp: end time (exclusive) for extracting data, in yyyy-MM-dd HH:mm:ss format. Only the data generated before this time point is extracted.
The two parameters can be set to macro variables of date and time. Examples are as follows:
- If Minimum Timestamp is set to ${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}, only the data generated since the same time on the previous day is exported.
- If Maximum Timestamp is set to ${dateformat(yyyy-MM-dd HH:mm:ss)}, only the data generated before the specified time point is exported.
If both parameters are configured, only the data generated within the previous day is exported. In addition, if the job is configured to execute at 00:00:00 every day, the data generated every day can be incrementally synchronized.
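The "previous day" window described above, for a job run at 00:00:00, can be computed as follows. Minimum Timestamp is inclusive and Maximum Timestamp exclusive, matching the parameter semantics above; the helper itself is illustrative, not a CDM API.

```python
from datetime import datetime, timedelta

def previous_day_window(run_time: datetime):
    """Return (Minimum Timestamp, Maximum Timestamp) strings covering the
    24 hours before run_time, in yyyy-MM-dd HH:mm:ss format."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return ((run_time - timedelta(days=1)).strftime(fmt),  # inclusive start
            run_time.strftime(fmt))                        # exclusive end

print(previous_day_window(datetime(2017, 10, 16, 0, 0, 0)))
# ('2017-10-15 00:00:00', '2017-10-16 00:00:00')
```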
7.4 Incremental Synchronization Using the Macro Variables of Date and Time
During the creation of table/file migration jobs, CDM supports the macro variables of date and time in the following parameters of the source and destination links:
- Source directory
- Source table name
- Write directory
- Destination table name
- Where clause
You can use the ${} macro variable definition identifier to define the macros of the time type. Currently, dateformat and timestamp are supported.
By using the macro variables of date and time together with scheduled jobs, you can implement incremental synchronization of databases and files.
dateformat
dateformat supports two types of parameters:
- dateformat(format)
  format indicates the date and time format. For details about the format definition, see the definition in java.text.SimpleDateFormat.
  For example, if the current date is 2017-10-16 09:00:00, yyyy-MM-dd HH:mm:ss indicates 2017-10-16 09:00:00.
- dateformat(format, dateOffset, dateType)
  – format indicates the format of the returned date.
  – dateOffset indicates the date offset.
  – dateType indicates the type of the date offset. Currently, dateType supports SECOND, MINUTE, HOUR, and DAY.
  For example, if the current date is 2017-10-16 09:00:00, dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY) indicates the day before the current day, that is, 2017-10-15 09:00:00.
timestamp
timestamp supports two types of parameters:
- timestamp()
  Returns the timestamp of the current time, that is, the number of milliseconds that have elapsed since 00:00:00 on January 1, 1970 (1970-01-01 00:00:00 GMT). For example, 1508078516286.
- timestamp(dateOffset, dateType)
  Returns the timestamp after the time offset is applied. dateOffset and dateType indicate the date offset and the offset type, respectively.
  For example, if the current date is 2017-10-16 09:00:00, timestamp(-10, MINUTE) returns the timestamp generated 10 minutes before the current time point, that is, 1508115000000.
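The two macros can be emulated in Python for a fixed "current" time, so their results can be checked against the examples in this section. The sketch assumes the examples were computed in UTC+08:00, and the Java-style patterns (yyyy-MM-dd HH:mm:ss) are translated to strftime directives by hand; it is not CDM code.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))                 # assumed timezone of the examples
NOW = datetime(2017, 10, 16, 9, 0, 0, tzinfo=TZ)  # fixed "current" time
UNITS = {"SECOND": "seconds", "MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}

def dateformat(strftime_pattern, offset=0, unit="DAY", now=NOW):
    """Emulate dateformat(format, dateOffset, dateType)."""
    return (now + timedelta(**{UNITS[unit]: offset})).strftime(strftime_pattern)

def timestamp(offset=0, unit="DAY", now=NOW):
    """Emulate timestamp(dateOffset, dateType): milliseconds since the epoch."""
    shifted = now + timedelta(**{UNITS[unit]: offset})
    return int(shifted.timestamp() * 1000)

print(dateformat("%Y-%m-%d %H:%M:%S", -1, "DAY"))  # 2017-10-15 09:00:00
print(timestamp(-10, "MINUTE"))                    # 1508115000000
```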
Macro Variable Definition of Time and Date
Suppose that the current time is 2017-10-16 09:00:00. Table 7-1 describes the macro variable definitions of time and date.
Table 7-1 Macro variable definition of time and date
Macro Variable Description Display Effect
${dateformat(yyyy-MM-dd)} Returns the current date in yyyy-MM-dd format.
2017-10-16
${dateformat(yyyy/MM/dd)} Returns the current date inyyyy/MM/dd format.
2017/10/16
Cloud Data MigrationUser Guide 7 Advanced Operations
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 189
${dateformat(yyyy_MM_dd HH:mm:ss)} | Returns the current time in yyyy_MM_dd HH:mm:ss format. | 2017_10_16 09:00:00
${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)} | Returns the current time in yyyy-MM-dd HH:mm:ss format; the date is one day before the current day. | 2017-10-15 09:00:00
${timestamp()} | Returns the timestamp of the current time, that is, the number of milliseconds that have elapsed since 00:00:00 on January 1, 1970. | 1508115600000
${timestamp(-10, MINUTE)} | Returns the timestamp generated 10 minutes before the current time. | 1508115000000
${timestamp(dateformat(yyyyMMdd))} | Returns the timestamp of 00:00:00 of the current day. | 1508083200000
${timestamp(dateformat(yyyyMMdd,-1,DAY))} | Returns the timestamp of 00:00:00 of the previous day. | 1507996800000
${timestamp(dateformat(yyyyMMddHH))} | Returns the timestamp of the top of the current hour. | 1508115600000
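The macros above can be reproduced outside CDM for verification. The following Python sketch is an illustrative approximation, not CDM code; the helper names are invented, and it assumes the cluster clock runs at UTC+08:00, which is what the sample timestamps in Table 7-1 imply.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the sample values in Table 7-1 imply a clock running at UTC+08:00.
TZ = timezone(timedelta(hours=8))
NOW = datetime(2017, 10, 16, 9, 0, 0, tzinfo=TZ)  # reference time used by the guide

# Java-style pattern letters mapped to strftime directives (subset only).
_JAVA_TO_STRFTIME = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
                     ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]
_UNITS = {"SECOND": timedelta(seconds=1), "MINUTE": timedelta(minutes=1),
          "HOUR": timedelta(hours=1), "DAY": timedelta(days=1)}

def dateformat(pattern, offset=0, unit="DAY", now=NOW):
    """Approximate dateformat(dateFormat[, dateOffset, dateType])."""
    t = now + offset * _UNITS[unit]
    for java, py in _JAVA_TO_STRFTIME:
        pattern = pattern.replace(java, py)
    return t.strftime(pattern)

def timestamp(offset=0, unit="DAY", now=NOW):
    """Approximate timestamp([dateOffset, dateType]); returns milliseconds."""
    return int((now + offset * _UNITS[unit]).timestamp() * 1000)

print(dateformat("yyyy-MM-dd"))                      # 2017-10-16
print(dateformat("yyyy-MM-dd HH:mm:ss", -1, "DAY"))  # 2017-10-15 09:00:00
print(timestamp())                                   # 1508115600000
print(timestamp(-10, "MINUTE"))                      # 1508115000000
```

Running the sketch reproduces the display-effect column of Table 7-1 for the simple (non-nested) macros.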
Time and Date Macro Variables of Paths and Table Names

Figure 7-7 shows an example, where:
l Table Name under Source Link Configuration is set to CDM_/${dateformat(yyyy-MM-dd)}.
l Write Directory under Destination Link Configuration is set to /opt/ttxx/${timestamp()}.
After the macro definition conversion, this job migrates data in table SQOOP.CDM_20171016 in the Oracle database to the /opt/ttxx/1508115701746 directory of the SFTP server.
Figure 7-7 Setting Table Name and Write Directory to a time and date macro variable
Currently, a table name or path name can contain multiple macro variables. For example, /opt/ttxx/${dateformat(yyyy-MM-dd)}/${timestamp()} is converted to /opt/ttxx/2017-10-16/1508115701746.
Time and Date Macro Variables in the Where Clause
Figure 7-8 uses table SQOOP.CDM_20171016 as an example. The table contains column DS, which indicates the time.
Figure 7-8 Table data
Suppose that the current date is 2017-10-16 and you want to export data generated the day before the current day (DS = 2017-10-15). When creating the job, set the value of Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}'. In this way, you can export all data that complies with the DS = 2017-10-15 condition.
Implementing Incremental Synchronization by Configuring the Macro Variables of Date and Time and Scheduled Jobs
Two simple application scenarios are as follows:
l The database table contains column DS that indicates the time, the value type of the column is varchar(30), and the inserted time is in a format similar to 2017-xx-xx.
In a scheduled job whose cycle is one day and which is executed at 00:00:00 every day, set the value of Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}'. Data generated in the previous day will then be exported at 00:00:00 every day.
l The database table contains column time that indicates the time, the type is Number, and the inserted values are timestamps.
In a scheduled job whose cycle is one day and which is executed at 00:00:00 every day, set the value of Where Clause to time between '${timestamp(-1,DAY)}' and '${timestamp()}'. Data generated in the previous day will then be exported at 00:00:00 every day.
Configuration principles of other application scenarios are the same.
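To make the two scheduling scenarios concrete, here is a small illustrative Python sketch (not CDM code; the variable names are invented) that renders the WHERE clause a daily job triggered at 00:00:00 would effectively use:

```python
from datetime import datetime, timedelta

# Assumption: the job is triggered daily at 00:00:00; run_time is the trigger time.
run_time = datetime(2017, 10, 17, 0, 0, 0)

# Scenario 1: DS is a varchar date column; select yesterday's partition.
ds_value = (run_time - timedelta(days=1)).strftime("%Y-%m-%d")
where_varchar = "DS='{}'".format(ds_value)

# Scenario 2: time is a numeric millisecond timestamp; select the last 24 hours.
ms = lambda t: int(t.timestamp() * 1000)
where_number = "time between {} and {}".format(ms(run_time - timedelta(days=1)),
                                               ms(run_time))

print(where_varchar)  # DS='2017-10-16'
print(where_number)
```

Each run re-evaluates the macros, so consecutive daily runs select consecutive, non-overlapping windows of data.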
7.5 Migration in Transaction Mode

When a CDM job fails to be executed, CDM rolls back the data to the state before the job starts and automatically deletes data from the destination table.
When creating a table/file migration job, CDM allows you to select whether to enable the transaction mode by configuring Import to Staging Table. If you set this parameter to Yes, CDM automatically creates a temporary table and imports the data to the temporary table. After the data is imported successfully, CDM migrates the data to the destination table in transaction mode of the database. If the import fails, the destination table is rolled back to the state before the job starts. See Figure 7-9.
Figure 7-9 Migration in transaction mode
NOTE
If you set Clear Data Before Import to Yes, CDM does not roll back the deleted data even in transaction mode.
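The staging-table behavior can be pictured with any transactional database. The following Python/sqlite3 sketch is an illustrative model, not CDM code: rows are loaded into a staging table first, and only moved into the destination inside one transaction, so a failure leaves the destination unchanged.

```python
import sqlite3

# Autocommit mode; the move into the destination is wrapped in an explicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dest (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")

def migrate(rows, fail=False):
    # Phase 1: load everything into the staging table.
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    # Phase 2: move staged rows into the destination inside one transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO dest SELECT * FROM staging")
        if fail:
            raise RuntimeError("simulated mid-import failure")
        conn.execute("COMMIT")
    except RuntimeError:
        conn.execute("ROLLBACK")  # destination returns to its pre-job state
    finally:
        conn.execute("DELETE FROM staging")  # staging table is always discarded

migrate([(1, "a"), (2, "b")])   # succeeds: 2 rows land in dest
migrate([(3, "c")], fail=True)  # fails: dest is left unchanged
print(conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0])  # 2
```

The failed batch never becomes visible in the destination, mirroring the rollback behavior described above.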
7.6 Data Encryption During the Migration to OBS

When migrating data to OBS using CDM, you can perform KMS encryption during table/file migration and entire database migration. See Figure 7-10. The key must be created in Data Encryption Workshop (DEW). For details, see the Data Encryption Workshop User Guide.
Figure 7-10 Enabling KMS encryption
After KMS encryption is enabled, objects to be uploaded will be encrypted and stored on the server. When you download the encrypted objects, the encrypted data will be decrypted on the server and displayed in plaintext to users.
NOTE
l If KMS encryption is enabled, MD5 verification cannot be used.
l After KMS encryption is performed, the encryption status of the objects on OBS cannot be changed.
l A key in use cannot be deleted. Otherwise, the object encrypted with this key cannot be downloaded.
7.7 MD5 Verification for Files in Migration

MD5 verification can be performed when CDM reads files from the SFTP/CIFS server and writes the files to OBS in binary format. CDM checks the end-to-end file consistency and writes the verification result to the OBS bucket. The bucket can be a bucket that does not store migration files. See Figure 7-11.
Figure 7-11 Enabling MD5 verification to verify file consistency
If Validate MD5 Value is set to Yes, when CDM reads files from the migration source, it checks whether the MD5 value of each file to be read is the same as that recorded in the corresponding xx.md5 file in the source directory. If the migration source does not have the xx.md5 file, the verification is not performed. After a file is read and written to OBS, CDM provides the MD5 value in the HTTP header so that OBS can verify the file.
NOTE
If MD5 verification is used, KMS encryption cannot be used.
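The source-side check can be sketched with Python's standard library. This is an illustrative re-implementation, not CDM code; the sidecar naming `<file>.md5` is an assumption for the demo, since the guide only says an xx.md5 file exists in the source directory.

```python
import hashlib
import os
import tempfile

def md5_of(path, chunk=1 << 20):
    """Stream a file and return its hex MD5 digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def verify(data_path):
    """Compare a file's MD5 with its sidecar <name>.md5, if one exists.

    Returns True/False for a performed check, or None when there is no
    .md5 file (mirroring 'verification will not be performed')."""
    sidecar = data_path + ".md5"
    if not os.path.exists(sidecar):
        return None
    expected = open(sidecar).read().split()[0].strip().lower()
    return md5_of(data_path) == expected

with tempfile.TemporaryDirectory() as d:
    data = os.path.join(d, "data.bin")
    open(data, "wb").write(b"hello cdm")
    open(data + ".md5", "w").write(md5_of(data))
    print(verify(data))  # True
```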
7.8 Field Conversion During Migration

You can create a field converter on the Map Field tab page when creating a table/file migration job. See Figure 7-12.
Figure 7-12 Creating a field converter
NOTE
Field mapping is not involved when the binary format is used to migrate files to files.
CDM can convert fields during migration. Currently, the following field converters are supported:
l Anonymization
l Trim
l Reverse String
l Replace String
l Expression Conversion
Anonymization

This converter is used to hide key information in a character string. For example, to convert 12345678910 to 123****8910, set the parameters according to Figure 7-13:
l Set Reserve Start Length to 3.
l Set Reserve End Length to 4.
l Set Replace Character to *.
Figure 7-13 Anonymization
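The three parameters combine as sketched below. This is an illustrative Python re-implementation, not CDM code; the behavior for strings shorter than the two reserved lengths is an assumption.

```python
def anonymize(value, reserve_start, reserve_end, replace_char="*"):
    """Mask the middle of a string, keeping the first reserve_start and
    last reserve_end characters (mirrors Reserve Start Length, Reserve
    End Length, and Replace Character)."""
    middle = len(value) - reserve_start - reserve_end
    if middle <= 0:
        return value  # assumption: too-short values are left unchanged
    return (value[:reserve_start]
            + replace_char * middle
            + value[len(value) - reserve_end:])

print(anonymize("12345678910", 3, 4))  # 123****8910
```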
Trim

This converter is used to automatically delete the spaces before and after a string. No parameters need to be configured.
Reverse String

This converter is used to automatically reverse a string, for example, reverse ABC into CBA. No parameters need to be configured.
Replace String

This converter is used to replace a character string. You need to configure the object to be replaced and the new value.
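The three simple converters above behave like the following one-liners (an illustrative Python sketch, not CDM code):

```python
# Illustrative Python equivalents of the simple converters.
trim = str.strip          # Trim: remove leading/trailing spaces
reverse = lambda s: s[::-1]   # Reverse String
replace = str.replace     # Replace String: old value -> new value

print(trim("  abc  "))             # abc
print(reverse("ABC"))              # CBA
print(replace("a-b-c", "-", "+"))  # a+b+c
```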
Expression Conversion

This converter uses the JSP expression language (EL) to convert the current field or a row of data. The JSP EL is used to create arithmetic and logical expressions. Within a JSP EL expression, you can use integers, floating point numbers, strings, the built-in constants true and false for boolean values, and null.
The expression supports the following environment variables:
l value: indicates the current field value.
l row: indicates the current row, which is an array type.
The expression supports the following tool classes:
l StringUtils: string processing tool class. For details, see org.apache.commons.lang.StringUtils in the Java SDK code.
l DateUtils: date tool class
l CommonUtils: common tool class
l NumberUtils: string-to-value conversion class
l HttpsUtils: network file read class
Application examples:
1. Set a string constant for the current field, for example, VIP.
Expression: "VIP"
2. If the field is of the string type, convert all character strings into lowercase letters, for example, convert aBC to abc.
Expression: StringUtils.lowerCase(value)
3. Convert all character strings of the current field to uppercase letters.
Expression: StringUtils.upperCase(value)
4. If the field value is a date string in yyyy-MM-dd format, extract the year from the field value, for example, extract 2017 from 2017-12-01.
Expression: StringUtils.substringBefore(value,"-")
5. If the field value is of the numeric type, double the value:
Expression: value*2
6. Convert the field value true to Y and other field values to N.
Expression: value == "true"? "Y": "N"
7. If the field value is of the string type and is left empty, convert it to Default; otherwise, leave the field value unchanged.
Expression: empty value? "Default" : value
8. If the first and second fields are of the numeric type, convert the field to the sum of the first and second field values.
Expression: row[0] + row[1]
9. If the field is of the date or timestamp type, return the current year after conversion. The data type is int.
Expression: DateUtils.getYear(value)
10. If the field is a date and time string in yyyy-MM-dd format, convert it to the date type:
Expression: DateUtils.format(value,"yyyy-MM-dd")
11. Convert date format 2018/01/05 15:15:05 to 2018-01-05 15:15:05:
Expression: DateUtils.format(DateUtils.parseDate(value,"yyyy/MM/dd HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")
12. Obtain a 36-character universally unique identifier (UUID):
Expression: CommonUtils.randomUUID()
13. If the field is of the string type, capitalize the first letter, for example, convert cat to Cat.
Expression: StringUtils.capitalize(value)
14. If the field is of the string type, convert the first letter to a lowercase letter, for example, convert Cat to cat.
Expression: StringUtils.uncapitalize(value)
15. If the field is of the string type, pad the character string with spaces to the specified length and center it. If the length of the character string is not shorter than the specified length, the string is not converted. For example, convert ab to " ab " to meet the specified length 4.
Expression: StringUtils.center(value, 4)
16. Delete one newline (including \n, \r, and \r\n) at the end of a character string. For example, convert abc\r\n\r\n to abc\r\n.
Expression: StringUtils.chomp(value)
17. If the string contains the specified string, true is returned; otherwise, false is returned. For example, abc contains a, so true is returned.
Expression: StringUtils.contains(value, "a")
18. If the string contains any character of the specified string, true is returned; otherwise, false is returned. For example, zzabyycdxx contains either z or a, so true is returned.
Expression: StringUtils.containsAny(value, "za")
19. If the string does not contain any of the specified characters, true is returned. If any specified character is contained, false is returned. For example, abz contains one character of xyz, so false is returned.
Expression: StringUtils.containsNone(value, "xyz")
20. If the string contains only the specified characters, true is returned. If any other character is contained, false is returned. For example, abab contains only characters among abc, so true is returned.
Expression: StringUtils.containsOnly(value, "abc")
21. If the character string is empty or null, convert it to the specified character string; otherwise, do not convert it. For example, convert the empty character string to null.
Expression: StringUtils.defaultIfEmpty(value, null)
22. If the string ends with the specified suffix (case sensitive), true is returned; otherwise, false is returned. For example, the suffix of abcdef is not null, so false is returned.
Expression: StringUtils.endsWith(value, null)
23. If the string is the same as the specified string (case sensitive), true is returned; otherwise, false is returned. For example, after strings abc and ABC are compared, false is returned.
Expression: StringUtils.equals(value, "ABC")
24. Obtain the first index of the specified character string in a character string. If no index is found, -1 is returned. For example, the first index of ab in aabaabaa is 1.
Expression: StringUtils.indexOf(value, "ab")
25. Obtain the last index of the specified character string in a character string. If no index is found, -1 is returned. For example, the last index of k in aFkyk is 4.
Expression: StringUtils.lastIndexOf(value, "k")
26. Obtain the first index of the specified character string starting from the specified position in the character string. If no index is found, -1 is returned. For example, the first index of b found at or after index 3 of aabaabaa is 5.
Expression: StringUtils.indexOf(value, "b", 3)
27. Obtain the first index of any specified character in a character string. If no index is found, -1 is returned. For example, the first index of z or a in zzabyycdxx is 0.
Expression: StringUtils.indexOfAny(value, "za")
28. If the string contains only Unicode letters, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlpha(value)
29. If the string contains only Unicode letters and digits, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumeric(value)
30. If the string contains only Unicode letters, digits, and spaces, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumericSpace(value)
31. If the string contains only Unicode letters and spaces, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlphaSpace(value)
32. If the string contains only printable ASCII characters, true is returned; otherwise, false is returned. For example, for !ab-c~, true is returned.
Expression: StringUtils.isAsciiPrintable(value)
33. If the string is empty or null, true is returned; otherwise, false is returned.
Expression: StringUtils.isEmpty(value)
34. If the string contains only Unicode digits, true is returned; otherwise, false is returned.
Expression: StringUtils.isNumeric(value)
35. Obtain the leftmost characters of the specified length. For example, obtain the leftmost two characters ab from abc.
Expression: StringUtils.left(value, 2)
36. Obtain the rightmost characters of the specified length. For example, obtain the rightmost two characters bc from abc.
Expression: StringUtils.right(value, 2)
37. Pad the left of the current character string with the specified characters up to the specified length. If the length of the current character string is not shorter than the specified length, the character string is not converted. For example, if bat is left-padded with yz to length 8, the result is yzyzybat.
Expression: StringUtils.leftPad(value, 8, "yz")
38. Pad the right of the current character string with the specified characters up to the specified length. If the length of the current character string is not shorter than the specified length, the character string is not converted. For example, if bat is right-padded with yz to length 8, the result is batyzyzy.
Expression: StringUtils.rightPad(value, 8, "yz")
39. If the field is of the string type, obtain the length of the current character string. If the character string is null, 0 is returned.
Expression: StringUtils.length(value)
40. If the field is of the string type, delete all occurrences of the specified character string from it. For example, delete ue from queued to obtain qd.
Expression: StringUtils.remove(value, "ue")
41. If the field is of the string type, remove the substring at the end of the field. If the specified substring is not at the end of the field, no conversion is performed. For example, remove .com at the end of www.domain.com.
Expression: StringUtils.removeEnd(value, ".com")
42. If the field is of the string type, delete the substring at the beginning of the field. If the specified substring is not at the beginning of the field, no conversion is performed. For example, delete www. at the beginning of www.domain.com.
Expression: StringUtils.removeStart(value, "www.")
43. If the field is of the string type, replace all occurrences of the specified character string in the field. For example, replace a in aba with z to obtain zbz.
Expression: StringUtils.replace(value, "a", "z")
44. If the field is of the string type, replace multiple characters in the character string at a time. For example, replace h in hello with j and o with y to obtain jelly.
Expression: StringUtils.replaceChars(value, "ho", "jy")
45. If the field is of the string type, use the specified delimiter to split the text into an array. For example, use : to split ab:cd:ef into ["ab", "cd", "ef"].
Expression: StringUtils.split(value, ":")
46. If the string starts with the specified prefix (case sensitive), true is returned; otherwise, false is returned. For example, abcdef starts with abc, so true is returned.
Expression: StringUtils.startsWith(value, "abc")
47. If the field is of the string type, delete all the specified characters from both ends of the field. For example, delete all x, y, and z from the ends of abcyx to obtain abc.
Expression: StringUtils.strip(value, "xyz")
48. If the field is of the string type, delete all the specified characters at the end of the field, for example, delete all spaces at the end of the field.
Expression: StringUtils.stripEnd(value, null)
49. If the field is of the string type, delete all the specified characters at the beginning of the field, for example, delete all spaces at the beginning of the field.
Expression: StringUtils.stripStart(value, null)
50. If the field is of the string type, obtain the substring after the specified position (excluding the character at the specified position). If the specified position is a negative number, count the position from the end of the string. For example, obtain the character string after the second character of abcde, that is, cde.
Expression: StringUtils.substring(value, 2)
51. If the field is of the string type, obtain the substring within the specified range of the character string. If a specified position is a negative number, count the position from the end of the string. For example, obtain the character string from index 2 up to index 5 of abcde, that is, cde.
Expression: StringUtils.substring(value, 2, 5)
52. If the field is of the string type, obtain the substring after the first occurrence of the specified character. For example, obtain the substring after the first b in abcba, that is, cba.
Expression: StringUtils.substringAfter(value, "b")
53. If the field is of the string type, obtain the substring after the last occurrence of the specified character. For example, obtain the substring after the last b in abcba, that is, a.
Expression: StringUtils.substringAfterLast(value, "b")
54. If the field is of the string type, obtain the substring before the first occurrence of the specified character. For example, obtain the substring before the first b in abcba, that is, a.
Expression: StringUtils.substringBefore(value, "b")
55. If the field is of the string type, obtain the substring before the last occurrence of the specified character. For example, obtain the substring before the last b in abcba, that is, abc.
Expression: StringUtils.substringBeforeLast(value, "b")
56. If the field is of the string type, obtain the substring nested between two instances of the specified string. If no such substring is found, null is returned. For example, obtain the substring between the two tag strings in tagabctag, that is, abc.
Expression: StringUtils.substringBetween(value, "tag")
57. If the field is of the string type, delete the control characters (char ≤ 32) at both ends of the character string, for example, delete the spaces at both ends of the character string.
Expression: StringUtils.trim(value)
58. Convert the character string to a value of the byte type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toByte(value)
59. Convert the character string to a value of the byte type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toByte(value, 1)
60. Convert the character string to a value of the double type. If the conversion fails, 0.0d is returned.
Expression: NumberUtils.toDouble(value)
61. Convert the character string to a value of the double type. If the conversion fails, the specified value, for example, 1.1d, is returned.
Expression: NumberUtils.toDouble(value, 1.1d)
62. Convert the character string to a value of the float type. If the conversion fails, 0.0f is returned.
Expression: NumberUtils.toFloat(value)
63. Convert the character string to a value of the float type. If the conversion fails, the specified value, for example, 1.1f, is returned.
Expression: NumberUtils.toFloat(value, 1.1f)
64. Convert the character string to a value of the int type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toInt(value)
65. Convert the character string to a value of the int type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toInt(value, 1)
66. Convert the character string to a value of the long type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toLong(value)
67. Convert the character string to a value of the long type. If the conversion fails, the specified value, for example, 1L, is returned.
Expression: NumberUtils.toLong(value, 1L)
68. Convert the character string to a value of the short type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toShort(value)
69. Convert the character string to a value of the short type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toShort(value, 1)
70. Convert an IP address string to a value of the long type, for example, convert 10.78.124.0 to 172915712.
Expression: CommonUtils.ipToLong(value)
71. Read a file that maps IP addresses to physical addresses from the network and load it into a map collection. url indicates the address where the IP mapping file is stored, for example, http://10.114.205.45:21203/sqoop/IpList.csv.
Expression: HttpsUtils.downloadMap("url")
72. Cache the IP address and physical address mappings and specify a key, for example, ipList, for retrieval.
Expression: CommonUtils.setCache("ipList",HttpsUtils.downloadMap("url"))
73. Obtain the cached IP address and physical address mappings.
Expression: CommonUtils.getCache("ipList")
74. Check whether the IP address and physical address mappings are cached.
Expression: CommonUtils.cacheExists("ipList")
75. Obtain the physical address corresponding to an IP address, in Country_Province_City_Carrier format. For example, the physical address corresponding to 1xx.78.124.0 is China_Guangdong_Shenzhen_China Telecom. If the corresponding physical address cannot be obtained, the default value **_**_**_** is returned. If necessary, you can use a StringUtils expression to further split the address.
Expression: CommonUtils.getMapValue(CommonUtils.ipToLong(value), CommonUtils.cacheExists("ipList") ? CommonUtils.getCache("ipList") : CommonUtils.setCache("ipList", HttpsUtils.downloadMap("url")))
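To make the semantics of a few of these expressions concrete, here are rough Python counterparts (illustrative only; the function names are invented). Note that commons-lang StringUtils is null-safe, which is mimicked by handling None explicitly.

```python
def lower_case(v):
    """Counterpart of StringUtils.lowerCase(value); null-safe."""
    return None if v is None else v.lower()

def substring_before(v, sep):
    """Counterpart of StringUtils.substringBefore(value, "-")."""
    return v.split(sep, 1)[0] if v is not None else None

def left_pad(v, size, pad):
    """Counterpart of StringUtils.leftPad(value, 8, "yz"): pad on the left
    with repeated pad characters, unchanged if already long enough."""
    if v is None or len(v) >= size:
        return v
    fill = (pad * size)[: size - len(v)]
    return fill + v

print(lower_case("aBC"))                    # abc   (example 2)
print(substring_before("2017-12-01", "-"))  # 2017  (example 4)
print(left_pad("bat", 8, "yz"))             # yzyzybat (example 37)
```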
7.9 Migration of a List of Files

You can migrate a list of files (a maximum of 50 files) from FTP, SFTP, NAS, OBS, OSS, or KODO at a time. The exported files can only be written to the same directory on the migration destination.
When creating a table/file migration job, if the migration source is FTP, SFTP, NAS, OBS, OSS, or Qiniu Cloud Object Storage (KODO), Source Directory/File can contain a maximum of 50 file names, which are separated by vertical bars (|). See Figure 7-14.
Figure 7-14 Migrating a list of files
NOTE
1. CDM supports incremental file migration (by skipping repeated files), but does not support resumable transfer.
For example, suppose three files are to be migrated and the second file fails to be migrated due to a network fault. When the migration task is started again, the first file is skipped. The second file, however, cannot resume from the point where the fault occurred; it can only be migrated again from the beginning.
2. During file migration, a single task supports a maximum of 100,000 files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
7.10 Using Regular Expressions to Separate Semi-structured Text
During table/file migration, CDM uses delimiters to separate fields in CSV files. However, delimiters cannot be used for complex semi-structured data in which the field values themselves contain delimiters. In this case, a regular expression can be used to separate the fields.
Regular expression parameters are configured in the source job parameters. Currently, CDM supports OBS, Alibaba Cloud OSS, KODO, FTP, SFTP, and NAS as sources. File Format must be CSV. See Figure 7-15.
Figure 7-15 Setting regular expression parameters
When migrating CSV files, CDM uses the regular expression to separate fields. For details about the syntax of regular expressions, refer to the related documents. This section describes the regular expressions for the following log files:
l Log4J Log
l Log4J Audit Log
l Tomcat Log
l Django Log
l Apache Server Log
Log4J Log

l Log sample:
2018-01-11 08:50:59,001 INFO [org.apache.sqoop.core.SqoopConfiguration.configureClassLoader(SqoopConfiguration.java:251)] Adding jars to current classloader from property: org.apache.sqoop.classpath.extra
l Regular expression:
^(\d.*\d) (\w*) \[(.*)\] (\w.*).*
l Parsing result:
Log4J Audit Log

l Log sample:
2018-01-11 08:51:06,156 INFO [org.apache.sqoop.audit.FileAuditLogger.logAuditEvent(FileAuditLogger.java:61)] user=sqoop.anonymous.user ip=189.xxx.xxx.75 op=show obj=version objId=
l Regular expression:
^(\d.*\d) (\w*) \[(.*)\] user=(\w.*) ip=(\w.*) op=(\w.*) obj=(\w.*) objId=(.*).*
l Parsing result:
Tomcat Log

l Log sample:
11-Jan-2018 09:00:06.907 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name: Linux
l Regular expression:
^(\d.*\d) (\w*) \[(.*)\] ([\w\.]*) (\w.*).*
l Parsing result:
Django Log

l Log sample:
[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0
l Regular expression:
^\[(.*)\] (\w*) (\w*) (.*).*
l Parsing result:
Apache Server Log

l Log sample:
[Mon Jan 08 20:43:51.854334 2018] [mpm_event:notice] [pid 36465:tid 140557517657856] AH00489: Apache/2.4.12 (Unix) OpenSSL/1.0.1t configured -- resuming normal operations
l Regular expression:
^\[(.*)\] \[(.*)\] \[(.*)\] (.*).*
l Parsing result:
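The expressions above can be checked with any regular expression engine. The following Python sketch (illustrative; the dictionary layout is just for the demo) applies three of the five expressions to the sample lines and prints the captured fields:

```python
import re

# Three of the patterns from this section, paired with their sample lines.
samples = {
    "log4j": (
        r"^(\d.*\d) (\w*) \[(.*)\] (\w.*).*",
        "2018-01-11 08:50:59,001 INFO [org.apache.sqoop.core.SqoopConfiguration"
        ".configureClassLoader(SqoopConfiguration.java:251)] Adding jars to "
        "current classloader from property: org.apache.sqoop.classpath.extra",
    ),
    "django": (
        r"^\[(.*)\] (\w*) (\w*) (.*).*",
        "[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0",
    ),
    "apache": (
        r"^\[(.*)\] \[(.*)\] \[(.*)\] (.*).*",
        "[Mon Jan 08 20:43:51.854334 2018] [mpm_event:notice] "
        "[pid 36465:tid 140557517657856] AH00489: Apache/2.4.12 (Unix) "
        "OpenSSL/1.0.1t configured -- resuming normal operations",
    ),
}

for name, (pattern, line) in samples.items():
    fields = re.match(pattern, line).groups()
    print(name, "->", fields)
```

Each capture group becomes one field of the parsed record, which is what CDM maps to columns at the destination.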
7.11 GDS Import Mode

When Creating a Link, you can set Import Mode to Copy or GDS for the DWS link. See Figure 7-16.
Figure 7-16 Import Mode
Gauss Data Service (GDS) is a data service component provided by DWS. It implements high-speed data import by using the foreign table mechanism. The directions of network communication in Copy and GDS modes are different.
l In Copy mode, CDM pushes data to DWS.
l In GDS mode, CDM temporarily creates a foreign table, and multiple DataNodes of DWS concurrently pull data from CDM. The data does not pass through the management node of DWS, so the migration speed is faster and the performance is better.
The GDS component is built into CDM, so you do not need to install the GDS toolkit. The key configurations for importing data to DWS in GDS mode are as follows (currently, CDM does not support data export in GDS mode):
1. Configure DWS to allow users of the DWS link to create and delete foreign tables.
2. Configure the security group where the CDM cluster resides to allow the DWS DataNodes to access port 25000 of the internal IP address of the CDM cluster.
3. When creating a DWS link, set Import Mode to GDS.
4. Create a table/file migration job and set Destination Link Name to the DWS link with the GDS mode enabled.
7.12 File Formats

When creating a CDM job, you need to specify File Format in the job parameters of the migration source and destination in some scenarios. This section describes the application scenarios, subparameters, common parameters, and usage examples of the supported file formats.
l CSV
l JSON
l Binary
l Common parameters
l Solutions to File Format Problems
CSV
To read or write a CSV file, set File Format to CSV. The CSV format can be used in the following scenarios:
l Import files to a database or NoSQL.
l Export data from a database or NoSQL to files.
After selecting the CSV format, you can also configure the following optional subparameters:
1. Line Separator
2. Field Delimiter
3. Encoding Type
4. Use Quote Character
5. Use RE to Separate Fields
6. Use First Row as Header
7. File Size
1. Line Separator
Character used to separate lines in a CSV file. The value can be a single character, multiple characters, or special characters. Special characters can be entered using URL encoded characters. The following table lists the URL encoded characters of commonly used special characters.
Table 7-2 URL encoded characters of special characters
Special Character URL Encoded Character
Space %20
Tab %09
% %25
Carriage return (Enter) %0d
Newline character %0a
Start of heading \u0001 (SOH) %01
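The mappings in Table 7-2 are ordinary percent-encodings and can be checked with any URL decoder. A minimal Python sketch (illustrative only):

```python
from urllib.parse import unquote

# The URL encoded separators from Table 7-2 and the characters they decode to.
encoded = {
    "%20": " ",     # space
    "%09": "\t",    # tab
    "%25": "%",     # percent sign
    "%0d": "\r",    # carriage return
    "%0a": "\n",    # newline
    "%01": "\x01",  # start of heading (SOH)
}
for enc, plain in encoded.items():
    assert unquote(enc) == plain
print("all separator encodings decode as expected")
```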
2. Field Delimiter
Character used to separate columns in a CSV file. The value can be a single character, multiple characters, or special characters. For details, see Table 7-2.
3. Encoding Type
Encoding type of a CSV file. The default value is UTF-8. Some files containing Chinese characters are encoded using GBK.
If this parameter is specified at the migration source, the specified encoding type is used to parse the file. If this parameter is specified at the migration destination, the specified encoding type is used to write data to the file.
4. Use Quote Character
– Exporting data from a database or NoSQL to CSV files (configuring Use Quote Character at the migration destination): If a field delimiter appears in the character string of a column of data at the migration source, set Use Quote Character to Yes at the migration destination to quote the character string as a whole and write it into the CSV file. Currently, CDM uses only double quotation marks (") as the quote character. As shown in the following figure, the value of the name field in the database contains a comma (,).
If you do not use the quote character, the exported CSV file is displayed as follows:
3,hello,world,abc
If you use the quote character, the exported CSV file is displayed as follows:
3,"hello,world",abc
If the data in the database contains double quotation marks (") and you set Use Quote Character to Yes, the quote character in the exported CSV file is displayed as three double quotation marks ("""). For example, if the value of a field is a"hello,world"c, the exported data is as follows:
"""a"hello,world"c"""
– Exporting CSV files to a database or NoSQL (configuring Use Quote Character atthe migration source): If you want to import the CSV files with quoted values to a
Cloud Data MigrationUser Guide 7 Advanced Operations
Issue 10 (2018-08-03) Copyright © Huawei Technologies Co., Ltd. 212
database correctly, set Use Quote Character to Yes at the migration source to writethe quoted values as a whole.
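For reference, standard CSV quoting behaves the same way for an embedded delimiter; the following Python sketch (illustrative only, not CDM code) reproduces the quoted example above:

```python
import csv
import io

# A row whose second field contains the field delimiter (a comma).
row = ["3", "hello,world", "abc"]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only the fields that contain the delimiter,
# mirroring Use Quote Character = Yes for this case.
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
print(buf.getvalue().strip())  # 3,"hello,world",abc
```

Note that CDM's tripled-quote handling for embedded double quotation marks differs from the standard CSV doubling rule, so this sketch covers only the embedded-delimiter case.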
5. Use RE to Separate Fields
This function is used to parse complex semi-structured text, such as log files. For details, see Using Regular Expressions to Separate Semi-structured Text.
6. Use First Row as Header
This parameter is used when CSV files are exported to other locations. If it is specified at the migration source, CDM uses the first row as the header when extracting data. When the CSV files are transferred, the headers are skipped, so the number of rows extracted from the migration source is greater than the number of rows written to the migration destination. The logs will record that the header is skipped during the migration.
7. File Size
This parameter is used when data is exported from a database to CSV files. If a table contains a large amount of data, a large CSV file is generated after migration, which is inconvenient to download or view. In this case, you can specify this parameter at the migration destination so that multiple CSV files of the specified size are generated. The value is an integer, in MB.
JSON
The following describes the JSON format:
l JSON Types Supported by CDM
l JSON Reference Node
l Copying Data from a JSON File
1. JSON types supported by CDM: JSON object and JSON array
– JSON object: A JSON file contains a single object, or multiple objects that are separated or merged by rows.
i. The following is a single JSON object:
{
  "took" : 190,
  "timed_out" : false,
  "total" : 1000001,
  "max_score" : 1.0
}
ii. The following are JSON objects separated by rows:
{"took" : 188, "timed_out" : false, "total" : 1000003, "max_score" : 1.0 }
{"took" : 189, "timed_out" : false, "total" : 1000004, "max_score" : 1.0 }
iii. The following are merged JSON objects (several objects on one line):
{ "took": 190, "timed_out": false, "total": 1000001, "max_score": 1.0 } { "took": 191, "timed_out": false, "total": 1000002, "max_score": 1.0 }
– JSON array: A JSON file is a JSON array consisting of multiple JSON objects. The following gives an example:
[{ "took" : 190, "timed_out" : false, "total" : 1000001, "max_score" : 1.0 },
 { "took" : 191, "timed_out" : false, "total" : 1000001, "max_score" : 1.0 }]
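For illustration, the row-separated and merged layouts above can be read with an incremental decoder. This Python sketch is one assumed way to parse such files, not CDM's internal parser:

```python
import json

def iter_json_objects(text):
    # Decode a stream of JSON objects that are separated by rows or
    # merged on one line (no surrounding array).
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace (including line separators) between objects.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

merged = '{ "took": 190, "max_score": 1.0 } { "took": 191, "max_score": 1.0 }'
print([o["took"] for o in iter_json_objects(merged)])  # [190, 191]
```

A JSON array file, by contrast, can simply be loaded in one call to json.loads.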
2. JSON Reference Node
Root node that records data. The data corresponding to the node is a JSON array, and CDM extracts data from the array in the same mode. Multi-layer nested JSON nodes are separated by periods (.).
3. Copying Data from a JSON File
a. Example 1: Extract data from multiple objects that are separated or merged. A JSON file contains multiple JSON objects. The following gives an example:
{ "took": 190, "timed_out": false, "total": 1000001, "max_score": 1.0 }
{ "took": 191, "timed_out": false, "total": 1000002, "max_score": 1.0 }
{ "took": 192, "timed_out": false, "total": 1000003, "max_score": 1.0 }
To extract data from the JSON objects and write them to the database in the following format, perform the following operations:
took timedOut total maxScore
190 false 1000001 1.0
191 false 1000002 1.0
192 false 1000003 1.0
Set File Format to JSON and JSON Type to JSON object, and then map fields.
b. Example 2: Extract data from the reference node. A JSON file contains a single JSON object, but the valid data is on a data node. The following gives an example:
{
  "took": 190,
  "timed_out": false,
  "hits": {
    "total": 1000001,
    "max_score": 1.0,
    "hits": [
      { "_id": "650612", "_source": { "name": "tom", "books": ["chinese","english","math"] } },
      { "_id": "650616", "_source": { "name": "tom", "books": ["chinese","english","math"] } },
      { "_id": "650618", "_source": { "name": "tom", "books": ["chinese","english","math"] } }
    ]
  }
}
To write the data to the database in the following format, perform the following operations:
ID SourceName SourceBooks
650612 tom ["chinese","english","math"]
650616 tom ["chinese","english","math"]
650618 tom ["chinese","english","math"]
Set File Format to JSON, JSON Type to JSON object, and JSON Reference Node to hits.hits, and then map fields.
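The reference-node lookup can be pictured with a short sketch: split the node path on periods and walk the nested objects. This Python function is illustrative, not CDM's implementation:

```python
import json

def extract_by_reference_node(doc, reference_node):
    # Walk nested JSON objects; multi-layer nodes are separated by
    # periods, e.g. "hits.hits". The final node is the array of records.
    node = doc
    for key in reference_node.split("."):
        node = node[key]
    return node

raw = '''{ "took": 190, "timed_out": false,
           "hits": { "total": 2, "max_score": 1.0,
                     "hits": [ { "_id": "650612", "_source": { "name": "tom" } },
                               { "_id": "650616", "_source": { "name": "tom" } } ] } }'''
records = extract_by_reference_node(json.loads(raw), "hits.hits")
print([r["_id"] for r in records])  # ['650612', '650616']
```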
c. Example 3: Extract data from the JSON array. A JSON file is a JSON array consisting of multiple JSON objects. The following gives an example:
[{ "took" : 190, "timed_out" : false, "total" : 1000001, "max_score" : 1.0 },
 { "took" : 191, "timed_out" : false, "total" : 1000002, "max_score" : 1.0 }]
To write the data to the database in the following format, perform the following operations:
took timedOut total maxScore
190 false 1000001 1.0
191 false 1000002 1.0
Set File Format to JSON and JSON Type to JSON array, and then map fields.
d. Example 4: Configure a converter when parsing the JSON file. Building on Example 2, to add the hits.max_score field to all records, that is, to write the data to the database in the following format, perform the following operations:
ID SourceName SourceBooks MaxScore
650612 tom ["chinese","english","math"] 1.0
650616 tom ["chinese","english","math"] 1.0
650618 tom ["chinese","english","math"] 1.0
Set File Format to JSON, JSON Type to JSON object, and JSON Reference Node to hits.hits, and then create a converter.
i. Click Add Fields to add a field.
ii. Click the icon highlighted in the following figure to create a converter for the new field.
iii. Set Converter to Expression conversion, enter "1.0" in the Expression text box, and click Save.
Binary
If you want to copy files between file systems, you can select the binary format. The binary format delivers the optimal rate and performance in file transfer, and does not require field mapping.
l Directory structure for file transfer
CDM can transfer a single file or all files in a directory at a time. After the files are transferred to the migration destination, the directory structure remains unchanged.
l Migrating incremental files
When you use CDM to transfer files in binary format, configure Duplicate File Processing Method at the migration destination for incremental file migration. For details, see Incremental File Migration.
During incremental file migration, set Duplicate File Processing Method to Skip. If new files exist at the migration source or a failure occurs during the migration, run the job again so that the already migrated files are not migrated repeatedly.
l Write to Temporary File
When migrating files in binary format, you can specify whether to write the files to a temporary file at the migration destination. If this parameter is specified, each file is written to a temporary file during file replication. After the file is successfully migrated, a rename or move operation restores it at the migration destination.
l Generate MD5 Hash Value
An MD5 hash value is generated for each transferred file and recorded in a new .md5 file. You can specify the directory where the .md5 files are generated.
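As a sketch of what such a checksum step involves (the exact .md5 file layout used by CDM is not specified here, so the line format below is hypothetical):

```python
import hashlib

def write_md5_file(data: bytes, name: str) -> str:
    # Compute the MD5 hash of a transferred file's content and return
    # the line that would be recorded in the companion .md5 file.
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}  {name}"

print(write_md5_file(b"hello", "data.bin"))
# 5d41402abc4b2a76b9719d911017c592  data.bin
```

A consumer can recompute the hash after transfer and compare it with the recorded value to verify file integrity.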
Common parameters
l Source File Processing Method
After a file is copied successfully, CDM can perform an operation on the source file. The options are Rename and Delete.
l Start Job by Marker File
In automation scenarios, a scheduled task is configured on CDM to periodically read files from the migration source. However, files may still be being generated at the migration source, so CDM may read data repeatedly or fail to read data. You can specify the marker file for starting a job, for example ok.txt, in the job parameters of the migration source. After the files are completely generated at the migration source, the ok.txt file is created in the file directory; only then does CDM read the files, which are now complete.
In addition, you can set the suspension period. Within the suspension period, CDM periodically queries whether the marker file exists. If the file still does not exist when the suspension period expires, the job fails.
The marker file itself is not migrated.
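The marker-file wait loop described above can be sketched as follows; the function name and the suspension/poll parameters are illustrative, not CDM configuration keys:

```python
import os
import time

def wait_for_marker(directory, marker="ok.txt", suspension_s=600, poll_s=10):
    # Poll for the job-start marker file; give up (the job fails) once
    # the suspension period expires without the marker appearing.
    deadline = time.monotonic() + suspension_s
    while time.monotonic() < deadline:
        if os.path.exists(os.path.join(directory, marker)):
            return True
        time.sleep(poll_s)
    return False
```

A job would proceed to read the source directory only when this returns True.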
l Job Success Marker File
After data is successfully migrated to a file system, an empty file is generated in the destination directory. You can specify the file name. Generally, this parameter is used together with Start Job by Marker File.
Note that this file must not be confused with the files to be transferred. For example, if a file to be transferred is named finish.txt and the job success marker file is also set to finish.txt, the two files will overwrite each other.
l Filter
When using CDM to migrate files, you can specify a filter to select the files, either by wildcard or by regular expression. If you select regular expression, use Java regular expression syntax. CDM migrates only the files that meet the filter conditions.
For example, the /table/ directory stores a large number of data table directories divided by day. DRIVING_BEHAVIOR_20180101 to DRIVING_BEHAVIOR_20180630 store all data of DRIVING_BEHAVIOR from January to June. To migrate only the table data of DRIVING_BEHAVIOR in March, set Source Directory/File to /table, Filter Type to Wildcard, and Path Filter to DRIVING_BEHAVIOR_201803*.
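The wildcard filter in this example behaves like standard glob matching, which can be checked with Python's fnmatch (an illustration, not CDM code):

```python
import fnmatch

# Daily table directories under /table/ (sample names from the example).
dirs = [
    "DRIVING_BEHAVIOR_20180101",
    "DRIVING_BEHAVIOR_20180301",
    "DRIVING_BEHAVIOR_20180315",
    "DRIVING_BEHAVIOR_20180630",
]

# Path Filter set to DRIVING_BEHAVIOR_201803* selects only March data.
march = fnmatch.filter(dirs, "DRIVING_BEHAVIOR_201803*")
print(march)  # ['DRIVING_BEHAVIOR_20180301', 'DRIVING_BEHAVIOR_20180315']
```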
Solutions to File Format Problems
1. When data in a database is exported to a CSV file and the data contains commas (,), the data in the exported CSV file is disordered. The following solutions are available:
a. Specify a field delimiter.
Use a character that does not exist in the database, or a rare non-printable character, as the field delimiter. For example, set Field Delimiter at the migration destination to %01. In this way, the exported field delimiter is \u0001. For details, see Table 7-2.
b. Use the quote character.
Set Use Quote Character to Yes at the migration destination. In this way, if a field in the database contains the field delimiter, CDM quotes the field using the quote character and writes the field as a whole to the CSV file.
2. The data in the database contains line separators.
Scenario: When you use CDM to export a table from a MySQL database (a field value contains the line separator \n) to a CSV file, and then use CDM to import the exported CSV file to MRS HBase, data in the exported CSV file is truncated.
Solution: Specify a line separator. When you use CDM to export the MySQL table data to a CSV file, set Line Separator at the migration destination to %01 (ensure that this value does not appear in any field value), so that the line separator in the exported CSV file is \u0001. Then, when using CDM to import the CSV file to MRS HBase, set Line Separator at the migration source to %01. This avoids data truncation.
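The effect of choosing \u0001 (%01) as a separator can be seen in a short sketch: with a rare delimiter, an embedded comma no longer forces quoting (illustrative Python, not CDM code):

```python
import csv
import io

# Write with the rare non-printable delimiter \u0001 (%01); the field
# containing a comma no longer needs to be quoted.
buf = io.StringIO()
csv.writer(buf, delimiter="\u0001").writerow(["3", "hello,world", "abc"])
line = buf.getvalue().strip()
print(line.split("\u0001"))  # ['3', 'hello,world', 'abc']
```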
8 FAQs
8.1 What Are the Advantages of CDM?
Data migration is involved when you consolidate or back up data, or develop new applications on the public cloud. Generally, if you want to migrate data, you may develop data migration scripts that read data from the source and write data to the destination. However, this method has the following disadvantages:
l Because the data source types are different, the program uses different access interfaces, such as JDBC and native APIs, to read and write data. Various libraries and SDKs are therefore required when you write data migration scripts, resulting in high development and management costs.
l During data migration, the read and write process is completed in one job. Limited by available resources, the performance is poor and cannot meet the requirements of scenarios where massive sets of data need to be migrated.
l As cloud computing develops, user data may be stored in different environments, such as public clouds, on-premises or hosted Internet data centers (IDCs), and hybrid scenarios. In heterogeneous environments, data migration is subject to various factors, for example, network connectivity, which complicates development and maintenance.
CDM is developed on a distributed computing framework and leverages parallel data processing technology. It has the following advantages:
l Ease of use: You can migrate data by configuring data sources and migration jobs on the GUI, and CDM will manage and maintain them for you. In other words, you only need to focus on the data migration logic without worrying about the environment, which greatly reduces development and maintenance costs.
l High efficiency: Based on the distributed computing framework, CDM jobs are split into independent sub-jobs and executed concurrently, which drastically improves data migration efficiency. In addition, efficient data import APIs are used to import data to Hive, HBase, DWS, and MySQL databases.
l Support for various data sources: Data sources such as databases, Hadoop services, NoSQL databases, data warehouses, and files are supported.
l Support for multiple network environments: CDM helps you easily cope with various data migration scenarios, including data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems, regardless of whether the data is stored in on-premises IDCs, third-party clouds (public or private), HUAWEI CLOUD services, or self-built databases or file systems on HUAWEI CLOUD ECSs.
8.2 What Service Data Can Be Migrated by CDM?
CDM implements batch data migration between homogeneous and heterogeneous data sources, including on-premises file systems, file systems on the public cloud, relational databases, data warehouses, NoSQL databases, big data cloud services, and object storage.
CDM supports table/file migration and entire DB migration:
l Table/file migration: applicable to data migration to the cloud, data exchange on the cloud, and data migration to on-premises service systems.
l Entire DB migration: applicable to database migration to the cloud.
Table 8-1 describes the supported data sources.
Table 8-1 Supported data sources during table/file migration

Data Source Type    | Data Source                                | Used as a Source                                  | Used as a Destination
Data warehouse      | Data Warehouse Service (DWS)               | Supported                                         | Supported
                    | Data Lake Insight (DLI)                    | Not supported                                     | Supported
                    | FusionInsight LibrA                        | Supported                                         | Supported
Hadoop              | MRS HDFS                                   | Supported                                         | Supported
                    | MRS HBase                                  | Supported                                         | Supported
                    | MRS Hive                                   | Supported                                         | Supported
                    | FusionInsight HDFS                         | Supported                                         | Supported
                    | Apache HDFS                                | Supported                                         | Supported
                    | Hadoop HBase                               | Supported                                         | Supported
                    | FusionInsight HBase                        | Supported                                         | Supported
Object storage      | Object Storage Service (OBS)               | Supported                                         | Supported
                    | Alibaba Cloud Object Storage Service (OSS) | Supported                                         | Not supported
                    | Qiniu Cloud Object Storage                 | Supported                                         | Not supported
File system         | FTP                                        | Supported                                         | Supported
                    | SFTP                                       | Supported                                         | Supported
                    | HTTP                                       | Supported                                         | Not supported
                    | Network Attached Storage (NAS)             | Supported                                         | Supported
Relational database | RDS for MySQL                              | Supported                                         | Supported
                    | RDS for PostgreSQL                         | Supported                                         | Supported
                    | RDS for SQL Server                         | Supported                                         | Supported
                    | Distributed Database Middleware (DDM)      | Supported                                         | Supported
                    | MySQL                                      | Supported                                         | Supported
                    | PostgreSQL                                 | Supported                                         | Not supported
                    | Microsoft SQL Server                       | Supported                                         | Not supported
                    | Oracle                                     | Supported                                         | Not supported
                    | IBM Db2                                    | Supported                                         | Not supported
                    | Derecho (GaussDB)                          | Supported                                         | Not supported
NoSQL               | Distributed Cache Service (DCS)            | Not supported                                     | Supported
                    | Document Database Service (DDS)            | Supported                                         | Supported
                    | CloudTable Service (CloudTable)            | Supported                                         | Supported
                    | Redis                                      | Supported                                         | Not supported
                    | MongoDB                                    | Supported                                         | Not supported
Search              | Cloud Search Service                       | Supported                                         | Supported
                    | Elasticsearch                              | Supported                                         | Supported
Message system      | Data Ingestion Service (DIS)               | Supported (migrated to Cloud Search Service only) | Not supported
                    | Apache Kafka                               | Supported (migrated to Cloud Search Service only) | Not supported

NOTE
In the preceding table, the non-HUAWEI CLOUD data sources, such as MySQL, can be a MySQL instance built in a local data center, created by users on an Elastic Cloud Server (ECS), or hosted on a third-party cloud.
Entire database migration is applicable to the scenario where an on-premises data center or a database created on a HUAWEI CLOUD ECS is synchronized to HUAWEI CLOUD database services or big data services. It is suitable for offline database migration but not online real-time migration. Figure 8-1 lists the data sources that support entire database migration using CDM.
Figure 8-1 Supported data sources in entire DB migration
8.3 What Security Protection Measures Are Used in CDM?
CDM is a fully hosted service that provides the following capabilities to protect user data security:
l Instance isolation: CDM users can use only their own instances. Instances are isolated from each other and cannot access each other.
l System hardening: Security hardening has been performed on the operating system of each CDM instance, so attackers cannot access the operating system from the Internet.
l Key encryption: The keys of the various data sources that users enter when creating links on CDM are stored in CDM databases using high-strength encryption algorithms.
l No intermediate storage: During data migration, CDM processes only data mapping and conversion, without storing any user data or data fragments.
8.4 What Is the Performance of Using CDM to Migrate Data?
Theoretically, a single CDM instance can migrate 1 TB to 8 TB of data per day, depending on the network bandwidth and the read and write performance of the data source. Different business departments, such as finance and online mall, can use separate CDM instances.
8.5 What Is the Most Economical Way to Migrate Data from the Public Network Using CDM?
1. If data is exported at a specified time every day, you can use the CDM shutdown function and start the CDM cluster only when data is being migrated. A stopped cluster is charged ¥0.05 per hour, that is, ¥1.2 per day, which is very economical.
2. To migrate data from the public network, use the NAT gateway on HUAWEI CLOUD to share EIPs with other ECSs in the subnet. In this way, data in an on-premises data center or on a third-party cloud can be migrated in a more economical and convenient manner.
The detailed operations are as follows:
a. Suppose that you have created a CDM cluster (no dedicated EIP needs to be bound to the CDM cluster). Record the VPC and subnet where the CDM cluster is located.
b. Create a NAT gateway. Select the same VPC and subnet as the CDM cluster.
c. After the NAT gateway is created, return to the NAT gateway console list, click the created gateway name, and then click Add SNAT Rule.
Figure 8-2 Adding an SNAT rule
d. Select a subnet and an EIP. If no EIP is available, apply for one.
Then, access the CDM management console to migrate data sources that are accessed through the Internet to HUAWEI CLOUD. For example, migrate files from an FTP server in an on-premises data center to OBS, or migrate relational databases from a third-party cloud to HUAWEI CLOUD RDS.
8.6 Does CDM Support Incremental Data Migration?
CDM supports incremental data migration. With scheduled jobs and macro variables of date and time, CDM provides incremental data migration in the following scenarios:
l Both the data source and destination are file directories.
l The data source is a file whose name contains the date and time field.
l The data source is a relational database, and the database table name contains the date and time field.
l The data source is a relational database, and the database table contains a column that stores the date field.
The following describes the key configurations of these scenarios.
Both Data Source and Destination Are File Directories
CDM supports binary transmission between files. When the source data is in a directory, CDM can import all files in the directory to the migration destination.
If files are added to the source directory at irregular intervals, the key configurations for job creation are as follows:
1. Set Duplicate File Processing Method of the destination link to Skip. See Figure 8-3.
Figure 8-3 Skipping duplicated files
2. Configure scheduled job execution.
In this way, you can import the newly added files to the destination directory periodically to implement incremental synchronization.
Data Source Is a File with the Date and Time Field
The source file name contains the date and time field. For example, the /opt/data/file_20171015202526.data file is generated at 2017-10-15 20:25:26. The key configurations for job creation are as follows:
1. In source link parameters, set Filter Type to Wildcard. See Figure 8-4.
Figure 8-4 Filtering files
2. Enter *${dateformat(yyyyMMdd,-1,DAY)}* in File Filter. This is the macro variable format of date and time supported by CDM. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
3. Select Schedule Execution and set Cycle to one day.
In this way, you can import the files generated on the previous day to the destination directory every day to implement incremental synchronization.
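The macro *${dateformat(yyyyMMdd,-1,DAY)}* resolves to yesterday's date at job execution time. The following Python sketch (illustrative only, with a fixed run date) shows the resolved filter matching only the previous day's file:

```python
import fnmatch
from datetime import date, timedelta

def resolve_dateformat(run_date, offset_days=-1):
    # Resolve ${dateformat(yyyyMMdd,-1,DAY)} relative to a run date;
    # CDM evaluates the macro against the job's execution time.
    return (run_date + timedelta(days=offset_days)).strftime("%Y%m%d")

run_date = date(2017, 10, 16)              # job runs on 2017-10-16
stamp = resolve_dateformat(run_date)       # "20171015"
file_filter = f"*{stamp}*"                 # resolved *${dateformat(yyyyMMdd,-1,DAY)}*

files = ["file_20171015202526.data", "file_20171016030000.data"]
print(fnmatch.filter(files, file_filter))  # ['file_20171015202526.data']
```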
Data Source Is a Relational Database and Database Table Name Contains the Date and Time Field
The following uses an Oracle data table as the data source. A new data table is generated in the data source every day, and the table name contains the date and time field. For example, if a table is generated on October 15, 2017, the table name is table_20171015. The key configurations for job creation are as follows:
1. In Source Job Configuration, set Table Name to table_${dateformat(yyyyMMdd)}. See Figure 8-5.
Figure 8-5 Definition of macro variables of date and time
2. Select Schedule Execution and set Cycle to one day.
In this way, the new database table can be imported to the destination every day.
Data Source Is a Relational Database and Database Table Contains a Column that Stores the Date Field
The following uses a MySQL database as the data source. The source table name is Data, and the DS field in Data indicates the date column. See Figure 8-6.
Figure 8-6 Data table
The key configurations for job creation are as follows:
1. In Source Job Configuration, set Where Clause to DS='${dateformat(yyyy-MM-dd,-1,DAY)}'. See Figure 8-7.
Figure 8-7 Configuring the macro variables of date and time using Where Clause
2. Select Schedule Execution and set Cycle to one day. The scheduled job is executed at 00:00 every day.
In this way, the data generated on the previous day can be incrementally migrated to the destination at 00:00 every day.
8.7 Can Fields Be Converted During Data Migration?
Yes. CDM supports the following field converters:
l Anonymization
l Trim
l Reverse String
l Replace String
l Expression Conversion
You can create a field converter on the Map Field tab page when creating a table/file migration job. See Figure 8-8.
Figure 8-8 Creating a field converter
Anonymization
This converter is used to hide key information in a character string. For example, to convert 12345678910 to 123****8910, set parameters according to Figure 8-9:
l Set Reserve Start Length to 3.
l Set Reserve End Length to 4.
l Set Replace Character to *.
Figure 8-9 Anonymization
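The masking rule can be sketched in a few lines; this function mirrors the three parameters above but is illustrative, not CDM's implementation:

```python
def anonymize(value, reserve_start, reserve_end, replace_char="*"):
    # Keep the first reserve_start and last reserve_end characters and
    # mask everything in between with replace_char.
    masked_len = len(value) - reserve_start - reserve_end
    return (value[:reserve_start]
            + replace_char * masked_len
            + value[len(value) - reserve_end:])

print(anonymize("12345678910", 3, 4))  # 123****8910
```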
Trim
This converter is used to automatically delete the spaces before and after a string. No parameters need to be configured.
Reverse String
This converter is used to automatically reverse a string, for example, reverse ABC into CBA. No parameters need to be configured.
Replace String
This converter is used to replace a character string. You need to configure the object to be replaced and the new value.
Expression Conversion
This converter uses the JSP expression language (EL) to convert the current field or a row of data. The JSP EL is used to create arithmetic and logical expressions. Within a JSP EL expression, you can use integers, floating-point numbers, strings, the built-in constants true and false for boolean values, and null.
The expression supports the following environment variables:
l value: indicates the current field value.
l row: indicates the current row, which is an array type.
The expression supports the following tool classes:
l StringUtils: string processing tool class. For details, see org.apache.commons.lang.StringUtils in the Java SDK code.
l DateUtils: date tool class
l CommonUtils: common tool class
l NumberUtils: string-to-value conversion class
l HttpsUtils: network file read class
Application examples:
1. Set a string constant for the current field, for example, VIP.
Expression: "VIP"
2. If the field is of the string type, convert all characters to lowercase, for example, convert aBC to abc.
Expression: StringUtils.lowerCase(value)
3. Convert all characters of the current field to uppercase.
Expression: StringUtils.upperCase(value)
4. If the field value is a date string in yyyy-MM-dd format, extract the year from the field value, for example, extract 2017 from 2017-12-01.
Expression: StringUtils.substringBefore(value,"-")
5. If the field value is of the numeric type, convert it to twice the original value:
Expression: value*2
6. Convert the field value true to Y and all other field values to N.
Expression: value == "true"? "Y": "N"
7. If the field value is of the string type and is empty, convert it to Default; otherwise, leave it unchanged.
Expression: empty value? "Default" : value
8. If the first and second fields are of the numeric type, set the field to the sum of the first and second field values.
Expression: row[0] + row[1]
9. If the field is of the date or timestamp type, return the year after conversion. The data type is int.
Expression: DateUtils.getYear(value)
10. If the field is a date and time string in yyyy-MM-dd format, convert it to the date type:
Expression: DateUtils.format(value,"yyyy-MM-dd")
11. Convert the date format 2018/01/05 15:15:05 to 2018-01-05 15:15:05:
Expression: DateUtils.format(DateUtils.parseDate(value,"yyyy/MM/dd HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")
12. Obtain a 36-character universally unique identifier (UUID):
Expression: CommonUtils.randomUUID()
13. If the field is of the string type, capitalize the first letter, for example, convert cat to Cat.
Expression: StringUtils.capitalize(value)
14. If the field is of the string type, convert the first letter to lowercase, for example, convert Cat to cat.
Expression: StringUtils.uncapitalize(value)
15. If the field is of the string type, use spaces to pad the string to the specified length and center it. If the length of the string is not shorter than the specified length, the string is not converted. For example, convert ab to " ab " (one space on each side) to meet the specified length 4.
Expression: StringUtils.center(value, 4)
16. Delete a newline (including \n, \r, and \r\n) at the end of a character string. For example, convert abc\r\n\r\n to abc\r\n.
Expression: StringUtils.chomp(value)
17. If the string contains the specified string, true is returned; otherwise, false is returned. For example, abc contains a, so true is returned.
Expression: StringUtils.contains(value, "a")
18. If the string contains any character of the specified string, true is returned; otherwise, false is returned. For example, zzabyycdxx contains either z or a, so true is returned.
Expression: StringUtils.containsAny(value, "za")
19. If the string does not contain any of the specified characters, true is returned; if any specified character is contained, false is returned. For example, abz contains one character of xyz, so false is returned.
Expression: StringUtils.containsNone(value, "xyz")
20. If the string contains only the specified characters, true is returned; if any other character is contained, false is returned. For example, abab contains only characters among abc, so true is returned.
Expression: StringUtils.containsOnly(value, "abc")
21. If the character string is empty or null, convert it to the specified character string; otherwise, do not convert it. For example, convert an empty character string to null.
Expression: StringUtils.defaultIfEmpty(value, null)
22. If the string ends with the specified suffix (case sensitive), true is returned; otherwise, false is returned. For example, the suffix of abcdef is not null, so false is returned.
Expression: StringUtils.endsWith(value, null)
23. If the string is the same as the specified string (case sensitive), true is returned; otherwise, false is returned. For example, after strings abc and ABC are compared, false is returned.
Expression: StringUtils.equals(value, "ABC")
24. Obtain the first index of the specified string in a character string; if none is found, -1 is returned. For example, the first index of ab in aabaabaa is 1.
Expression: StringUtils.indexOf(value, "ab")
25. Obtain the last index of the specified string in a character string; if none is found, -1 is returned. For example, the last index of k in aFkyk is 4.
Expression: StringUtils.lastIndexOf(value, "k")
26. Obtain the first index of the specified string starting from the specified position in the character string; if none is found, -1 is returned. For example, the first index of b at or after index 3 of aabaabaa is 5.
Expression: StringUtils.indexOf(value, "b", 3)
27. Obtain the first index of any of the specified characters in a character string; if none is found, -1 is returned. For example, the first index of z or a in zzabyycdxx is 0.
Expression: StringUtils.indexOfAny(value, "za")
28. If the string contains only Unicode letters, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlpha(value)
29. If the string contains only Unicode letters and digits, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumeric(value)
30. If the string contains only Unicode letters, digits, and spaces, true is returned; otherwise, false is returned. For example, ab2c contains only letters and digits, so true is returned.
Expression: StringUtils.isAlphanumericSpace(value)
31. If the string contains only Unicode letters and spaces, true is returned; otherwise, false is returned. For example, ab2c contains a digit, so false is returned.
Expression: StringUtils.isAlphaSpace(value)
32. If the string contains only printable ASCII characters, true is returned; otherwise, false is returned. For example, for !ab-c~, true is returned.
Expression: StringUtils.isAsciiPrintable(value)
33. If the string is empty or null, true is returned; otherwise, false is returned.
Expression: StringUtils.isEmpty(value)
34. If the string contains only Unicode digits, true is returned; otherwise, false is returned.
Expression: StringUtils.isNumeric(value)
35. Obtain the leftmost characters of the specified length. For example, obtain the leftmost two characters ab from abc.
Expression: StringUtils.left(value, 2)
36. Obtain the rightmost characters of the specified length. For example, obtain the rightmost two characters bc from abc.
Expression: StringUtils.right(value, 2)
37. Pad the specified character string to the left of the current character string to reach the specified length. If the current character string is not shorter than the specified length, it is not converted. For example, if yz is padded to the left of bat to reach a length of 8, the result is yzyzybat.
Expression: StringUtils.leftPad(value, 8, "yz")
38. Pad the specified character string to the right of the current character string to reach the specified length. If the current character string is not shorter than the specified length, it is not converted. For example, if yz is padded to the right of bat to reach a length of 8, the result is batyzyzy.
Expression: StringUtils.rightPad(value, 8, "yz")
39. If the field is of the string type, obtain the length of the current character string. If the character string is null, 0 is returned.
Expression: StringUtils.length(value)
40. If the field is of the string type, delete all occurrences of the specified character string from it. For example, delete ue from queued to obtain qd.
Expression: StringUtils.remove(value, "ue")
41. If the field is of the string type, remove the specified substring from the end of the field. If the substring is not at the end of the field, no conversion is performed. For example, remove .com at the end of www.domain.com.
Expression: StringUtils.removeEnd(value, ".com")
42. If the field is of the string type, delete the specified substring from the beginning of the field. If the substring is not at the beginning of the field, no conversion is performed. For example, delete www. at the beginning of www.domain.com.
Expression: StringUtils.removeStart(value, "www.")
43. If the field is of the string type, replace all occurrences of the specified character string in the field. For example, replace a in aba with z to obtain zbz.
Expression: StringUtils.replace(value, "a", "z")
44. If the field is of the string type, replace multiple characters in the character string at a time. For example, replace h in hello with j and o with y to obtain jelly.
Expression: StringUtils.replaceChars(value, "ho", "jy")
45. If the field is of the string type, use the specified delimiter to split the text into an array. For example, use : to split ab:cd:ef into ["ab", "cd", "ef"].
Expression: StringUtils.split(value, ":")
46. If the string starts with the specified prefix (case sensitive), true is returned; otherwise, false is returned. For example, abcdef starts with abc, so true is returned.
Expression: StringUtils.startsWith(value, "abc")
47. If the field is of the string type, strip the specified characters from both ends of the field. For example, strip x, y, and z from abcyx to obtain abc.
Expression: StringUtils.strip(value, "xyz")
48. If the field is of the string type, strip the specified characters from the end of the field, for example, strip all spaces at the end of the field.
Expression: StringUtils.stripEnd(value, null)
49. If the field is of the string type, strip the specified characters from the beginning of the field, for example, strip all spaces at the beginning of the field.
Expression: StringUtils.stripStart(value, null)
50. If the field is of the string type, obtain the substring after the specified position (excluding the character at the specified position) of the character string. If the specified position is a negative number, count from the end of the string. For example, obtain the character string after the second character of abcde, that is, cde.
Expression: StringUtils.substring(value, 2)
51. If the field is of the string type, obtain the substring within the specified index range of the character string. If a specified index is a negative number, count from the end of the string. For example, obtain the substring between indexes 2 and 5 of abcde, that is, cde.
Expression: StringUtils.substring(value, 2, 5)
52. If the field is of the string type, obtain the substring after the first occurrence of the specified character. For example, obtain the substring after the first b in abcba, that is, cba.
Expression: StringUtils.substringAfter(value, "b")
53. If the field is of the string type, obtain the substring after the last occurrence of the specified character. For example, obtain the substring after the last b in abcba, that is, a.
Expression: StringUtils.substringAfterLast(value, "b")
54. If the field is of the string type, obtain the substring before the first occurrence of the specified character. For example, obtain the substring before the first b in abcba, that is, a.
Expression: StringUtils.substringBefore(value, "b")
55. If the field is of the string type, obtain the substring before the last occurrence of the specified character. For example, obtain the substring before the last b in abcba, that is, abc.
Expression: StringUtils.substringBeforeLast(value, "b")
56. If the field is of the string type, obtain the substring nested between two occurrences of the specified string. If no such substring is found, null is returned. For example, obtain the substring between the two tag strings in tagabctag, that is, abc.
Expression: StringUtils.substringBetween(value, "tag")
57. If the field is of the string type, delete the control characters (char ≤ 32) at both ends of the character string, for example, delete the spaces at both ends of the character string.
Expression: StringUtils.trim(value)
58. Convert the character string to a value of the byte type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toByte(value)
59. Convert the character string to a value of the byte type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toByte(value, 1)
60. Convert the character string to a value of the double type. If the conversion fails, 0.0d is returned.
Expression: NumberUtils.toDouble(value)
61. Convert the character string to a value of the double type. If the conversion fails, the specified value, for example, 1.1d, is returned.
Expression: NumberUtils.toDouble(value, 1.1d)
62. Convert the character string to a value of the float type. If the conversion fails, 0.0f is returned.
Expression: NumberUtils.toFloat(value)
63. Convert the character string to a value of the float type. If the conversion fails, the specified value, for example, 1.1f, is returned.
Expression: NumberUtils.toFloat(value, 1.1f)
64. Convert the character string to a value of the int type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toInt(value)
65. Convert the character string to a value of the int type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toInt(value, 1)
66. Convert the character string to a value of the long type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toLong(value)
67. Convert the character string to a value of the long type. If the conversion fails, the specified value, for example, 1L, is returned.
Expression: NumberUtils.toLong(value, 1L)
68. Convert the character string to a value of the short type. If the conversion fails, 0 is returned.
Expression: NumberUtils.toShort(value)
69. Convert the character string to a value of the short type. If the conversion fails, the specified value, for example, 1, is returned.
Expression: NumberUtils.toShort(value, 1)
70. Convert an IP address string to a value of the long type, for example, convert 10.78.124.0 to 172915712.
Expression: CommonUtils.ipToLong(value)
71. Read a mapping file of IP addresses to physical addresses from the network and load it into a map collection. url indicates the address where the IP mapping file is stored, for example, http://10.114.205.45:21203/sqoop/IpList.csv.
Expression: HttpsUtils.downloadMap("url")
72. Cache the IP address and physical address mappings and specify a key, such as ipList, for retrieval.
Expression: CommonUtils.setCache("ipList", HttpsUtils.downloadMap("url"))
73. Obtain the cached IP address and physical address mappings.
Expression: CommonUtils.getCache("ipList")
74. Check whether the IP address and physical address mappings are cached.
Expression: CommonUtils.cacheExists("ipList")
75. Obtain the physical address corresponding to an IP address, in Country_Province_City_Carrier format. For example, the physical address corresponding to 1xx.78.124.0 is China_Guangdong_Shenzhen_China Telecom. If the physical address cannot be obtained, the default value **_**_**_** is returned. If necessary, you can use the StringUtils class expressions to further split the address.
Expression: CommonUtils.getMapValue(CommonUtils.ipToLong(value), CommonUtils.cacheExists("ipList") ? CommonUtils.getCache("ipList") : CommonUtils.setCache("ipList", HttpsUtils.downloadMap("url")))
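Expression 70 simply packs the four IPv4 octets into a 32-bit number. The following standalone sketch reproduces that behavior with plain Java; the class name IpUtil is ours for illustration and is not part of CDM:

```java
public class IpUtil {
    // Pack a dotted-quad IPv4 string into a long, mirroring the
    // documented behavior of CommonUtils.ipToLong.
    public static long ipToLong(String ip) {
        long result = 0;
        for (String octet : ip.split("\\.")) {
            // Shift the accumulated value left by one byte and append the next octet.
            result = (result << 8) | Long.parseLong(octet);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("10.78.124.0")); // prints 172915712
    }
}
```

The result, 172915712, matches the example given for expression 70.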
8.8 What Data Formats Are Supported When the Data Source Is Hive?
CDM can read and write data in SequenceFile, TextFile, ORC, or Parquet format from the Hive data source.
8.9 Does CDM Support Job Synchronization Between Different Clusters?
CDM does not support direct job migration across clusters. However, you can use the batch job import/export function to implement cross-cluster migration indirectly as follows:
1. Export all jobs from CDM cluster 1 and save the jobs' JSON files to a local PC. For security purposes, no link password is exported when CDM exports jobs. All passwords are replaced by Add password here.
2. Edit each JSON file on the local PC by replacing Add password here with the actual password of the corresponding link.
3. Import the edited JSON files to CDM cluster 2 in batches to implement job migration between cluster 1 and cluster 2.
For details about how to export and import data in batches, see Batch Managing Jobs.
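Step 2 amounts to a find-and-replace over each exported JSON file. A minimal sketch, assuming the exported file contains the placeholder text exactly as Add password here; the class name and the linkConfig.password key shown are ours for illustration:

```java
public class JobJsonFixer {
    // Replace the placeholder that CDM writes for link passwords on export
    // with the real password before importing into the target cluster.
    public static String fillPassword(String jobJson, String password) {
        return jobJson.replace("Add password here", password);
    }

    public static void main(String[] args) {
        // Hypothetical fragment of an exported job JSON file.
        String exported = "\"linkConfig.password\": \"Add password here\"";
        System.out.println(fillPassword(exported, "MyRealPassword"));
    }
}
```

In practice you would read each exported file (for example with java.nio.file.Files), apply fillPassword, and write the result before importing it into cluster 2.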
8.10 Can I Create Jobs in Batches on CDM?
Yes. CDM supports batch job creation with the help of the batch import function. You can create jobs in batches as follows:
1. Create a job manually.
2. Export the job and save the job's JSON file to a local PC.
3. Edit the JSON file and replicate more jobs in it according to the job configuration.
4. Import the JSON file to the CDM cluster to create jobs in batches.
For details about how to export and import data in batches, see Batch Managing Jobs.
8.11 Can I Back Up Jobs When the CDM Cluster Is Not Used for a Long Time?
Yes. If you do not need to use the CDM cluster for a long time, you can stop or delete it to reduce costs.
Before the deletion, you can use the batch export function of CDM to save all job scripts to a local PC. Then, you can create a cluster and import the jobs again when necessary.
8.12 How Do I Use Java to Invoke CDM RESTful APIs to Create Data Migration Jobs?
CDM provides RESTful APIs to implement automatic job creation or execution control by program invocation.
The following describes how to use CDM to migrate data from table city1 in a MySQL database to table city2 on DWS, and how to use Java to invoke CDM RESTful APIs to create, start, query, and delete a CDM job.
Prepare the following data in advance:
1. Obtain the username, account name, and project ID on HUAWEI CLOUD.
On the CDM management console, hover the cursor over the username and select My Credential from the drop-down list. On the page that is displayed, obtain the username and account name. In the project list, obtain the project ID of the corresponding region, for example, 1af30ca47b5a4eb987e325a846458b7a.
2. Create a CDM cluster and obtain the cluster ID.
On the Cluster Management page, click the icon on the left of the CDM cluster name to obtain the cluster ID, for example, c110beff-0f11-4e75-8b10-da7cd882b0ef.
3. Create a MySQL database and a DWS database, and create tables city1 and city2. The table creation statements are as follows:
MySQL:
create table city1(code varchar(10),name varchar(32));
insert into city1 values('sz','Shenzhen');
DWS:
create table city2(code varchar(10),name varchar(32));
4. In the CDM cluster, create a link to MySQL, such as a link named mysqltestlink. Create a link to DWS, such as a link named dwstestlink.
5. Run the following code. You are advised to use HttpClient 4.5. The Maven configuration is as follows:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>cdm</groupId>
  <artifactId>cdm-client</artifactId>
  <version>1</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5</version>
    </dependency>
  </dependencies>
</project>
Sample Code
The code for using Java to invoke CDM RESTful APIs to create, start, query, and delete a CDM job is as follows:
package cdmclient;

import java.io.IOException;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpDelete;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class CdmClient {

private final static String DOMAIN_NAME="HUAWEI CLOUD account name";
private final static String USER_NAME="HUAWEI CLOUD username";
private final static String USER_PASSWORD="HUAWEI CLOUD password";
private final static String PROJECT_ID="Project ID";
private final static String CLUSTER_ID="CDM Cluster ID";
private final static String JOB_NAME="Job Name";
private final static String FROM_LINKNAME="Source Link Name";
private final static String TO_LINKNAME="Destination Link Name";
private final static String IAM_ENDPOINT="iam.cn-north-1.myhuaweicloud.com";
private final static String CDM_ENDPOINT="cdm.cn-north-1.myhuaweicloud.com";
private CloseableHttpClient httpclient;
private String token;

public CdmClient() {
this.httpclient = createHttpClient();
this.token = login();
}

private CloseableHttpClient createHttpClient() {
CloseableHttpClient httpclient = HttpClients.createDefault();
return httpclient;
}

private String login(){
HttpPost httpPost = new HttpPost("https://"+IAM_ENDPOINT+"/v3/auth/tokens");
String json =
"{\r\n"+
"\"auth\": {\r\n"+
"\"identity\": {\r\n"+
"\"methods\": [\"password\"],\r\n"+
"\"password\": {\r\n"+
"\"user\": {\r\n"+
"\"name\": \""+USER_NAME+"\",\r\n"+
"\"password\": \""+USER_PASSWORD+"\",\r\n"+
"\"domain\": {\r\n"+
"\"name\": \""+DOMAIN_NAME+"\"\r\n"+
"}\r\n"+
"}\r\n"+
"}\r\n"+
"},\r\n"+
"\"scope\": {\r\n"+
"\"project\": {\r\n"+
"\"name\": \"cn-north-1\"\r\n"+
"}\r\n"+
"}\r\n"+
"}\r\n"+
"}\r\n";
try {
StringEntity s = new StringEntity(json);
s.setContentEncoding("UTF-8");
s.setContentType("application/json");
httpPost.setEntity(s);
CloseableHttpResponse response = httpclient.execute(httpPost);
Header tokenHeader = response.getFirstHeader("X-Subject-Token");
String token = tokenHeader.getValue();
System.out.println("Login successful");
return token;
} catch (Exception e) {
throw new RuntimeException("login failed.", e);
}
}

/*Create a job.*/
public void createJob(){
HttpPost httpPost = new HttpPost("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job");
/**
* The JSON information here is complex. You can create a job on the job
* management page, click Job JSON Definition next to the job, copy the
* JSON content and convert it into a Java character string, and paste it here.
* In the JSON message body, you only need to replace the link name, data
* import and export table names, field list of the tables, and fields used
* for partitioning in the source table.
**/
String json =
"{\r\n"+
"\"jobs\": [\r\n"+
"{\r\n"+
"\"from-connector-name\": \"generic-jdbc-connector\",\r\n"+
"\"name\": \""+JOB_NAME+"\",\r\n"+
"\"to-connector-name\": \"generic-jdbc-connector\",\r\n"+
"\"driver-config-values\": {\r\n"+
"\"configs\": [\r\n"+
"{\r\n"+
"\"inputs\": [\r\n"+
"{\r\n"+
"\"name\": \"throttlingConfig.numExtractors\",\r\n"+
"\"value\": \"1\"\r\n"+
"}\r\n"+
"],\r\n"+
"\"validators\": [],\r\n"+
"\"type\": \"JOB\",\r\n"+
"\"id\": 30,\r\n"+
"\"name\": \"throttlingConfig\"\r\n"+
"}\r\n"+
"]\r\n"+
"},\r\n"+
"\"from-link-name\": \""+FROM_LINKNAME+"\",\r\n"+
"\"from-config-values\": {\r\n"+
"\"configs\": [\r\n"+
"{\r\n"+
"\"inputs\": [\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.schemaName\",\r\n"+
"\"value\": \"sqoop\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.tableName\",\r\n"+
"\"value\": \"city1\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.columnList\",\r\n"+
"\"value\": \"code&name\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"fromJobConfig.partitionColumn\",\r\n"+
"\"value\": \"code\"\r\n"+
"}\r\n"+
"],\r\n"+
"\"validators\": [],\r\n"+
"\"type\": \"JOB\",\r\n"+
"\"id\": 7,\r\n"+
"\"name\": \"fromJobConfig\"\r\n"+
"}\r\n"+
"]\r\n"+
"},\r\n"+
"\"to-link-name\": \""+TO_LINKNAME+"\",\r\n"+
"\"to-config-values\": {\r\n"+
"\"configs\": [\r\n"+
"{\r\n"+
"\"inputs\": [\r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.schemaName\",\r\n"+
"\"value\": \"sqoop\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.tableName\",\r\n"+
"\"value\": \"city2\"\r\n"+
"},\r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.columnList\",\r\n"+
"\"value\": \"code&name\"\r\n"+
"}, \r\n"+
"{\r\n"+
"\"name\": \"toJobConfig.shouldClearTable\",\r\n"+
"\"value\": \"true\"\r\n"+
"}\r\n"+
"],\r\n"+
"\"validators\": [],\r\n"+
"\"type\": \"JOB\",\r\n"+
"\"id\": 9,\r\n"+
"\"name\": \"toJobConfig\"\r\n"+
"}\r\n"+
"]\r\n"+
"}\r\n"+
"}\r\n"+
"]\r\n"+
"}\r\n";
try {
StringEntity s = new StringEntity(json);
s.setContentEncoding("UTF-8");
s.setContentType("application/json");
httpPost.setEntity(s);
httpPost.addHeader("X-Auth-Token", this.token);
httpPost.addHeader("X-Language", "zh-cn");
CloseableHttpResponse response = httpclient.execute(httpPost);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
System.out.println("Create job successful.");
}else{
System.out.println("Create job failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Create job failed.", e);
}
}

/*Start the job.*/
public void startJob(){
HttpPut httpPut = new HttpPut("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job/"+JOB_NAME+"/start");
String json = "";
try {
StringEntity s = new StringEntity(json);
s.setContentEncoding("UTF-8");
s.setContentType("application/json");
httpPut.setEntity(s);
httpPut.addHeader("X-Auth-Token", this.token);
httpPut.addHeader("X-Language", "zh-cn");
CloseableHttpResponse response = httpclient.execute(httpPut);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
System.out.println("Start job successful.");
}else{
System.out.println("Start job failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Start job failed.", e);
}
}

/*Query the job running status cyclically until the job is complete.*/
public void getJobStatus(){
HttpGet httpGet = new HttpGet("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job/"+JOB_NAME+"/status");
try {
httpGet.addHeader("X-Auth-Token", this.token);
httpGet.addHeader("X-Language", "zh-cn");
boolean flag = true;
while(flag){
CloseableHttpResponse response = httpclient.execute(httpGet);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
HttpEntity entity = response.getEntity();
String msg = EntityUtils.toString(entity);
if(msg.contains("\"status\":\"SUCCEEDED\"")){
System.out.println("Job succeeded");
break;
}else if (msg.contains("\"status\":\"FAILED\"")){
System.out.println("Job failed.");
break;
}else{
Thread.sleep(1000);
}
}else{
System.out.println("Get job status failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
break;
}
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Get job status failed.", e);
}
}

/*Delete the job.*/
public void deleteJob(){
HttpDelete httpDelete = new HttpDelete("https://"+CDM_ENDPOINT+"/cdm/v1.0/"+PROJECT_ID+"/clusters/"+CLUSTER_ID+"/cdm/job/"+JOB_NAME);
try {
httpDelete.addHeader("X-Auth-Token", this.token);
httpDelete.addHeader("X-Language", "zh-cn");
CloseableHttpResponse response = httpclient.execute(httpDelete);
int status = response.getStatusLine().getStatusCode();
if(status == 200){
System.out.println("Delete job successful.");
}else{
System.out.println("Delete job failed.");
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));
}
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException("Delete job failed.", e);
}
}

/*Close the process.*/
public void close(){
try {
httpclient.close();
} catch (IOException e) {
throw new RuntimeException("Close failed.", e);
}
}

public static void main(String[] args){
CdmClient cdmClient = new CdmClient();
cdmClient.createJob();
cdmClient.startJob();
cdmClient.getJobStatus();
cdmClient.deleteJob();
cdmClient.close();
}
}
8.13 How Do I Connect an On-Premises Intranet or Third-Party Private Network to CDM?
Many enterprises deploy key data sources, such as databases and file servers, on the intranet. CDM runs on HUAWEI CLOUD. To migrate intranet data to HUAWEI CLOUD using CDM, use any of the following methods to connect the intranet to HUAWEI CLOUD:
1. Bind Internet IP addresses to the intranet data source nodes so that CDM can access the data directly from the Internet.
2. Establish a VPN between the on-premises data center and the VPC where the service resides on HUAWEI CLOUD. For details about VPN on HUAWEI CLOUD, see http://www.huaweicloud.com/en-us/product/vpn.html.
3. Use Direct Connect to connect the data center to HUAWEI CLOUD. For details about Direct Connect on HUAWEI CLOUD, see http://www.huaweicloud.com/en-us/product/dc.html.
4. Leverage Network Address Translation (NAT) or port forwarding to access the network in proxy mode.
The following describes how to use a port forwarding tool to access intranet data. The process is as follows:
1. Use a Windows computer as the gateway. The computer must be able to access both the Internet and the intranet.
2. Install the port mapping tool IPOP on the computer.
3. Configure port mapping using the tool.

NOTICE
If the intranet database is exposed to the public network for a long time, security risks exist. Therefore, after data migration is complete, stop port mapping.
Scenario
Suppose that the MySQL database on the intranet is migrated to DWS on HUAWEI CLOUD. Figure 8-10 shows a network topology example.
In the figure, the intranet can be either an enterprise's data center or the intranet of a virtual data center on a third-party cloud.
Figure 8-10 Network topology example
Procedure
Step 1 Use a Windows computer as the gateway. Configure both the intranet and Internet IP addresses on the computer. Conduct the following test to check whether the gateway computer can fulfill service needs.
1. Run the ping command on the computer to check whether the intranet address of the MySQL database is pingable. For example, run ping 192.168.1.8.
2. Run the ping command on another computer that can access the Internet to check whether the public network address of the gateway computer is pingable. For example, run ping 202.xxx.xxx.10.
Step 2 Download the port mapping tool IPOP and install it on the gateway computer.
Step 3 Run the port mapping tool and select PORT Map. See Figure 8-11.
- Local IP and Local Port: Set these two parameters to the public network address and port number of the gateway computer. They must be entered when creating MySQL links on CDM.
- Mapping IP and Map Port: Set these two parameters to the IP address and port number of the MySQL database on the intranet.
Figure 8-11 Configuring port mapping
Step 4 Click ADD to add a port mapping relationship.
Step 5 Click START to start mapping and receive data packets.
Then, on CDM, you can use the EIP to read data from the MySQL database on the intranet and import the data to DWS on HUAWEI CLOUD.
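Before creating the MySQL link on CDM, you can verify from any Internet host that the mapped port on the gateway is reachable. A minimal sketch; the gateway address and port below are placeholders, not values from this guide:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: the mapping is not reachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder: the gateway's public address and the mapped MySQL port.
        System.out.println(isReachable("203.0.113.10", 3306, 3000));
    }
}
```

If the check returns false, recheck the PORT Map configuration in Step 3 and any firewall rules on the gateway computer.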
NOTE
1. To access the on-premises data source, you must also bind an EIP to the CDM cluster.
2. Generally, DWS on HUAWEI CLOUD can be accessed only within a VPC. When creating a CDM cluster, ensure that the VPC of the CDM cluster is the same as that of DWS. In addition, it is recommended that CDM and DWS be in the same intranet and security group. If their security groups are different, you also need to enable data access between the security groups.
3. Port mapping can also be used to migrate data between intranet databases or SFTP servers.
4. For Linux computers, port mapping can also be implemented using iptables.
5. When an FTP server on the intranet is mapped to the public network using port mapping, check whether the PASV mode is enabled. In PASV mode, the client and server are connected through a random port. Therefore, in addition to mapping port 21, you also need to map the port range used in PASV mode. For example, you can specify the vsftp port range by configuring pasv_min_port and pasv_max_port.
----End
8.14 What Do I Do If the System Displays a Message Indicating that the Date Format Fails to Be Parsed When Data Is Imported to Cloud Search Service?
Symptom
When CDM is used to migrate data from other data sources to Cloud Search Service, the job fails to be executed and the error message "Unparseable date" is displayed in the log. See Figure 8-12.
Figure 8-12 Log output
Possible Cause
Cloud Search Service has a special processing mechanism for the time field. If the stored time data does not contain time zone information, Kibana considers the time as GMT and automatically converts it to the local time.
In the China time zone, the displayed time is therefore eight hours later than the actual time. When CDM migrates data to Cloud Search Service, if the index and type are automatically created by CDM (for example, if date_test and test1 of the migration destination highlighted in Figure 8-13 do not exist in Cloud Search Service, CDM automatically creates the index and type in Cloud Search Service), CDM, by default, sets the format of the time field to the standard format yyyy-MM-dd HH:mm:ss.SSS Z, for example, 2018-01-08 08:08:08.666 +0800.
Figure 8-13 Job configuration
When data is imported from another data source to Cloud Search Service, if the date format in the source data is not the standard format, for example, 2018/01/05 15:15:46, the CDM job fails to be executed, and the log shows that the date format cannot be parsed. You need to configure a field converter on CDM to convert the date field to the format required by Cloud Search Service.
Solution
1. Edit the job and go to the Map Field tab page. Click the icon for creating a converter in the row of the source field. See Figure 8-14.
Figure 8-14 Creating a converter
2. Select Expression conversion as the converter. Currently, expression conversion supports functions of the character string and date types. The syntax is similar to that of Java character string and time functions. For details about how to compile the expression, see Field Conversion During Migration.
3. In this example, the source time format is yyyy/MM/dd HH:mm:ss. To convert it to yyyy-MM-dd HH:mm:ss.SSS Z, perform the following operations:
a. Add the time zone information +0800 to the end of the original date character string. The corresponding expression is value+" +0800".
b. Use the original date format to parse the string into a date object. You can use the DateUtils.parseDate function for parsing. The syntax is DateUtils.parseDate(String value, String format).
c. Format the date object into a character string in the target format by using the DateUtils.format function. The syntax is DateUtils.format(Date date, String format).
In this example, the complete expression is DateUtils.format(DateUtils.parseDate(value+" +0800","yyyy/MM/dd HH:mm:ss Z"),"yyyy-MM-dd HH:mm:ss.SSS Z"). See Figure 8-15.
Figure 8-15 Configuring the expression
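If you want to verify the conversion locally before editing the job, the chained expression can be reproduced with the JDK. This is a sketch using plain SimpleDateFormat rather than CDM's DateUtils helpers; the class name is ours:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateConvertSketch {
    // Mimics the CDM expression DateUtils.format(DateUtils.parseDate(
    // value + " +0800", "yyyy/MM/dd HH:mm:ss Z"), "yyyy-MM-dd HH:mm:ss.SSS Z").
    public static String convert(String value) {
        try {
            // Step a and b: append the time zone and parse with the source format.
            Date date = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss Z")
                    .parse(value + " +0800");
            // Step c: format into the target pattern, keeping the +0800 zone.
            SimpleDateFormat target = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS Z");
            target.setTimeZone(TimeZone.getTimeZone("GMT+08:00"));
            return target.format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("2018/01/05 15:15:46"));
        // prints 2018-01-05 15:15:46.000 +0800
    }
}
```

The output matches the standard format that Cloud Search Service expects for the automatically created time field.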
4. Save the converter configuration, then save and run the job to solve the problem that Cloud Search Service fails to parse the date format.
8.15 What Do I Do If the Map Field Tab Page Cannot Display All Columns When Data Is Exported from HBase/CloudTable?
Symptom
When data is exported from HBase/CloudTable using CDM, fields in the HBase/CloudTable table on the Map Field tab page occasionally cannot be displayed completely and cannot match the fields on the migration destination. As a result, the data imported to the migration destination is incomplete.
Possible Cause
HBase/CloudTable are schema-less, and the number of columns in each data is not fixed. Onthe Map Field page, there is a high probability that all columns cannot be obtained byobtaining example values. In this case, the data on the migration destination is incompleteafter the job is executed.
To solve this problem, use any of the following methods:
1. Add fields on the Map Field tab page.
2. Edit the JSON file of the job on the Job Management page (modify the fromJobConfig.columns and toJobConfig.columnList parameters).
3. Export the JSON file of the job to the local PC, modify the parameters in the JSON file (the principle is the same as that in 2), and then import the JSON file back to CDM.
You are advised to use method 1. The following uses data migration from HBase to DWS as an example.
Solution 1: Adding Fields on the Map Field Tab Page
1. Obtain all fields in the tables to be migrated from the source HBase. Use colons (:) to separate column families and columns. The following gives an example:
rowkey:rowkey
g:DAY_COUNT
g:CATEGORY_ID
g:CATEGORY_NAME
g:FIND_TIME
g:UPLOAD_PEOPLE
g:ID
g:INFOMATION_ID
g:TITLE
g:COORDINATE_X
g:COORDINATE_Y
g:COORDINATE_Z
g:CONTENT
g:IMAGES
g:STATE
2. On the Job Management page, locate the job for exporting data from HBase to DWS, click Edit in the row where the job resides, and go to the Map Field tab page. See Figure 8-16.
Figure 8-16 Field mapping
3. Click . In the dialog box that is displayed, select Add a new field. See Figure 8-17.
Figure 8-17 Adding a field
NOTE
After a field is added, the example value of the new field is not displayed on the console. This does not affect the transmission of field values. CDM directly writes the field values to the migration destination.
4. After all fields are added, check whether the mapping between the migration source and destination is correct. If the mapping is incorrect, drag the fields to adjust the field mapping.
5. Click Next and Save.
Solution 2: Modifying a JSON File
1. Obtain all fields in the tables to be migrated from the source HBase. Use colons (:) to separate column families and columns. The following gives an example:
rowkey:rowkey
g:DAY_COUNT
g:CATEGORY_ID
g:CATEGORY_NAME
g:FIND_TIME
g:UPLOAD_PEOPLE
g:ID
g:INFOMATION_ID
g:TITLE
g:COORDINATE_X
g:COORDINATE_Y
g:COORDINATE_Z
g:CONTENT
g:IMAGES
g:STATE
2. In the DWS destination table, obtain the fields corresponding to the HBase table fields. If any field name corresponding to an HBase field does not exist in the DWS destination table, add it to the DWS table schema. Suppose that the fields in the DWS table are complete and are displayed as follows:
rowkey
day_count
category
category_name
find_time
upload_people
id
infomation_id
title
coordinate_x
coordinate_y
coordinate_z
content
images
state
3. On the Job Management page, locate the job for exporting data from HBase to DWS, and choose More > Edit Job JSON in the row where the job resides.
4. On the page that is displayed, edit the JSON file of the job.
a. Modify the fromJobConfig.columns parameter of the migration source to the HBase fields obtained in 1. Use ampersands (&) to separate columns and colons (:) to separate column families and columns. The following gives an example:
"from-config-values": {
  "configs": [
    {
      "inputs": [
        {
          "name": "fromJobConfig.table",
          "value": "HBase"
        },
        {
          "name": "fromJobConfig.columns",
          "value": "rowkey:rowkey&g:DAY_COUNT&g:CATEGORY_ID&g:CATEGORY_NAME&g:FIND_TIME&g:UPLOAD_PEOPLE&g:ID&g:INFOMATION_ID&g:TITLE&g:COORDINATE_X&g:COORDINATE_Y&g:COORDINATE_Z&g:CONTENT&g:IMAGES&g:STATE"
        },
        {
          "name": "fromJobConfig.formats",
          "value": {
            "2": "yyyy-MM-dd",
            "undefined": "yyyy-MM-dd"
          }
        }
      ],
      "name": "fromJobConfig"
    }
  ]
}
b. Modify the toJobConfig.columnList parameter of the migration destination to the field list of DWS obtained in 2. The sequence must be the same as that of HBase to ensure correct field mapping. Use ampersands (&) to separate field names. The following gives an example:
"to-config-values": {
  "configs": [
    {
      "inputs": [
        {
          "name": "toJobConfig.schemaName",
          "value": "dbadmin"
        },
        {
          "name": "toJobConfig.tablePreparation",
          "value": "DO_NOTHING"
        },
        {
          "name": "toJobConfig.tableName",
          "value": "DWS"
        },
        {
          "name": "toJobConfig.columnList",
          "value": "rowkey&day_count&category&category_name&find_time&upload_people&id&infomation_id&title&coordinate_x&coordinate_y&coordinate_z&content&images&state"
        },
        {
          "name": "toJobConfig.shouldClearTable",
          "value": "true"
        }
      ],
      "name": "toJobConfig"
    }
  ]
}
c. Retain the settings of other parameters, and then click Save and Run.
5. After the job is complete, check whether the data in the DWS table matches the data in HBase. If the mapping is incorrect, check whether the sequences of the HBase and DWS fields in the JSON file are the same.
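Both parameter values edited in Solution 2 are plain &-joined lists, so they can be generated and cross-checked with a few lines. The field names below are the ones from the example above; the helper itself is a convenience sketch, not a CDM tool:

```python
# Source HBase fields (column family:column) and destination DWS fields,
# in matching order -- the order determines the field mapping.
hbase_fields = [
    "rowkey:rowkey", "g:DAY_COUNT", "g:CATEGORY_ID", "g:CATEGORY_NAME",
    "g:FIND_TIME", "g:UPLOAD_PEOPLE", "g:ID", "g:INFOMATION_ID", "g:TITLE",
    "g:COORDINATE_X", "g:COORDINATE_Y", "g:COORDINATE_Z", "g:CONTENT",
    "g:IMAGES", "g:STATE",
]
dws_fields = [
    "rowkey", "day_count", "category", "category_name", "find_time",
    "upload_people", "id", "infomation_id", "title", "coordinate_x",
    "coordinate_y", "coordinate_z", "content", "images", "state",
]

# A length mismatch would silently misalign the mapping, so fail early.
assert len(hbase_fields) == len(dws_fields)

from_columns = "&".join(hbase_fields)   # value for fromJobConfig.columns
to_column_list = "&".join(dws_fields)   # value for toJobConfig.columnList
print(from_columns)
print(to_column_list)
```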
8.16 How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
When using CDM to migrate data to DWS/FusionInsight LibrA and creating a table on DWS, select the distribution columns during job configuration. See Figure 8-18.
Figure 8-18 Selecting distribution columns
Selecting the distribution column is very important for the performance of DWS/FusionInsight LibrA. When migrating data to DWS/FusionInsight LibrA, you are advised to specify the distribution column according to the following principles:
1. Use the primary key as the distribution column.
2. If multiple data segments are combined as primary keys, specify all primary keys as the distribution columns.
3. In the scenario where no primary key is available, if no distribution column is selected, DWS uses the first column as the distribution column by default. As a result, data skew risks exist.
Therefore, when a single table or entire database is imported to DWS/FusionInsight LibrA, you are advised to manually select a distribution column; otherwise, CDM automatically selects one. For more information about the distribution column, see Selecting a Distribution Column in the DWS documentation.
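The data-skew risk in principle 3 can be illustrated with a toy hash distribution (made-up data and a hypothetical 4-node layout; DWS internals differ): hashing a low-cardinality column sends almost all rows to one or two nodes, while hashing a unique key spreads them evenly.

```python
from collections import Counter

# 10,000 hypothetical rows: "status" has only two distinct values,
# while "id" is unique per row.
rows = [{"status": "OK" if i % 100 else "FAIL", "id": i} for i in range(10_000)]

def distribution(column: str) -> Counter:
    # Assign each row to one of 4 nodes by hashing the distribution column.
    return Counter(hash(row[column]) % 4 for row in rows)

print(distribution("status"))  # at most 2 of the 4 nodes receive any data
print(distribution("id"))      # even spread: 2,500 rows per node
```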
If the DWS primary key or table contains only one field, the field type must be a common character string, numeric, or date type. When data is migrated from another database to DWS with automatic table creation selected, the primary key must be of one of the following types. If no primary key is set, at least one field of the following types must exist. Otherwise, the table cannot be created and the CDM job fails.
- INTEGER TYPES: TINYINT, SMALLINT, INT, BIGINT, NUMERIC/DECIMAL
- CHARACTER TYPES: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, TEXT
- DATE/TIME TYPES: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, SMALLDATETIME
8.17 What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?
Symptom
When you use CDM to migrate data to DWS/FusionInsight LibrA, the migration fails and the error message "value too long for type character varying" is displayed in the log. See Figure 8-19.
Figure 8-19 Log output
Possible Cause
The data migrated to DWS contains Chinese characters, and the table is automatically created at the migration destination. The length of a varchar field in DWS is calculated in bytes, and a Chinese character may occupy three bytes in UTF-8 encoding. If the byte length of the data exceeds the length of the varchar field in DWS, an error occurs and the error message "value too long for type character varying" is displayed.
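The byte-count behavior is easy to verify directly: in UTF-8, a typical Chinese character encodes to three bytes, so text that fits a character-counted varchar at the source can overflow a byte-counted varchar at the destination. A quick check in Python:

```python
text = "数据迁移"  # 4 Chinese characters

print(len(text))                   # 4 characters
print(len(text.encode("utf-8")))   # 12 bytes: 3 bytes per character in UTF-8
```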
Solution
To solve this problem, set Extend Field Length to Yes, so that the length of the varchar field is automatically tripled when the destination table is created.
Edit the table/file migration job on CDM. In Destination Job Configuration, set Auto Table Creation to Auto creation. Extend Field Length is then displayed in Show Advanced Attributes; set it to Yes. See Figure 8-20.
Figure 8-20 Extending field length
A Version Updates
2018.8.3 1.5.0 Version
- New Functions
a. Support for the cdm.xlarge cluster requiring 10GE bandwidth
b. Support for streaming JSON parsing to reduce resource usage
c. Support for region switching on the CDM service purchase page to improve usability
d. Support for link connectivity tests to improve usability
e. Support for source and destination table comparison after the migration is complete
f. Support for incremental data migration in MySQL Binlog mode (trial use)
- Fixed Bugs
a. Failures of data migration from DIS to Cloud Search Service if Offset is set to Last stop
b. Failures of job saving when data is migrated from the Oracle database to DWS and the source table contains more than 800 columns
c. Failures of setting the field delimiter to \001 when exporting a CSV file
d. Failures of job execution when data is migrated from MySQL to DWS with auto table creation enabled and the source field is configured with the NOT NULL constraint
2018.7.5 1.3.0 Version
- New Functions
a. Support for HDFS data migration between multiple MRS clusters
b. Support for data partitioning by size during data export to OBS
c. Support for data filtering using filter conditions during data migration from Elasticsearch/Cloud Search Service. This function can be used in incremental data migration scenarios.
d. Support for object migration from Qiniu Cloud Object Storage to OBS
e. Support for the job statistics of running clusters being displayed on the CDM console
- Fixed Bugs
a. Changed the maximum length of a job name from 32 characters to 256 characters.
b. Empty directories are not migrated to the migration destination.
c. Special characters cannot be used as field delimiters.
d. The default value of the time field at the migration source is not used when the MySQL database automatically creates a table.
2018.6.2 1.2.0 Version
- New Functions
a. Support for data export from HTTP/HTTPS data sources to HUAWEI CLOUD
b. Support for data to be exported in CarbonData format and stored in OBS
c. Support for scheduled start and stop of clusters and automatic shutdown, helping you reduce costs
d. Support for automatic mapping of fields with the same name
e. Support for the process wizard
f. Support for the retry policy of migration jobs
g. During MySQL link creation, local APIs can be automatically detected and enabled.
h. Support for the pipeline and authentication parameters being configured for the Elasticsearch data source
i. When a job displayed on the Job Management page fails, you can hover the cursor over the job to see the failure cause.
- Fixed Bugs
a. The monitoring data cannot be correctly displayed when a cluster is created for the first time.
b. Data cannot be imported to the Hive partitioned table.
c. Buckets cannot be listed when OBS buckets of various regions exist.
d. An expression carrying % cannot be written in the WHERE clause in the database.
2018.5.4 1.1.0 Version
- New Functions
a. Support for data export and import of DDM
b. Support for data export and import of Hadoop HBase and FusionInsight HBase
c. Support for data export and import of FusionInsight LibrA
d. Support for data migration from the MongoDB database to DDS
e. Support for entire database migration to OBS
f. Optimized the method of adjusting field mapping, making it easier to use.
g. The JSON definition of a job can be edited, which is suitable for advanced users.
h. Data can be imported to DWS in GDS mode, which greatly improves the performance of importing data to DWS.
i. Support for column- or row-based storage, as well as compressed storage, in DWS table creation
j. Support for the advanced attribute of deleting data after successful import. This attribute is designed for massive one-time jobs.
k. Support for the encircling symbol being configured for the CSV files in file migration
l. Support for migration of files in ZIP format
m. Support for field converter tests. That is, the conversion effect is displayed immediately.
n. Optimized the performance of executing a large number of migration jobs.
o. Support for numeric fields being used as incremental fields in database migrations, making incremental migration more convenient
p. Support for databases being migrated in transaction mode by specifying a staging table
- Fixed Bugs
a. Common error messages
b. NoSQL example values cannot contain all fields; added the function of manually adding new fields.
c. SocketTimeout may occur when data is migrated from MongoDB.
d. Poor performance of writing small files to OBS
e. Poor performance of executing multiple concurrent jobs
2018.3.28 1.0.T11 Version
- New Functions
a. Support for data import to DLI
b. Support for object migration from Alibaba Cloud OSS
c. Support for KMS encryption when data is written to OBS
d. Support for MD5 verification to ensure data consistency when data is written to OBS
e. Support for obtaining the MRS, DWS, and RDS instance lists during link creation
f. Support for HUAWEI CLOUD second-generation VMs, speeding up network access
g. Support for automatic schema creation during entire database migration
h. Accelerated the speed of creating a cluster for the first time. The creation is complete within one minute.
i. Added the expression converter to support more string, date, and numeric processing functions.
j. Support for the cdm.small clusters to reduce costs
k. The cluster can be started or stopped based on service requirements.
l. In file migrations, the total number of files and total data volume are displayed.
m. You can view the monitoring metrics of the CDM cluster on the Cloud Eye console, for example, data traffic.
n. In CloudTable data migration, the time range and column families of data can be specified.
- Fixed Bugs
a. Inaccurate statistics about data written to DWS
b. DCS link failure
c. JSON files cannot be exported to CSV files.
d. Data import fails because some database field names contain spaces.
2018.1.31 1.0.T10 Version
- New Functions
a. Support for wizard-based link creation
b. Support for specifying the tables to be migrated during entire database migration
c. Support for data migration from Cloud Search Service
d. Support for data export and import of FusionInsight HDFS
e. Support for regular expressions being used to parse logs
f. Support for the time format conversion function and the random number function of the field expression converter
g. Support for reading files in GZIP format
h. Support for reading files in Parquet format on HDFS
i. Support for splitting rowkeys when data is migrated from HBase/CloudTable
j. Support for determining whether to compress an HBase/CloudTable table during job creation
k. The new endpoint cdm.cn-north-1.myhuaweicloud.com is used.
l. Support for data export from Derecho (GaussDB)
m. Support for deleting the header row of a CSV file
- Fixed Bugs
a. Low performance of writing data to OBS
b. Inconsistency between the entire database migration job status and the sub-job status
c. Insufficient fields during table migration
d. EIPs cannot be deleted from CDM after being released in the VPC.
e. Cloud Search Service does not support date fields.
2018.1.9 1.0.T9 Version
- New Functions
a. Support for entire homogeneous relational database migration. You can migrate on-premises MySQL, PostgreSQL, and Microsoft SQL Server databases to RDS for MySQL, RDS for PostgreSQL, and RDS for SQL Server on HUAWEI CLOUD. This function is applicable to database migration to RDS on HUAWEI CLOUD. It supports entire database migration but does not support real-time incremental synchronization.
b. Support for entire heterogeneous relational database migration. You can migrate an on-premises Oracle, Db2, MySQL, PostgreSQL, or Microsoft SQL Server database to any database of RDS for MySQL, RDS for PostgreSQL, RDS for SQL Server, DWS, and MRS Hive.
c. Support for entire NoSQL database migration. You can migrate on-premises Redis and Elasticsearch to DCS and Cloud Search Service on HUAWEI CLOUD.
d. Support for automatic creation of the destination table during data import to a database
e. Support for migration from open source Hadoop to MRS on HUAWEI CLOUD and non-security mode
f. Support for interconnection with CloudTable on HUAWEI CLOUD and open source Kafka
g. Support for parsing source data files in JSON format
h. Support for connecting to RDS databases in SSL mode
i. Support for filtering jobs by status and scheduled execution
j. Support for writing a temporary name during file migration
k. Support for setting a specific file as the boot condition for file migration jobs, for example, OK.txt
l. Support for batch link deletion
- Fixed Bugs
a. Changed the storage duration of historical records to 90 days.
b. Long database link timeout period
c. Changed names of the enumerated values to ones that are easier to understand.
d. Incorrect sorting of historical operation records displayed on multiple pages
e. Failures of creating jobs when a table contains a large number of fields
2017.11.30 1.0.T8 Version
- New Functions
a. Support for the Elasticsearch/Cloud Search Service links. Data in the database can be imported to the Elasticsearch server and Cloud Search Service.
b. Support for the DIS links. Data can be obtained from DIS.
c. Support for the NAS links, the CIFS/SMB protocol, interconnection with professional file servers, Windows system file sharing, Linux Samba servers, and file system cloud services that provide the CIFS/SMB protocol
d. Support for binding or unbinding an EIP after a cluster is created
e. Support for configuring field conversion and processing field values during migration
f. Optimized the Job Management page so that the job progress can be displayed in a more timely and accurate manner and jobs can be sorted by a specific field.
g. Support for displaying historical records and links on multiple pages
h. Support for detecting duplicate files based on the file size during incremental file synchronization
i. Support for scheduled job execution (weekly)
- Fixed Bugs
a. Incorrect database passwords due to special characters
b. Incorrect default date format of job mapping
c. Invalid advanced link parameters
d. Batch job import timeout
2017.10.31 1.0.T5 Version
- New Functions
a. Support for directory browsing when selecting FTP, SFTP, HDFS, and OBS paths
b. Support for overwriting or skipping files with duplicate names during file data import. By combining this function with scheduled job execution, incremental file migration can be implemented.
c. Support for the date and time variable functions dataformat and timestamp, which can be used in table names, Where clauses, and file paths. By combining this function with scheduled job execution, incremental file migration can be implemented.
d. Support for common date formats during field mapping configuration
e. Optimized error codes and messages.
f. Support for cluster VM restart and graceful restart modes
g. Support for copying field names from the migration source if an HBase table is created during data import to HBase
h. Support for MongoDB
- Fixed Bugs
a. Job mapping pages of HBase and Redis jobs
b. Failures of batch job startup and occasional startup failures of some jobs in batch startup
c. Occasional generation of empty directories when data is migrated from OBS
d. Handle leakage of MySQL and FTP links
2017.9.30 Launched for Open Beta Test
1. Launched CDM, which supports table data import and export among data sources such as FTP, SFTP, HDFS, OBS, HBase, Hive, DWS, MySQL, Oracle, Db2, PostgreSQL, Microsoft SQL Server, Redis, and VoltDB.
2. Support for wizard-based configuration of import and export jobs and concurrency policies
3. Support for using a VM as a service unit to implement security isolation
4. Support for setting row and column separators in file data export and configuring the regular expression for filtering and encoding types
5. Support for scheduled job execution
6. Optimized the performance of importing data to MySQL, DWS, HBase, and Hive.
B Change History
Release Date What's New
2018-08-03 This is the tenth official release.
- Added the following sections:
  – Migrating Data from OSS to OBS
  – Migrating Data from OBS to Cloud Search Service
  – Migrating the Entire Elasticsearch Database to Cloud Search Service
  – File Formats
- Updated the screenshots.
- Updated the operation procedures in Typical Scenarios.
- Updated the description of most job parameters in Job Management and added multiple job parameters.
2018-07-05 This is the ninth official release.
- Added the following sections:
  – CTS
  – Link to Qiniu Cloud Object Storage
  – Migrating Data from the MySQL Database to DDM
- Updated the screenshots.
- Updated the parameter description in the following sections:
  – Data Sources Supported by CDM
  – Creating a Link
  – Link to HDFS
  – Link to HBase
  – From Elasticsearch/Cloud Search Service
  – To OBS
  – To FTP/SFTP/NAS
2018-06-02 This is the eighth official release.
- Added the following sections:
  – From HTTP/HTTPS
  – Migrating Data from the MySQL Database to the MRS Hive Partition Table
  – HBase/CloudTable Incremental Migration
  – GDS Import Mode
  – What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?
- Updated the screenshots.
- Updated the following sections because HTTP/HTTPS can be used as the migration source:
  – Data Sources Supported by CDM
  – Creating a Link
  – Table/File Migration
- Updated the following sections because the automatic shutdown and scheduled power-on/off are supported:
  – Purchasing CDM
  – Creating a Cluster
  – Stopping, Starting, or Deleting a Cluster
- Updated the parameter description in the following sections:
  – Link to Elasticsearch
  – From OBS/OSS
  – From a Relational Database
  – To OBS
2018-05-04 This is the seventh official release.
- Added the following sections:
  – Monitoring
  – Link to HBase
  – Link to Hive
  – To DDS
  – Advanced Operations
  – What Is the Most Economical Way to Migrate Data from the Public Network Using CDM?
  – How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
- Updated the following sections:
  – Data Sources Supported by CDM
  – Related Services
  – Constraints
  – Purchasing CDM
  – Creating and Executing a Job
  – Creating a Cluster
  – Creating a Link
  – Table/File Migration
  – Entire DB Migration
  – From a Relational Database
  – Managing a Single Job
- Updated the screenshots.
- Changed Elasticsearch Service (ES) to Cloud Search Service.
- Changed Unlimited Query Service (UQuery) to Data Lake Insight (DLI).
- Changed Data Pipeline Service (DPS) to Data Lake Factory (DLF).
2018-04-09 This is the sixth official release.
- Added the following sections:
  – Link to OSS on Alibaba Cloud
  – Link to DLI
  – To DLI
  – Migrating Data from OBS to DLI
  – What Do I Do If the System Displays a Message Indicating that the Date Format Fails to Be Parsed When Data Is Imported to Cloud Search Service?
  – What Do I Do If the Map Field Tab Page Cannot Display All Columns When Data Is Exported from HBase/CloudTable?
- Updated the following sections:
  – Data Sources Supported by CDM
  – Related Services
  – Constraints
  – CDM Billing
  – Purchasing CDM
  – Creating a Cluster
  – Stopping, Starting, or Deleting a Cluster
  – From OBS/OSS
  – From HBase/CloudTable
  – Field Conversion During Migration
- Updated the procedure for binding EIPs because the EIPs are not automatically bound.
- Updated the screenshots.
2018-01-31 This is the fifth official release.
- Added the following sections:
  – From Elasticsearch/Cloud Search Service
  – Using Regular Expressions to Separate Semi-structured Text
  – Migrating the Entire MySQL Database to RDS
- Updated the data sources supported in table/file migration in Data Sources Supported by CDM.
- Added the JS expression example in Field Conversion During Migration.
- Updated job parameters, and modified Source Job Parameters and Destination Job Parameters.
- Added the description of selecting a connector in the first step in the procedure for creating a link.
- Deleted the following sections:
  – From VoltDB
  – To VoltDB
  – Using CDM to Archive MySQL Data to OBS
  – Creating the PostgreSQL Link on RDS on HUAWEI CLOUD
2018-01-11 This is the fourth official release.
- Added the following sections:
  – Data Sources Supported by CDM
  – Link to HDFS
  – Link to CloudTable
  – Link to Kafka
  – Entire DB Migration
  – From Apache Kafka
  – Migrating Data from Oracle to Cloud Search Service
  – Version Updates
- Modified several connector parameters, job parameters, and corresponding parameter descriptions.
- Modified "Procedure" in Creating and Executing a Job.
2017-11-30 This is the third official release.
- Added the following sections:
  – Binding or Unbinding an EIP
  – Link to a NAS Server
  – Link to DIS
  – Link to Elasticsearch
  – From DIS
  – To Elasticsearch/Cloud Search Service
  – Field Conversion During Migration
  – Typical Scenarios
- Changed all connector names by deleting "connector" from the names in the document.
- Modified content in Scheduling Job Execution.
2017-10-31 This is the second official release.
- Added Link to MongoDB/DDS.
- Added Scheduling Job Execution.
- Added Incremental Synchronization Using the Macro Variables of Date and Time.
- Modified the parameter description of the source job configuration and destination job configuration, and enabled the directory, table name, and Where clause to be configured as time macro variables.
- Modified the data source list supported by CDM, added the MongoDB data source, and added several data migration scenarios.
2017-09-30 This is the first official release.