alibaba/DataX

mysql2elasticsearch: no matter how I configure setting, the job is only ever split into 1 task. Any help appreciated.

Open

#1,568 opened on Oct 26, 2022

help wanted

Description

I set channel and also configured splitPK (the primary key, a bigint column), but the job is still split into only 1 task.

{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "column":[
                                                "id",
                                                "label1",
                                                "label2"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://10.20.*.*:33061/db_table?useUnicode=true&characterEncoding=utf-8"
                                                        ],
                                                        "table":[
                                                                "mysql2es"
                                                        ]
                                                }
                                        ],
                                        "password":"***********",
                                        "splitPK":"id",
                                        "username":"root"
                                }
                        },
                        "writer":{
                                "name":"elasticsearchwriter",
                                "parameter":{
                                        "accessId":"elastic",
                                        "accessKey":"***********",
                                        "batchSize":1000,
                                        "cleanup":true,
                                        "column":[
                                                {
                                                        "name":"id",
                                                        "type":"long"
                                                },
                                                {
                                                        "name":"label1",
                                                        "type":"integer"
                                                },
                                                {
                                                        "name":"label2",
                                                        "type":"integer"
                                                }
                                        ],
                                        "discovery":false,
                                        "dynamic":false,
                                        "endpoint":"http://10.20.*.*:9400",
                                        "index":"mysql2es",
                                        "settings":{
                                                "index":{
                                                        "number_of_replicas":1,
                                                        "number_of_shards":3,
                                                        "refresh_interval":"10s"
                                                }
                                        },
                                        "type":"_doc"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":3
                }
        }
}
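
One thing worth checking in the job above is the spelling of the split key. The mysqlreader documentation writes it as `splitPk`, while this job uses `splitPK`. Assuming DataX looks up parameter keys case-sensitively, the reader would find no split key and would not divide the table, which would be consistent with a reader SQL that has no range condition on `id`. The sketch below flags such a mis-cased key in a job file; the helper name `find_miscased_split_pk` and the trimmed job text are illustrative only and are not part of DataX.

```python
import json

# Key name as spelled in the mysqlreader documentation. That DataX treats
# parameter keys case-sensitively is an assumption of this sketch.
DOCUMENTED_KEY = "splitPk"


def find_miscased_split_pk(job: dict) -> list:
    """Return reader parameter keys that match splitPk except for letter case."""
    suspects = []
    for item in job.get("content", []):
        params = item.get("reader", {}).get("parameter", {})
        for key in params:
            if key.lower() == DOCUMENTED_KEY.lower() and key != DOCUMENTED_KEY:
                suspects.append(key)
    return suspects


# Trimmed-down copy of the job from this issue (only the relevant fields).
job_text = """
{
  "content": [
    {"reader": {"name": "mysqlreader",
                "parameter": {"splitPK": "id", "username": "root"}}}
  ],
  "setting": {"speed": {"channel": 3}}
}
"""

suspects = find_miscased_split_pk(json.loads(job_text))
print(suspects)
```

If the list is non-empty, renaming the key to `splitPk` and rerunning the job is a cheap experiment: the `splits to [N] tasks` log line shows directly whether the split key was picked up.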

2022-10-26 10:43:01.565 [job-0] INFO  ElasticSearchWriter$Job - unified version: 1666752181565
2022-10-26 10:43:01.570 [job-0] INFO  ElasticSearchWriter$Job - [{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"id","origin":false,"type":"long"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label1","origin":false,"type":"integer"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label2","origin":false,"type":"integer"}]
2022-10-26 10:43:01.570 [job-0] INFO  ElasticSearchWriter$Job - index:[mysql2es], type:[_doc], mappings:[{"properties":{"id":{"type":"long"},"label1":{"type":"integer"},"label2":{"type":"integer"}}}]
2022-10-26 10:43:01.577 [job-0] INFO  ElasticSearchClient - begin GetMapping for index: mysql2es
2022-10-26 10:43:01.581 [job-0] INFO  ElasticSearchWriter$Job - the mappings for old index is: {"id":{"type":"long"},"label1":{"type":"integer"},"label2":{"type":"integer"}}
2022-10-26 10:43:01.582 [job-0] INFO  ElasticSearchClient - begin GetSettings for index: mysql2es
2022-10-26 10:43:01.586 [job-0] INFO  ElasticSearchWriter$Job - merge1 settings:{"mysql2es":{"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"refresh_interval":"10s","number_of_shards":"3","provided_name":"mysql2es","creation_date":"1666751769822","number_of_replicas":"1","uuid":"LjMiDqIlR0-vUKlnCNNZdA","version":{"created":"7120099"}}}}}, settingsCache:null, includeSettings:{"number_of_replicas":"1","number_of_shards":"3"}
2022-10-26 10:43:01.587 [job-0] INFO  ElasticSearchClient - delete index mysql2es
2022-10-26 10:43:01.891 [job-0] INFO  ElasticSearchClient - delete index mysql2es success
2022-10-26 10:43:01.891 [job-0] INFO  ElasticSearchWriter$Job - merge2 settings:{"index":{"number_of_replicas":1,"number_of_shards":3,"refresh_interval":"10s"}}, settingsCache:{"index":{"number_of_replicas":1,"number_of_shards":3,"refresh_interval":"10s"},"number_of_replicas":"1","number_of_shards":"3"}
2022-10-26 10:43:01.894 [job-0] WARN  ElasticSearchClient - null
2022-10-26 10:43:01.894 [job-0] WARN  ElasticSearchClient - IndicesExists got ResponseCode: 404 ErrorMessage: 404 Not Found
2022-10-26 10:43:01.894 [job-0] INFO  ElasticSearchClient - create index mysql2es
2022-10-26 10:43:02.060 [job-0] INFO  ElasticSearchClient - create mysql2es index success
2022-10-26 10:43:02.061 [job-0] INFO  ElasticSearchClient - create mappings for mysql2es  {"properties":{"id":{"type":"long"},"label1":{"type":"integer"},"label2":{"type":"integer"}}}
2022-10-26 10:43:02.110 [job-0] INFO  ElasticSearchClient - index mysql2es put mappings success
2022-10-26 10:43:02.111 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2022-10-26 10:43:02.111 [job-0] INFO  JobContainer - Job set Channel-Number to 3 channels.
2022-10-26 10:43:02.115 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2022-10-26 10:43:02.116 [job-0] INFO  JobContainer - DataX Writer.Job [elasticsearchwriter] splits to [1] tasks.
2022-10-26 10:43:02.125 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2022-10-26 10:43:02.128 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-10-26 10:43:02.130 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-10-26 10:43:02.135 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-10-26 10:43:02.138 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-10-26 10:43:02.138 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-10-26 10:43:02.150 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2022-10-26 10:43:02.152 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select id,label1,label2 from mysql2es 
] jdbcUrl:[jdbc:mysql://10.20.35.85:33061/sx_dmp_jres?useUnicode=true&characterEncoding=utf-8&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2022-10-26 10:43:02.155 [0-0-0-writer] INFO  ElasticSearchWriter$Job - columnList: [{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"id","origin":false,"type":"long"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label1","origin":false,"type":"integer"},{"array":false,"combineFieldsValueSeparator":"-","dstArray":false,"jsonArray":false,"name":"label2","origin":false,"type":"integer"}]
2022-10-26 10:43:02.155 [0-0-0-writer] INFO  ElasticSearchWriter$Job - Task will use elasticsearch auto generated _id property
2022-10-26 10:43:02.156 [0-0-0-writer] INFO  AbstractJestClient - Setting server pool to a list of 1 servers: [http://10.20.32.117:9400]
2022-10-26 10:43:02.157 [0-0-0-writer] INFO  JestClientFactory - Using multi thread/connection supporting pooling connection manager
2022-10-26 10:43:02.158 [0-0-0-writer] INFO  JestClientFactory - Using default GSON instance
2022-10-26 10:43:02.158 [0-0-0-writer] INFO  JestClientFactory - Node Discovery disabled...
2022-10-26 10:43:02.158 [0-0-0-writer] INFO  JestClientFactory - Idle connection reaping disabled...
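
The mismatch between the requested channels and the resulting tasks can be read straight from the `JobContainer` lines above. As a minimal sketch, the snippet below extracts both numbers from a log excerpt so they can be compared after each config change; the function name `summarize_split` is hypothetical, and the regular expressions only assume the message wording seen in this log.

```python
import re

# Three JobContainer lines copied from the log in this issue.
LOG = """\
2022-10-26 10:43:02.111 [job-0] INFO  JobContainer - Job set Channel-Number to 3 channels.
2022-10-26 10:43:02.115 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2022-10-26 10:43:02.116 [job-0] INFO  JobContainer - DataX Writer.Job [elasticsearchwriter] splits to [1] tasks.
"""

CHANNEL_RE = re.compile(r"Job set Channel-Number to (\d+) channels")
SPLIT_RE = re.compile(r"DataX (?:Reader|Writer)\.Job \[(\w+)\] splits to \[(\d+)\] tasks")


def summarize_split(log_text: str) -> dict:
    """Collect the configured channel count and the task count per plugin."""
    summary = {"channels": None, "tasks": {}}
    for line in log_text.splitlines():
        channel_match = CHANNEL_RE.search(line)
        if channel_match:
            summary["channels"] = int(channel_match.group(1))
            continue
        split_match = SPLIT_RE.search(line)
        if split_match:
            summary["tasks"][split_match.group(1)] = int(split_match.group(2))
    return summary


summary = summarize_split(LOG)
print(summary)
```

For this log the summary reports 3 channels but 1 task for both `mysqlreader` and `elasticsearchwriter`, which matches the later line where the task group starts only 1 channel for 1 task.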
