Importing an Oracle Table into Hive with Sqoop

2022-05-06 20:28:08

1. Problem 1: split generation fails with multiple mappers

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:oracle:thin:@192.168.1.31:1521:users --table FUND_INFO --username tpa_query --password tpa_query --split-by VC_FUNDCODE --hive-import --target-dir temp_table --hive-table fund_info --null-string '\N' --null-non-string '\N' --fields-terminated-by '01' --hive-drop-import-delims -m 4 --hive-overwrite --delete-target-dir
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/23 16:16:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/23 16:16:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/23 16:16:47 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/05/23 16:16:47 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/23 16:16:47 INFO tool.CodeGenTool: Beginning code generation
18/05/23 16:17:16 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 16:17:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM FUND_INFO t WHERE 1=0
18/05/23 16:17:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/9398ba28d24e7b9db6b1260fa083e904/FUND_INFO.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/23 16:17:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9398ba28d24e7b9db6b1260fa083e904/FUND_INFO.jar
18/05/23 16:17:20 INFO tool.ImportTool: Destination directory temp_table is not present, hence not deleting.
18/05/23 16:17:20 INFO mapreduce.ImportJobBase: Beginning import of FUND_INFO
18/05/23 16:17:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/23 16:17:20 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 16:17:20 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/23 16:17:20 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/23 16:17:22 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/23 16:17:22 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(VC_FUNDCODE), MAX(VC_FUNDCODE) FROM FUND_INFO
18/05/23 16:17:22 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1526097883376_0004
18/05/23 16:17:22 ERROR tool.ImportTool: Import failed: java.io.IOException: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
    at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:204)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
    at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
    at org.apache.sqoop.mapreduce.db.TextSplitter.split(TextSplitter.java:67)
    at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:201)
    ... 23 more

[root@node1 sqoop-1.4.7]# 

Because -m 4 was specified, Sqoop runs 4 map tasks and therefore has to split the input on a column (--split-by VC_FUNDCODE here). VC_FUNDCODE is a text column, and Sqoop refuses to generate splits on a text column unless the -Dorg.apache.sqoop.splitter.allow_text_splitter=true property is passed. Since this table has no numeric primary key to split on, the simplest fix is to use -m 1.
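If parallelism is still wanted despite the text split column, Sqoop does accept the property named in the error message. A sketch of that alternative, reusing the connection details from the command above (untested here; cluster required; note the generic -D property must come right after "import", before the tool-specific options, and -P is used instead of --password per the warning in the log):

```shell
# Alternative to -m 1: explicitly allow splitting on a text column.
# Untested sketch; beware that text-based splits can be badly skewed.
bin/sqoop import \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:oracle:thin:@192.168.1.31:1521:users \
  --table FUND_INFO --username tpa_query -P \
  --split-by VC_FUNDCODE \
  --hive-import --target-dir temp_table --hive-table fund_info \
  --null-string '\N' --null-non-string '\N' \
  --fields-terminated-by '01' --hive-drop-import-delims \
  -m 4 --hive-overwrite --delete-target-dir
```

With only 1123 rows in this table, though, -m 1 is the simpler and safer choice.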

2. Adjust the parameter and run again

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:oracle:thin:@192.168.1.31:1521:users --table FUND_INFO --username tpa_query --password tpa_query --hive-import --target-dir temp_table --hive-table fund_info --null-string '\N' --null-non-string '\N' --fields-terminated-by '01' --hive-drop-import-delims -m 1 --hive-overwrite --delete-target-dir
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/23 22:21:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/23 22:21:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/23 22:21:42 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/05/23 22:21:42 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/23 22:21:42 INFO tool.CodeGenTool: Beginning code generation
18/05/23 22:21:55 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:21:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM FUND_INFO t WHERE 1=0
18/05/23 22:21:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/5c5a70aeddefb84000a6c46dab882820/FUND_INFO.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/23 22:21:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/5c5a70aeddefb84000a6c46dab882820/FUND_INFO.jar
18/05/23 22:21:58 INFO tool.ImportTool: Destination directory temp_table is not present, hence not deleting.
18/05/23 22:21:58 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:21:59 WARN manager.OracleManager: The table FUND_INFO contains a multi-column primary key. Sqoop will default to the column VC_SOURCE only for this job.
18/05/23 22:21:59 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:22:00 WARN manager.OracleManager: The table FUND_INFO contains a multi-column primary key. Sqoop will default to the column VC_SOURCE only for this job.
18/05/23 22:22:00 INFO mapreduce.ImportJobBase: Beginning import of FUND_INFO
18/05/23 22:22:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/23 22:22:00 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:22:00 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/23 22:22:00 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/23 22:22:03 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/23 22:22:03 INFO mapreduce.JobSubmitter: number of splits:1
18/05/23 22:22:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526097883376_0005
18/05/23 22:22:03 INFO impl.YarnClientImpl: Submitted application application_1526097883376_0005
18/05/23 22:22:04 INFO mapreduce.Job: The url to track the job: http://ndoe1:8088/proxy/application_1526097883376_0005/
18/05/23 22:22:04 INFO mapreduce.Job: Running job: job_1526097883376_0005
18/05/23 22:22:12 INFO mapreduce.Job: Job job_1526097883376_0005 running in uber mode : false
18/05/23 22:22:12 INFO mapreduce.Job:  map 0% reduce 0%
18/05/23 22:22:31 INFO mapreduce.Job:  map 100% reduce 0%
18/05/23 22:22:31 INFO mapreduce.Job: Job job_1526097883376_0005 completed successfully
18/05/23 22:22:31 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=142841
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=483891
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=16641
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=16641
        Total vcore-milliseconds taken by all map tasks=16641
        Total megabyte-milliseconds taken by all map tasks=17040384
    Map-Reduce Framework
        Map input records=1123
        Map output records=1123
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=68
        CPU time spent (ms)=2710
        Physical memory (bytes) snapshot=232636416
        Virtual memory (bytes) snapshot=2141016064
        Total committed heap usage (bytes)=109051904
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=483891
18/05/23 22:22:31 INFO mapreduce.ImportJobBase: Transferred 472.5498 KB in 30.9626 seconds (15.2619 KB/sec)
18/05/23 22:22:31 INFO mapreduce.ImportJobBase: Retrieved 1123 records.
18/05/23 22:22:31 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table FUND_INFO
18/05/23 22:22:31 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:22:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM FUND_INFO t WHERE 1=0
18/05/23 22:22:31 WARN hive.TableDefWriter: Column CREATE_DATE had to be cast to a less precise type in Hive
18/05/23 22:22:31 WARN hive.TableDefWriter: Column EDATE had to be cast to a less precise type in Hive
18/05/23 22:22:31 WARN hive.TableDefWriter: Column BEGIN_DATE had to be cast to a less precise type in Hive
18/05/23 22:22:31 WARN hive.TableDefWriter: Column END_DATE had to be cast to a less precise type in Hive
18/05/23 22:22:31 WARN hive.TableDefWriter: Column SET_AMOUNT had to be cast to a less precise type in Hive
18/05/23 22:22:31 WARN hive.TableDefWriter: Column GLF_RATIO had to be cast to a less precise type in Hive
18/05/23 22:22:31 WARN hive.TableDefWriter: Column CLEAR had to be cast to a less precise type in Hive
18/05/23 22:22:31 INFO hive.HiveImport: Loading uploaded data into Hive
18/05/23 22:22:31 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/05/23 22:22:31 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
    at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
    at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
    at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
    at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
    ... 12 more

[root@node1 sqoop-1.4.7]#

This problem was frustrating. Searching online mostly turned up advice about setting environment variables, but none of it solved the issue.

3. Missing Hive JAR in Sqoop's lib directory

[root@node1 ~]# cd /opt/sqoop-1.4.7/lib/
[root@node1 lib]# ln -s /opt/hive-1.2.2/lib/hive-exec-1.2.2.jar hive-exec-1.2.2.jar
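Symlinking hive-exec works because that jar bundles org.apache.hadoop.hive.conf.HiveConf in Hive 1.2. A broader fix (an assumption based on how Sqoop assembles its classpath, not something verified in this run) is to point Sqoop at the whole Hive installation via its environment file:

```shell
# Assumed alternative: expose all Hive jars and the Hive config to Sqoop
# instead of symlinking a single jar. Paths match this article's layout.
cat >> /opt/sqoop-1.4.7/conf/sqoop-env.sh <<'EOF'
export HIVE_HOME=/opt/hive-1.2.2
export HIVE_CONF_DIR=/opt/hive-1.2.2/conf
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/hive-1.2.2/lib/*
EOF
```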

Run the import again; this time it succeeds:

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:oracle:thin:@192.168.1.31:1521:users --table FUND_INFO --username tpa_query --password tpa_query --hive-import --target-dir temp_table --hive-table fund_info --null-string '\N' --null-non-string '\N' --fields-terminated-by '01' --hive-drop-import-delims -m 1 --hive-overwrite --delete-target-dir
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/23 22:43:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/23 22:43:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/23 22:43:25 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/05/23 22:43:25 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/23 22:43:25 INFO tool.CodeGenTool: Beginning code generation
18/05/23 22:43:39 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:43:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM FUND_INFO t WHERE 1=0
18/05/23 22:43:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/794664c2fe6c8abd0aa7d8b24b3f0b62/FUND_INFO.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/23 22:43:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/794664c2fe6c8abd0aa7d8b24b3f0b62/FUND_INFO.jar
18/05/23 22:43:43 INFO tool.ImportTool: Destination directory temp_table deleted.
18/05/23 22:43:43 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:43:43 WARN manager.OracleManager: The table FUND_INFO contains a multi-column primary key. Sqoop will default to the column VC_SOURCE only for this job.
18/05/23 22:43:43 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:43:43 WARN manager.OracleManager: The table FUND_INFO contains a multi-column primary key. Sqoop will default to the column VC_SOURCE only for this job.
18/05/23 22:43:43 INFO mapreduce.ImportJobBase: Beginning import of FUND_INFO
18/05/23 22:43:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/23 22:43:43 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:43:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/23 22:43:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/23 22:43:45 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/23 22:43:45 INFO mapreduce.JobSubmitter: number of splits:1
18/05/23 22:43:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526097883376_0009
18/05/23 22:43:46 INFO impl.YarnClientImpl: Submitted application application_1526097883376_0009
18/05/23 22:43:46 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1526097883376_0009/
18/05/23 22:43:46 INFO mapreduce.Job: Running job: job_1526097883376_0009
18/05/23 22:43:53 INFO mapreduce.Job: Job job_1526097883376_0009 running in uber mode : false
18/05/23 22:43:53 INFO mapreduce.Job:  map 0% reduce 0%
18/05/23 22:44:03 INFO mapreduce.Job:  map 100% reduce 0%
18/05/23 22:44:04 INFO mapreduce.Job: Job job_1526097883376_0009 completed successfully
18/05/23 22:44:04 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=143252
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=483891
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8250
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8250
        Total vcore-milliseconds taken by all map tasks=8250
        Total megabyte-milliseconds taken by all map tasks=8448000
    Map-Reduce Framework
        Map input records=1123
        Map output records=1123
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=131
        CPU time spent (ms)=2980
        Physical memory (bytes) snapshot=308367360
        Virtual memory (bytes) snapshot=2146000896
        Total committed heap usage (bytes)=154140672
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=483891
18/05/23 22:44:04 INFO mapreduce.ImportJobBase: Transferred 472.5498 KB in 21.5561 seconds (21.9219 KB/sec)
18/05/23 22:44:04 INFO mapreduce.ImportJobBase: Retrieved 1123 records.
18/05/23 22:44:04 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table FUND_INFO
18/05/23 22:44:04 INFO manager.OracleManager: Time zone has been set to GMT
18/05/23 22:44:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM FUND_INFO t WHERE 1=0
18/05/23 22:44:04 WARN hive.TableDefWriter: Column CREATE_DATE had to be cast to a less precise type in Hive
18/05/23 22:44:04 WARN hive.TableDefWriter: Column EDATE had to be cast to a less precise type in Hive
18/05/23 22:44:04 WARN hive.TableDefWriter: Column BEGIN_DATE had to be cast to a less precise type in Hive
18/05/23 22:44:04 WARN hive.TableDefWriter: Column END_DATE had to be cast to a less precise type in Hive
18/05/23 22:44:04 WARN hive.TableDefWriter: Column SET_AMOUNT had to be cast to a less precise type in Hive
18/05/23 22:44:04 WARN hive.TableDefWriter: Column GLF_RATIO had to be cast to a less precise type in Hive
18/05/23 22:44:04 WARN hive.TableDefWriter: Column CLEAR had to be cast to a less precise type in Hive
18/05/23 22:44:04 INFO hive.HiveImport: Loading uploaded data into Hive
18/05/23 22:44:07 INFO hive.HiveImport: 
18/05/23 22:44:07 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/opt/hive-1.2.2/lib/hive-exec-1.2.2.jar!/hive-log4j.properties
18/05/23 22:44:11 INFO hive.HiveImport: OK
18/05/23 22:44:11 INFO hive.HiveImport: Time taken: 3.071 seconds
18/05/23 22:44:12 INFO hive.HiveImport: Loading data to table default.fund_info
18/05/23 22:44:12 INFO hive.HiveImport: Table default.fund_info stats: [numFiles=1, numRows=0, totalSize=483891, rawDataSize=0]
18/05/23 22:44:12 INFO hive.HiveImport: OK
18/05/23 22:44:12 INFO hive.HiveImport: Time taken: 0.913 seconds
18/05/23 22:44:13 INFO hive.HiveImport: Hive import complete.
18/05/23 22:44:13 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
[root@node1 sqoop-1.4.7]#

4. Check the result in Hive

[root@node1 opt]# hive

Logging initialized using configuration in file:/opt/hive-1.2.2/conf/hive-log4j.properties
hive> show tables;
OK
fund_info
t1
Time taken: 1.411 seconds, Fetched: 2 row(s)
hive> select count(*) from fund_info;
Query ID = root_20180523224519_4aa04e7c-abea-465e-a0a5-9a1fe0fe4d3d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1526097883376_0010, Tracking URL = http://bigdata03-test:8088/proxy/application_1526097883376_0010/
Kill Command = /opt/hadoop-2.7.5/bin/hadoop job  -kill job_1526097883376_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-05-23 22:45:29,355 Stage-1 map = 0%,  reduce = 0%
2018-05-23 22:45:34,654 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.02 sec
2018-05-23 22:45:41,967 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.89 sec
MapReduce Total cumulative CPU time: 4 seconds 890 msec
Ended Job = job_1526097883376_0010
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.89 sec   HDFS Read: 499445 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 890 msec
OK
1123
Time taken: 23.424 seconds, Fetched: 1 row(s)
hive>
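The count of 1123 matches the "Retrieved 1123 records" line in the Sqoop log. The same check can be scripted non-interactively (assuming hive is on the PATH; untested here):

```shell
# One-shot count check, no interactive shell needed.
hive -e 'SELECT COUNT(*) FROM default.fund_info;'
```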

5. Test importing a large table

[root@bigdata01-test sqoop-1.4.7]# bin/sqoop import --connect jdbc:oracle:thin:@10.17.12.31:1521:TPADC --table DETAIL --username tpa_query --password tpa_query --hive-import --target-dir temp_table --hive-table fact_invest_detail --null-string '\N' --null-non-string '\N' --fields-terminated-by '01' --hive-drop-import-delims -m 1 --hive-overwrite --delete-target-dir
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/24 11:12:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/24 11:12:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/24 11:12:12 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/05/24 11:12:12 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/24 11:12:12 INFO tool.CodeGenTool: Beginning code generation
18/05/24 11:12:45 INFO manager.OracleManager: Time zone has been set to GMT
18/05/24 11:12:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM DETAIL t WHERE 1=0
18/05/24 11:12:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/b7308f9eec89b55d6b8e63ccd9ffbcca/TPA_DW_FACT_INVEST_DETAIL.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/24 11:12:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b7308f9eec89b55d6b8e63ccd9ffbcca/DETAIL.jar
18/05/24 11:12:50 INFO tool.ImportTool: Destination directory temp_table is not present, hence not deleting.
18/05/24 11:12:50 INFO manager.OracleManager: Time zone has been set to GMT
18/05/24 11:12:51 INFO manager.OracleManager: Time zone has been set to GMT
18/05/24 11:12:52 INFO mapreduce.ImportJobBase: Beginning import of DETAIL
18/05/24 11:12:52 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/24 11:12:52 INFO manager.OracleManager: Time zone has been set to GMT
18/05/24 11:12:52 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/24 11:12:52 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/24 11:12:54 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/24 11:12:54 INFO mapreduce.JobSubmitter: number of splits:1
18/05/24 11:12:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526097883376_0011
18/05/24 11:12:55 INFO impl.YarnClientImpl: Submitted application application_1526097883376_0011
18/05/24 11:12:55 INFO mapreduce.Job: The url to track the job: http://bigdata03-test:8088/proxy/application_1526097883376_0011/
18/05/24 11:12:55 INFO mapreduce.Job: Running job: job_1526097883376_0011
18/05/24 11:13:02 INFO mapreduce.Job: Job job_1526097883376_0011 running in uber mode : false
18/05/24 11:13:02 INFO mapreduce.Job:  map 0% reduce 0%
18/05/24 11:18:34 INFO mapreduce.Job:  map 100% reduce 0%
18/05/24 11:18:35 INFO mapreduce.Job: Job job_1526097883376_0011 completed successfully
18/05/24 11:18:35 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144210
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=2981482292
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=328641
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=328641
        Total vcore-milliseconds taken by all map tasks=328641
        Total megabyte-milliseconds taken by all map tasks=336528384
    Map-Reduce Framework
        Map input records=6351441
        Map output records=6351441
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=3074
        CPU time spent (ms)=320810
        Physical memory (bytes) snapshot=246095872
        Virtual memory (bytes) snapshot=2141949952
        Total committed heap usage (bytes)=114294784
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=2981482292
18/05/24 11:18:35 INFO mapreduce.ImportJobBase: Transferred 2.7767 GB in 342.9783 seconds (8.2902 MB/sec)
18/05/24 11:18:35 INFO mapreduce.ImportJobBase: Retrieved 6351441 records.
18/05/24 11:18:35 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table DETAIL
18/05/24 11:18:35 INFO manager.OracleManager: Time zone has been set to GMT
18/05/24 11:18:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM DETAIL t WHERE 1=0
18/05/24 11:18:35 WARN hive.TableDefWriter: Column BMONTH_VALPRICE had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column BQUART_VALPRICE had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column BYEAR_VALPRICE had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column CAPITUP_BY had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column CAPITUP_DIVIDENTS had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column CAPITUP_SUM had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column CAPITUP_DIVIDENTS had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column AMOUNT_GZ had to be cast to a less precise type in Hive
18/05/24 11:18:35 WARN hive.TableDefWriter: Column FAIRCHANGE had to be cast to a less precise type in Hive
18/05/24 11:18:35 INFO hive.HiveImport: Loading uploaded data into Hive
18/05/24 11:18:45 INFO hive.HiveImport: 
18/05/24 11:18:45 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/opt/hive-1.2.2/lib/hive-exec-1.2.2.jar!/hive-log4j.properties
18/05/24 11:18:49 INFO hive.HiveImport: OK
18/05/24 11:18:49 INFO hive.HiveImport: Time taken: 1.94 seconds
18/05/24 11:18:49 INFO hive.HiveImport: Loading data to table default.fact_invest_detail
18/05/24 11:18:50 INFO hive.HiveImport: Table default.fact_invest_detail stats: [numFiles=1, numRows=0, totalSize=2981482292, rawDataSize=0]
18/05/24 11:18:50 INFO hive.HiveImport: OK
18/05/24 11:18:50 INFO hive.HiveImport: Time taken: 0.903 seconds
18/05/24 11:18:50 INFO hive.HiveImport: Hive import complete.
18/05/24 11:18:50 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
[root@bigdata01-test sqoop-1.4.7]# 

It took six to seven minutes: 2.78 GB (6,351,441 rows) at about 8.3 MB/sec with a single mapper.
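Since the six-to-seven-minute import ran with a single mapper, parallelizing the pull from Oracle is the obvious lever. A hypothetical parallel variant (untested; the split column is a placeholder, not a real column from this run, and it should be numeric and evenly distributed to avoid skewed splits):

```shell
# Hypothetical speed-up sketch: <NUMERIC_COLUMN> is a placeholder --
# substitute a real, evenly distributed numeric column of DETAIL.
bin/sqoop import \
  --connect jdbc:oracle:thin:@10.17.12.31:1521:TPADC \
  --table DETAIL --username tpa_query -P \
  --split-by <NUMERIC_COLUMN> -m 4 \
  --hive-import --target-dir temp_table --hive-table fact_invest_detail \
  --null-string '\N' --null-non-string '\N' \
  --fields-terminated-by '01' --hive-drop-import-delims \
  --hive-overwrite --delete-target-dir
```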

hive> show tables;
OK
detail
fund_info
t1
Time taken: 1.406 seconds, Fetched: 3 row(s)
hive> select count(*) from detail;
Query ID = root_20180524112136_44d65b6a-14d4-4f36-aa69-46081ecdc64a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1526097883376_0012, Tracking URL = http://ndoe1:8088/proxy/application_1526097883376_0012/
Kill Command = /opt/hadoop-2.7.5/bin/hadoop job  -kill job_1526097883376_0012
Hadoop job information for Stage-1: number of mappers: 12; number of reducers: 1
2018-05-24 11:21:50,959 Stage-1 map = 0%,  reduce = 0%
2018-05-24 11:21:58,895 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 4.24 sec
2018-05-24 11:21:59,959 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 16.26 sec
2018-05-24 11:22:02,132 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 19.99 sec
2018-05-24 11:22:03,178 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 28.85 sec
2018-05-24 11:22:04,218 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 44.23 sec
2018-05-24 11:22:05,257 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 51.24 sec
2018-05-24 11:22:06,298 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 52.63 sec
2018-05-24 11:22:07,336 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 53.68 sec
2018-05-24 11:22:09,415 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 56.66 sec
MapReduce Total cumulative CPU time: 56 seconds 660 msec
Ended Job = job_1526097883376_0012
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 12  Reduce: 1   Cumulative CPU: 56.66 sec   HDFS Read: 2983164717 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 56 seconds 660 msec
OK
6351441
Time taken: 34.987 seconds, Fetched: 1 row(s)
hive>
