Sqoop: Introduction and Usage

2022-06-29 21:21:14

Sqoop is the bridge between relational databases and the Hadoop storage stack: it supports moving data back and forth between a variety of relational data sources and Hive, HDFS, and HBase. In a typical setup, the relational tables live in a backup copy of the production environment and need to be imported every day. Depending on the daily data volume, Sqoop can import an entire table; when the amount of data produced each day is small, a full-table import is good enough, but Sqoop also provides an incremental import mechanism.
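For example, a nightly full-table import comes down to a single command. A minimal sketch, assuming a MySQL source; the URL, credentials, table name, and HDFS path below are placeholders:

    # Pull the whole "orders" table into HDFS as text files
    sqoop import \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --target-dir /user/hadoop/orders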

The following sections introduce the commonly used Sqoop commands, along with some of their parameters:

1. import (ImportTool): Import data from a relational database (from a table or a query) into HDFS
2. export (ExportTool): Export data from HDFS into a relational database
3. codegen (CodeGenTool): Generate Java code for a database table and package it as a jar
4. create-hive-table (CreateHiveTableTool): Create a Hive table
5. eval (EvalSqlTool): Run a SQL statement and view its results
6. import-all-tables (ImportAllTablesTool): Import all tables of a database into HDFS
7. job (JobTool): Define, run, and manage saved Sqoop jobs
8. list-databases (ListDatabasesTool): List all database names
9. list-tables (ListTablesTool): List all tables in a database
10. merge (MergeTool): Merge the results of an incremental import with an existing dataset
11. metastore (MetastoreTool): Run a shared metadata repository that stores saved jobs
12. help (HelpTool): Display help information
13. version (VersionTool): Display version information
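Each of these is invoked as sqoop <command>. As a quick, hypothetical sketch, eval can be used to sanity-check a statement against the source database before running a real import; the connection details and query are placeholders:

    # Run an ad-hoc query against the source database and print the result
    sqoop eval \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --query "SELECT COUNT(*) FROM orders"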

Next come Sqoop's generic parameters, followed by the parameters specific to each of the 13 commands above. The generic parameters are divided into the following groups:

Common arguments
Incremental import arguments
Output line formatting arguments
Input parsing arguments
Hive arguments
HBase arguments
Generic Hadoop command-line arguments

1. Common arguments: generic parameters, mainly related to the relational database connection.

1. --connect: JDBC URL of the relational database to connect to (e.g. jdbc:mysql://localhost/sqoop_datas)
2. --connection-manager: connection manager class to use; usually not needed
3. --driver: JDBC driver class for the connection
4. --hadoop-home: Hadoop home directory (e.g. /home/hadoop)
5. --help: print usage information
6. --password: password for connecting to the relational database
7. --username: username for connecting to the relational database
8. --verbose: print more information while working (effectively lowers the log level); this flag takes no value
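A short sketch of how the common arguments combine on the command line; the host, database, and credentials are placeholders:

    # Connect to the database and list its tables
    sqoop list-tables \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass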

Import control arguments:

--append: Append data to an existing dataset in HDFS
--as-avrodatafile: Imports data to Avro Data Files
--as-sequencefile: Imports data to SequenceFiles
--as-textfile: Imports data as plain text (default)
--boundary-query <statement>: Boundary query to use for creating splits
--columns <col,col,col…>: Columns to import from table
--direct: Use direct import fast path
--direct-split-size <n>: Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>: Set the maximum size for an inline LOB
-m, --num-mappers <n>: Use n map tasks to import in parallel
-e, --query <statement>: Import the results of statement
--split-by <column-name>: Column of the table used to split work units
--table <table-name>: Table to read
--target-dir <dir>: HDFS destination dir
--warehouse-dir <dir>: HDFS parent for table destination
--where <where clause>: WHERE clause to use during import
-z, --compress: Enable compression
--compression-codec <c>: Use Hadoop codec (default gzip)
--null-string <null-string>: The string to be written for a null value for string columns
--null-non-string <null-string>: The string to be written for a null value for non-string columns
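A hypothetical import that exercises several of these controls: a column list, a WHERE filter, parallelism split on the primary key, and compression. The table and column names are made up:

    # Import two columns of "orders" added since 2022-01-01,
    # using 4 parallel map tasks split on id, gzip-compressed
    sqoop import \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --columns "id,amount" \
        --where "created_at >= '2022-01-01'" \
        --split-by id \
        -m 4 \
        -z \
        --target-dir /user/hadoop/orders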

Incremental import arguments:

--check-column (col): Specifies the column to be examined when determining which rows to import.
--incremental (mode): Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value): Specifies the maximum value of the check column from the previous import.
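For the daily incremental load mentioned at the beginning, a sketch in append mode; the check column and last value are placeholders, and Sqoop reports the new last value to use on the next run:

    # Import only rows whose id is greater than the value from the previous run
    sqoop import \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --incremental append \
        --check-column id \
        --last-value 10000 \
        --target-dir /user/hadoop/orders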

Output line formatting arguments:

--enclosed-by <char>: Sets a required field enclosing character
--escaped-by <char>: Sets the escape character
--fields-terminated-by <char>: Sets the field separator character
--lines-terminated-by <char>: Sets the end-of-line character
--mysql-delimiters: Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>: Sets a field enclosing character
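A sketch of an import with explicit delimiters; the tab and newline choices here are arbitrary examples:

    # Write tab-separated fields, one record per line
    sqoop import \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --fields-terminated-by '\t' \
        --lines-terminated-by '\n' \
        --target-dir /user/hadoop/orders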

Hive arguments:

--hive-home <dir>: Override $HIVE_HOME
--hive-import: Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite: Overwrite existing data in the Hive table.
--create-hive-table: If set, the job will fail if the target Hive table already exists. By default this property is false.
--hive-table <table-name>: Sets the table name to use when importing to Hive.
--hive-drop-import-delims: Drops \n, \r, and \01 from string fields when importing to Hive.
--hive-delims-replacement: Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive.
--hive-partition-key: Name of the Hive field that the partitions are sharded on
--hive-partition-value <v>: String value that serves as the partition key for data imported into Hive in this job.
--map-column-hive <map>: Override default mapping from SQL type to Hive type for configured columns.
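A sketch of importing straight into Hive with these flags; the source table and Hive table names are placeholders:

    # Load "orders" into the Hive table hive_orders, replacing existing data
    # and stripping delimiter characters from string fields
    sqoop import \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --hive-import \
        --hive-table hive_orders \
        --hive-overwrite \
        --hive-drop-import-delims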

HBase arguments:

--column-family <family>: Sets the target column family for the import
--hbase-create-table: If specified, create missing HBase tables
--hbase-row-key <col>: Specifies which input column to use as the row key
--hbase-table <table-name>: Specifies an HBase table to use as the target instead of HDFS
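A sketch of targeting HBase instead of HDFS; the HBase table, column family, and row key column are placeholders:

    # Write each row of "orders" into the HBase table hbase_orders,
    # keyed on id, under the column family cf
    sqoop import \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --hbase-table hbase_orders \
        --column-family cf \
        --hbase-row-key id \
        --hbase-create-table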

Code generation arguments:

--bindir <dir>: Output directory for compiled objects
--class-name <name>: Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>: Disable code generation; use specified jar
--outdir <dir>: Output directory for generated code
--package-name <name>: Put auto-generated classes in this package
--map-column-java <m>: Override default mapping from SQL type to Java type for configured columns.
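A sketch of generating the table's Java class and jar without running an import; the paths and class name are placeholders:

    # Generate Orders.java and the compiled jar for the "orders" table
    sqoop codegen \
        --connect jdbc:mysql://localhost/sqoop_datas \
        --username sqoop_user \
        --password sqoop_pass \
        --table orders \
        --class-name Orders \
        --outdir /tmp/sqoop/src \
        --bindir /tmp/sqoop/classes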
