Sqoop is the bridge between relational databases and the Hadoop storage stack, supporting transfers in both directions between a variety of relational data sources and Hive, HDFS, and HBase. Typically the relational tables live in a backup copy of the production environment and need to be imported every day. Depending on the daily data volume, Sqoop can import the whole table; a full-table import is fine when the amount of new data per day is small, but Sqoop also provides an incremental import mechanism.
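As a minimal sketch (the JDBC URL, credentials, table name, and HDFS paths below are placeholders, not taken from this post), a full-table import and an incremental append import look roughly like this:

```bash
# Full-table import: copy every row of a hypothetical `orders` table into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 4

# Incremental import: only bring over rows whose `id` exceeds the last imported value.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --check-column id \
  --incremental append \
  --last-value 100000 \
  --target-dir /user/hadoop/orders
```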
Below are some commonly used Sqoop commands and their parameters:
No. | Command | Class | Description |
---|---|---|---|
1 | import | ImportTool | Import data from a relational database (a table or the results of a query) into HDFS |
2 | export | ExportTool | Export data from HDFS into a relational database table |
3 | codegen | CodeGenTool | Generate the Java class for a database table and package it into a jar |
4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
5 | eval | EvalSqlTool | Run a SQL statement and view its results |
6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
7 | job | JobTool | Work with saved jobs |
8 | list-databases | ListDatabasesTool | List all database names |
9 | list-tables | ListTablesTool | List all tables in a database |
10 | merge | MergeTool | Merge the results of incremental imports |
11 | metastore | MetastoreTool | Run a standalone Sqoop metastore |
12 | help | HelpTool | Show help information |
13 | version | VersionTool | Show version information |
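For example, list-databases and eval are handy for inspecting the source database before running an import; the connection details below are placeholders:

```bash
# List the databases visible to this user on the MySQL server.
sqoop list-databases \
  --connect jdbc:mysql://dbhost:3306 \
  --username sqoop_user \
  --password secret

# Run an ad-hoc SQL statement and print its result to the console.
sqoop eval \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --query "SELECT COUNT(*) FROM orders"
```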
Next come Sqoop's generic arguments, followed by the arguments specific to each of the 13 commands above. The generic arguments fall into the following groups:

* Common arguments
* Incremental import arguments
* Output line formatting arguments
* Input parsing arguments
* Hive arguments
* HBase arguments
* Generic Hadoop command-line arguments

1. Common arguments: mainly parameters for connecting to the relational database
No. | Parameter | Description | Example |
---|---|---|---|
1 | --connect | JDBC URL for connecting to the relational database | jdbc:mysql://localhost/sqoop_datas |
2 | --connection-manager | Connection manager class to use; rarely needed | |
3 | --driver | JDBC driver class to use | |
4 | --hadoop-home | Hadoop home directory | /home/hadoop |
5 | --help | Print usage help | |
6 | --password | Password for the database connection | |
7 | --username | Username for the database connection | |
8 | --verbose | Print more information while working (effectively lowers the log level) | Takes no value |
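A sketch that combines these common arguments (the driver class, URL, and credentials are illustrative):

```bash
# Use an explicit JDBC driver class and verbose logging while listing tables.
sqoop list-tables \
  --connect jdbc:mysql://dbhost:3306/shop \
  --driver com.mysql.jdbc.Driver \
  --username sqoop_user \
  --password secret \
  --verbose
```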
Import control arguments:
Argument | Description |
---|---|
--append | Append data to an existing dataset in HDFS |
--as-avrodatafile | Imports data to Avro Data Files |
--as-sequencefile | Imports data to SequenceFiles |
--as-textfile | Imports data as plain text (default) |
--boundary-query <statement> | Boundary query to use for creating splits |
--columns <col,col,col…> | Columns to import from table |
--direct | Use direct import fast path |
--direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
--inline-lob-limit <n> | Set the maximum size for an inline LOB |
-m,--num-mappers <n> | Use n map tasks to import in parallel |
-e,--query <statement> | Import the results of statement. |
--split-by <column-name> | Column of the table used to split work units |
--table <table-name> | Table to read |
--target-dir <dir> | HDFS destination dir |
--warehouse-dir <dir> | HDFS parent for table destination |
--where <where clause> | WHERE clause to use during import |
-z,--compress | Enable compression |
--compression-codec <c> | Use Hadoop codec (default gzip) |
--null-string <null-string> | The string to be written for a null value for string columns |
--null-non-string <null-string> | The string to be written for a null value for non-string columns |
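As an illustration of the import control arguments (table, columns, and paths are placeholders): a free-form query import must include the literal `$CONDITIONS` token so Sqoop can inject its split predicates, and it needs `--split-by` when more than one mapper is used:

```bash
# Import the result of a query in parallel, splitting on `id`, with compression enabled.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --query 'SELECT id, amount, created_at FROM orders WHERE id < 500000 AND $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/orders_subset \
  -m 4 \
  -z
```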
Incremental import arguments:
Argument | Description |
---|---|
--check-column (col) | Specifies the column to be examined when determining which rows to import. |
--incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. |
--last-value (value) | Specifies the maximum value of the check column from the previous import. |
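Beyond the append-mode example earlier, lastmodified mode re-imports rows whose timestamp column has changed since the previous run. A hedged sketch (the column name and timestamp are placeholders):

```bash
# Re-import rows updated since the last run; --append is required here if the
# target directory already exists from a previous import.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --check-column update_time \
  --incremental lastmodified \
  --last-value "2014-06-01 00:00:00" \
  --target-dir /user/hadoop/orders \
  --append
```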
Output line formatting arguments:
Argument | Description |
---|---|
--enclosed-by <char> | Sets a required field enclosing character |
--escaped-by <char> | Sets the escape character |
--fields-terminated-by <char> | Sets the field separator character |
--lines-terminated-by <char> | Sets the end-of-line character |
--mysql-delimiters | Uses MySQL's default delimiter set: fields: `,` lines: `\n` escaped-by: `\` optionally-enclosed-by: `'` |
--optionally-enclosed-by <char> | Sets a field enclosing character |
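A sketch of these formatting arguments producing tab-separated text files (the delimiters chosen here are only an example):

```bash
# Write tab-separated, newline-terminated records, escaping embedded delimiters.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --escaped-by '\\' \
  --optionally-enclosed-by '"' \
  --target-dir /user/hadoop/orders_tsv
```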
Hive arguments:
Argument | Description |
---|---|
--hive-home <dir> | Override $HIVE_HOME |
--hive-import | Import tables into Hive (Uses Hive’s default delimiters if none are set.) |
--hive-overwrite | Overwrite existing data in the Hive table. |
--create-hive-table | If set, the job will fail if the target Hive table already exists. By default this property is false. |
--hive-table <table-name> | Sets the table name to use when importing to Hive. |
--hive-drop-import-delims | Drops `\n`, `\r`, and `\01` from string fields when importing to Hive. |
--hive-delims-replacement | Replaces `\n`, `\r`, and `\01` in string fields with a user-defined string when importing to Hive. |
--hive-partition-key | Name of the Hive field on which the imported data is partitioned |
--hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job. |
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns. |
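A minimal Hive import sketch (the Hive table name is hypothetical); `--hive-drop-import-delims` protects Hive's default text format from embedded newlines:

```bash
# Import straight into Hive, replacing any existing data in the target table.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --hive-import \
  --hive-table orders_hive \
  --hive-overwrite \
  --hive-drop-import-delims \
  -m 4
```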
HBase arguments:
Argument | Description |
---|---|
--column-family <family> | Sets the target column family for the import |
--hbase-create-table | If specified, create missing HBase tables |
--hbase-row-key <col> | Specifies which input column to use as the row key |
--hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
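A sketch loading the same hypothetical table into HBase instead of HDFS:

```bash
# Each source row becomes an HBase row keyed on `id`; the table is created if missing.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```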
Code generation arguments:
Argument | Description |
---|---|
--bindir <dir> | Output directory for compiled objects |
--class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
--jar-file <file> | Disable code generation; use specified jar |
--outdir <dir> | Output directory for generated code |
--package-name <name> | Put auto-generated classes in this package |
--map-column-java <m> | Override default mapping from SQL type to Java type for configured columns. |
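Finally, a codegen sketch that regenerates the ORM class for a table without running an import (output paths and package name are illustrative):

```bash
# Generate the Java class and jar for `orders` into the given directories.
sqoop codegen \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --bindir /tmp/sqoop-gen/classes \
  --outdir /tmp/sqoop-gen/src \
  --package-name com.example.sqoop
```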