Sqoop is the bridge between relational databases and the Hadoop storage stack, supporting transfers in both directions between a variety of relational data sources and Hive, HDFS, and HBase. Typically the relational tables live in a backup copy of the production environment and need to be imported every day. Depending on the daily data volume, Sqoop can import the whole table; a full-table import is fine when the amount of new data per day is small, but Sqoop also provides an incremental import mechanism.
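As a minimal sketch (the JDBC URL, credentials, table name, and HDFS paths below are placeholders, not taken from this post), a full-table import and an incremental append import look roughly like this:

```bash
# Full-table import: copy every row of a hypothetical `orders` table into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 4

# Incremental import: only bring over rows whose `id` exceeds the last imported value.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --check-column id \
  --incremental append \
  --last-value 100000 \
  --target-dir /user/hadoop/orders
```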
Below are some commonly used Sqoop commands and their parameters:
No. | Command | Class | Description |
---|---|---|---|
1 | import | ImportTool | Import data from a relational database (a table or the results of a query) into HDFS |
2 | export | ExportTool | Export data from HDFS into a relational database table |
3 | codegen | CodeGenTool | Generate the Java class for a database table and package it into a jar |
4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
5 | eval | EvalSqlTool | Run a SQL statement and view its results |
6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
7 | job | JobTool | Work with saved jobs |
8 | list-databases | ListDatabasesTool | List all database names |
9 | list-tables | ListTablesTool | List all tables in a database |
10 | merge | MergeTool | Merge the results of incremental imports |
11 | metastore | MetastoreTool | Run a standalone Sqoop metastore |
12 | help | HelpTool | Show help information |
13 | version | VersionTool | Show version information |
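For example, list-databases and eval are handy for inspecting the source database before running an import; the connection details below are placeholders:

```bash
# List the databases visible to this user on the MySQL server.
sqoop list-databases \
  --connect jdbc:mysql://dbhost:3306 \
  --username sqoop_user \
  --password secret

# Run an ad-hoc SQL statement and print its result to the console.
sqoop eval \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --query "SELECT COUNT(*) FROM orders"
```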
Next come Sqoop's generic arguments, followed by the arguments specific to each of the 13 commands above. The generic arguments fall into the following groups:

* Common arguments
* Incremental import arguments
* Output line formatting arguments
* Input parsing arguments
* Hive arguments
* HBase arguments
* Generic Hadoop command-line arguments

1. Common arguments: mainly parameters for connecting to the relational database
No. | Parameter | Description | Example |
---|---|---|---|
1 | --connect | JDBC URL for connecting to the relational database | jdbc:mysql://localhost/sqoop_datas |
2 | --connection-manager | Connection manager class to use; rarely needed | |
3 | --driver | JDBC driver class to use | |
4 | --hadoop-home | Hadoop home directory | /home/hadoop |
5 | --help | Print usage help | |
6 | --password | Password for the database connection | |
7 | --username | Username for the database connection | |
8 | --verbose | Print more information while working (effectively lowers the log level) | Takes no value |
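A sketch that combines these common arguments (the driver class, URL, and credentials are illustrative):

```bash
# Use an explicit JDBC driver class and verbose logging while listing tables.
sqoop list-tables \
  --connect jdbc:mysql://dbhost:3306/shop \
  --driver com.mysql.jdbc.Driver \
  --username sqoop_user \
  --password secret \
  --verbose
```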
Import control arguments:
Argument | Description |
---|---|
--append | Append data to an existing dataset in HDFS |
--as-avrodatafile | Imports data to Avro Data Files |
--as-sequencefile | Imports data to SequenceFiles |
--as-textfile | Imports data as plain text (default) |
--boundary-query <statement> | Boundary query to use for creating splits |
--columns <col,col,col…> | Columns to import from table |
--direct | Use direct import fast path |
--direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
--inline-lob-limit <n> | Set the maximum size for an inline LOB |
-m,--num-mappers <n> | Use n map tasks to import in parallel |
-e,--query <statement> | Import the results of statement. |
--split-by <column-name> | Column of the table used to split work units |
--table <table-name> | Table to read |
--target-dir <dir> | HDFS destination dir |
--warehouse-dir <dir> | HDFS parent for table destination |
--where <where clause> | WHERE clause to use during import |
-z,--compress | Enable compression |
--compression-codec <c> | Use Hadoop codec (default gzip) |
--null-string <null-string> | The string to be written for a null value for string columns |
--null-non-string <null-string> | The string to be written for a null value for non-string columns |
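As an illustration of the import control arguments (table, columns, and paths are placeholders): a free-form query import must include the literal `$CONDITIONS` token so Sqoop can inject its split predicates, and it needs `--split-by` when more than one mapper is used:

```bash
# Import the result of a query in parallel, splitting on `id`, with compression enabled.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --query 'SELECT id, amount, created_at FROM orders WHERE id < 500000 AND $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/orders_subset \
  -m 4 \
  -z
```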
Incremental import arguments:
Argument | Description |
---|---|
--check-column (col) | Specifies the column to be examined when determining which rows to import. |
--incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. |
--last-value (value) | Specifies the maximum value of the check column from the previous import. |
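Beyond the append-mode example earlier, lastmodified mode re-imports rows whose timestamp column has changed since the previous run. A hedged sketch (the column name and timestamp are placeholders):

```bash
# Re-import rows updated since the last run; --append is required here if the
# target directory already exists from a previous import.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --check-column update_time \
  --incremental lastmodified \
  --last-value "2014-06-01 00:00:00" \
  --target-dir /user/hadoop/orders \
  --append
```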
Output line formatting arguments:
Argument | Description |
---|---|
--enclosed-by <char> | Sets a required field enclosing character |
--escaped-by <char> | Sets the escape character |
--fields-terminated-by <char> | Sets the field separator character |
--lines-terminated-by <char> | Sets the end-of-line character |
--mysql-delimiters | Uses MySQL's default delimiter set: fields: `,` lines: `\n` escaped-by: `\` optionally-enclosed-by: `'` |
--optionally-enclosed-by <char> | Sets a field enclosing character |
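A sketch of these formatting arguments producing tab-separated text files (the delimiters chosen here are only an example):

```bash
# Write tab-separated, newline-terminated records, escaping embedded delimiters.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --escaped-by '\\' \
  --optionally-enclosed-by '"' \
  --target-dir /user/hadoop/orders_tsv
```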
Hive arguments:
Argument | Description |
---|---|
--hive-home <dir> | Override $HIVE_HOME |
--hive-import | Import tables into Hive (Uses Hive’s default delimiters if none are set.) |
--hive-overwrite | Overwrite existing data in the Hive table. |
--create-hive-table | If set, the job will fail if the target Hive table already exists. By default this property is false. |
--hive-table <table-name> | Sets the table name to use when importing to Hive. |
--hive-drop-import-delims | Drops `\n`, `\r`, and `\01` from string fields when importing to Hive. |
--hive-delims-replacement | Replaces `\n`, `\r`, and `\01` in string fields with a user-defined string when importing to Hive. |
--hive-partition-key | Name of the Hive field on which the imported data is partitioned |
--hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job. |
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns. |
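A minimal Hive import sketch (the Hive table name is hypothetical); `--hive-drop-import-delims` protects Hive's default text format from embedded newlines:

```bash
# Import straight into Hive, replacing any existing data in the target table.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --hive-import \
  --hive-table orders_hive \
  --hive-overwrite \
  --hive-drop-import-delims \
  -m 4
```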
HBase arguments:
Argument | Description |
---|---|
--column-family <family> | Sets the target column family for the import |
--hbase-create-table | If specified, create missing HBase tables |
--hbase-row-key <col> | Specifies which input column to use as the row key |
--hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
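A sketch loading the same hypothetical table into HBase instead of HDFS:

```bash
# Each source row becomes an HBase row keyed on `id`; the table is created if missing.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```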
Code generation arguments:
Argument | Description |
---|---|
--bindir <dir> | Output directory for compiled objects |
--class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
--jar-file <file> | Disable code generation; use specified jar |
--outdir <dir> | Output directory for generated code |
--package-name <name> | Put auto-generated classes in this package |
--map-column-java <m> | Override default mapping from SQL type to Java type for configured columns. |
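Finally, a codegen sketch that regenerates the ORM class for a table without running an import (output paths and package name are illustrative):

```bash
# Generate the Java class and jar for `orders` into the given directories.
sqoop codegen \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username sqoop_user \
  --password secret \
  --table orders \
  --bindir /tmp/sqoop-gen/classes \
  --outdir /tmp/sqoop-gen/src \
  --package-name com.example.sqoop
```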