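Next, let's go through a few HBase concepts together, starting with the Row Key. As the figure below shows, the Row Key is, as the name suggests, the primary key of a row. HBase keeps rows indexed by Row Key, so given a row key we can quickly locate that row. We can also locate data by a Row Key range, that is, query several rows at once by specifying a range, and the index again returns the result quickly. Finally, we can find data with a full table scan, but that is comparatively slow.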
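The three access paths described above can be sketched with a small in-memory model. This is only an illustration of why lookups on a sorted key are cheap, not how HBase is implemented; the rows other than rk0001 and the names stored in them are made up for the example.

```python
import bisect

# Toy stand-in for a table: rows kept sorted by row key (plain string order).
# Only rk0001/tom comes from this post; the other rows are hypothetical.
rows = sorted([
    ("rk0001", {"info:name": "tom"}),
    ("rk0002", {"info:name": "jerry"}),
    ("rk0003", {"info:name": "anna"}),
    ("rk0010", {"info:name": "bob"}),
])
keys = [k for k, _ in rows]

def get(row_key):
    """Point lookup: binary search on the sorted keys."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return rows[i][1]
    return None

def range_scan(start_row, stop_row):
    """Rows with start_row <= key < stop_row, located without visiting other rows."""
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    return rows[lo:hi]

def full_scan(predicate):
    """Fallback: visit every row and filter; the cost grows with the table size."""
    return [(k, v) for k, v in rows if predicate(v)]

print(get("rk0002"))
print([k for k, _ in range_scan("rk0002", "rk0010")])
print([k for k, _ in full_scan(lambda v: v["info:name"] == "bob")])
```

The point lookup and the range scan only touch the part of the sorted key list they need, while the full scan has to look at every row, which is why it is the slow option.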







xiaoye@ubuntu3:~/Downloads$ cd ..
xiaoye@ubuntu3:~$ mv hbase-1.0.0-cdh5.5.1/ hbase


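To get HBase running we need to make small changes to two files, hbase-env.sh and hbase-site.xml. Start with hbase-env.sh, as shown below. The export JAVA_HOME line in hbase-env.sh originally points at JDK 1.6 and is commented out; remove the comment marker and change the JDK path to the version we are actually using. Then save and exit.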

xiaoye@ubuntu3:~/hbase/conf$ vim hbase-env.sh

# The java implementation to use. Java 1.7 required.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64


xiaoye@ubuntu3:~/hbase/conf$ vim hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/xiaoye/hbase</value>
  </property>
</configuration>


2.1 Now go into the bin directory and start HBase

xiaoye@ubuntu3:~/hbase/conf$ vim hbase-site.xml
xiaoye@ubuntu3:~/hbase/conf$ cd ../bin
xiaoye@ubuntu3:~/hbase/bin$ ./start-hbase.sh
starting master, logging to /home/xiaoye/hbase/bin/../logs/hbase-xiaoye-master-ubuntu3.out
xiaoye@ubuntu3:~/hbase/bin$ jps
16483 Jps
1431 QuorumPeerMain
2279 ResourceManager
1503 JournalNode
2196 DataNode
2424 NodeManager


ERROR [main] master.HMasterCommandLine: Master exiting

java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: -1. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.


xiaoye@ubuntu3:~/hbase/logs$ cd ../conf/

xiaoye@ubuntu3:~/hbase/conf$ vim hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/xiaoye/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2182</value>
  </property>
</configuration>
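The post does not say why port 2181 was unavailable. One plausible reading of the jps output is that the QuorumPeerMain process already running on this machine is a ZooKeeper server holding 2181, which is ZooKeeper's default client port. Under that assumption, a quick way to check a port before choosing a value for hbase.zookeeper.property.clientPort might look like the sketch below; port_in_use is a helper written for this example, not part of HBase.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for port in (2181, 2182):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

If 2181 is reported as in use and 2182 as free, moving HBase's own ZooKeeper to 2182, as done in the configuration above, avoids the clash.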



xiaoye@ubuntu3:~/hbase/conf$ ../bin/start-hbase.sh
starting master, logging to /home/xiaoye/hbase/bin/../logs/hbase-xiaoye-master-ubuntu3.out
xiaoye@ubuntu3:~/hbase/conf$ jps
1431 QuorumPeerMain
2279 ResourceManager
17446 Jps
1503 JournalNode
2196 DataNode
2424 NodeManager
17107 HMaster



Now let's run ./hbase shell

xiaoye@ubuntu3:~/hbase/bin$ ./hbase shell
2018-04-06 04:52:06,631 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/xiaoye/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/xiaoye/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-04-06 04:52:28,528 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.0-cdh5.5.1, rUnknown, Wed Dec 2 10:36:13 PST 2015

hbase(main):001:0> help
HBase Shell, version 1.0.0-cdh5.5.1, rUnknown, Wed Dec 2 10:36:13 PST 2015
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_mob, compact_rs, flush, major_compact, major_compact_mob, merge_region, move, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, revoke, user_permission

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html





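DDL (Data Definition Language) is used to define the three-level structure of a database, namely the external schema, the conceptual schema and the internal schema together with the mappings between them, and to define constraints such as data integrity and security controls. DDL statements do not need a commit. Commonly used commands are alter (modify a table), create (create a table), describe (show a table's structure), drop (delete a table) and list (list all tables); as you can see, they all operate on tables.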

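DML (Data Manipulation Language) lets users and programmers operate on the data stored in a database. DML is divided into interactive DML and embedded DML and, by language level, into procedural and non-procedural DML. In a relational database DML changes need a commit; the HBase shell has no commit command, and a put or delete takes effect as soon as it returns. Commonly used commands include scan (scan the whole table, comparable to select *), get (fetch one row), put (insert data into a table) and delete (delete data from a table); as you can see, these commands operate on data.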


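hbase(main):002:0> heltpp
NameError: undefined local variable or method `heltpp' for #<Object:0x1d408060>

I mistyped help here and found that the Backspace key would not delete characters. According to a Baidu search, the HBase shell uses Ctrl+Backspace to delete backwards.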

hbase(main):003:0> help 'create'
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:

Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

Create a table with namespace=default and table qualifier=t1
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}

Table configuration options can be put at the end.
Examples:

  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}

You can also keep around a reference to the created table:

  hbase> t1 = create 't1', 'f1'

Which gives you a reference to the table named 't1', on which you can then
call methods.



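First, what the command means. create creates a table, and 'student' is the table name. {NAME => 'info', VERSIONS => 3} describes one column family; a table must be created with at least one column family and may have several. NAME => 'info' gives the column family its name, and VERSIONS => 3 means the column family keeps up to three versions of the data; when there are more than three, the oldest version is dropped (more on this later). Likewise, {NAME => 'data', VERSIONS => 1} creates a second column family named 'data' that keeps only one version.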

hbase(main):006:0> create 'student',{NAME => 'info',VERSIONS =>3},{name => 'data',VERSIONS=>1}
NameError: undefined local variable or method `name' for #<Object:0x1d408060>

hbase(main):007:0> create 'student',{NAME => 'info',VERSIONS =>3},{NAME => 'data',VERSIONS=>1}
0 row(s) in 2.6020 seconds

=> Hbase::Table - student

As this shows, HBase shell commands are case-sensitive: the lower-case name is rejected, while NAME works.


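Let's go through this statement in detail. put inserts data, and 'student' is the table name, so we are inserting into the student table. 'rk0001' is the row key, which you can think of as the unique identifier of a row. 'info:name' identifies a cell: a cell is addressed by a column family and a column name together, where info is the column family and name is the column name. 'tom' is the value of name. We could also specify a timestamp; since we did not, the system generates one for us.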

hbase(main):015:0> put 'student','rk0001','info:name','tom'
0 row(s) in 0.3790 seconds

hbase(main):016:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:name, timestamp=1523017557754, value=tom
1 row(s) in 0.0590 seconds
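To make the addressing concrete, here is a minimal in-memory sketch of how a put names a cell by row key plus 'family:qualifier' and how a timestamp is filled in when none is given. The dictionary layout and the put and scan helpers are assumptions made for this illustration; they are not HBase's API or storage format.

```python
import time

table = {}  # (row key, column family, qualifier) -> list of (timestamp, value)

def put(row, column, value, timestamp=None):
    """Store a value in the cell addressed by row key and 'family:qualifier'."""
    family, qualifier = column.split(":", 1)
    if timestamp is None:
        # No timestamp given: generate one in milliseconds, like the scan output.
        timestamp = int(time.time() * 1000)
    table.setdefault((row, family, qualifier), []).append((timestamp, value))
    return timestamp

def scan():
    """Yield the newest value of every cell, ordered by row key, then column."""
    for (row, family, qualifier), versions in sorted(table.items()):
        ts, value = max(versions)
        yield row, f"{family}:{qualifier}", ts, value

put("rk0001", "info:name", "tom", timestamp=1523017557754)
put("rk0002", "data:score", "99")  # timestamp filled in automatically
for line in scan():
    print(line)
```

Each printed tuple corresponds to one line of the shell's scan output: row key, column, timestamp and value.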


hbase(main):017:0> put 'student','rk0002','data:score','99'
0 row(s) in 0.0320 seconds

hbase(main):018:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=data:score, timestamp=1523017667855, value=99
2 row(s) in 0.0430 seconds

Add an attribute:

hbase(main):019:0> put 'student','rk0001','info:age','22'
0 row(s) in 0.0150 seconds

hbase(main):020:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523017710918, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=data:score, timestamp=1523017667855, value=99
2 row(s) in 0.0580 seconds


hbase(main):021:0> delete 'student','rk0002','data:score', 1523017667855
0 row(s) in 0.0960 seconds

hbase(main):022:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523017710918, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
1 row(s) in 0.0680 seconds


hbase(main):026:0> put 'student','rk0002','info:name','jerry4'
0 row(s) in 0.0350 seconds

hbase(main):027:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523017710918, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=info:name, timestamp=1523018366607, value=jerry4
2 row(s) in 0.0470 seconds

hbase(main):028:0> put 'student','rk0002','info:gender','male'
0 row(s) in 0.0150 seconds

hbase(main):029:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523017710918, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=info:gender, timestamp=1523018418851, value=male
 rk0002            column=info:name, timestamp=1523018366607, value=jerry4
2 row(s) in 0.0630 seconds

Now let's verify that the VERSIONS => 3 we set on the column family at table creation takes effect. We write to the info:age column of rk0001 two more times, with the values 22 and 21.

hbase(main):030:0> put 'student','rk0001','info:age','22'
0 row(s) in 0.0210 seconds

hbase(main):031:0> put 'student','rk0001','info:age','21'
0 row(s) in 0.0190 seconds

hbase(main):032:0> scan 'student'
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523018602020, value=21
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=info:gender, timestamp=1523018418851, value=male
 rk0002            column=info:name, timestamp=1523018366607, value=jerry4
2 row(s) in 0.0440 seconds


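This raises a question: have the info:age values we inserted earlier (22, twice) been deleted? They have not. We can see them with scan 'student', {COLUMNS => 'info', VERSIONS => 3}, where COLUMNS => 'info' selects the column family and VERSIONS => 3 is the number of versions the column family was created to hold. The result is shown below: every value of info:age comes back.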

hbase(main):034:0> scan 'student',{COLUMNS=>'info',VERSIONS=>3}
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523018602020, value=21
 rk0001            column=info:age, timestamp=1523018594743, value=22
 rk0001            column=info:age, timestamp=1523017710918, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=info:gender, timestamp=1523018418851, value=male
 rk0002            column=info:name, timestamp=1523018366607, value=jerry4
 rk0002            column=info:name, timestamp=1523018359429, value=jerry3
 rk0002            column=info:name, timestamp=1523018350603, value=jerry2
2 row(s) in 0.0740 seconds


hbase(main):035:0> put 'student','rk0001','info:age','23'
0 row(s) in 0.0110 seconds

hbase(main):036:0> scan 'student',{COLUMNS=>'info',VERSIONS=>3}
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523019548097, value=23
 rk0001            column=info:age, timestamp=1523018602020, value=21
 rk0001            column=info:age, timestamp=1523018594743, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=info:gender, timestamp=1523018418851, value=male
 rk0002            column=info:name, timestamp=1523018366607, value=jerry4
 rk0002            column=info:name, timestamp=1523018359429, value=jerry3
 rk0002            column=info:name, timestamp=1523018350603, value=jerry2
2 row(s) in 0.1070 seconds
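After the fourth write to info:age, the scan returns only the three newest versions and the oldest value (timestamp 1523017710918) is no longer listed. The sketch below models that behaviour with the timestamps from this session. It is a simplification: it trims the list eagerly, whereas the raw scan at the end of this post shows the real system still holding the older version for a while. MAX_VERSIONS, put and latest are names invented for the example.

```python
MAX_VERSIONS = 3  # mirrors VERSIONS => 3 on the 'info' column family

def put(cell, timestamp, value):
    """Add a version, then keep only the MAX_VERSIONS newest ones."""
    cell.append((timestamp, value))
    cell.sort(reverse=True)   # newest timestamp first
    del cell[MAX_VERSIONS:]   # the oldest version falls off the end

def latest(cell):
    """What a plain scan shows: only the newest version."""
    return cell[0][1]

age = []  # versions of rk0001 / info:age, timestamps taken from this session
put(age, 1523017710918, "22")
put(age, 1523018594743, "22")
put(age, 1523018602020, "21")
put(age, 1523019548097, "23")

print(latest(age))
for timestamp, value in age:
    print(timestamp, value)
```

The three entries left in age match the three info:age lines of the scan above, in the same newest-first order.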


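We use the command scan 'student', {RAW => true, VERSIONS => 10} to query everything, including records in the cache that have been marked as deleted, as shown below. Such records stop showing up only after the cached data has been flushed.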

hbase(main):001:0> scan 'student',{RAW=>true,VERSIONS=>10}
ROW                COLUMN+CELL
 rk0001            column=info:age, timestamp=1523019548097, value=23
 rk0001            column=info:age, timestamp=1523018602020, value=21
 rk0001            column=info:age, timestamp=1523018594743, value=22
 rk0001            column=info:age, timestamp=1523017710918, value=22
 rk0001            column=info:name, timestamp=1523017557754, value=tom
 rk0002            column=data:score, timestamp=1523017667855, type=DeleteColumn
 rk0002            column=data:score, timestamp=1523017667855, value=99
 rk0002            column=info:gender, timestamp=1523018418851, value=male
 rk0002            column=info:name, timestamp=1523018366607, value=jerry4
 rk0002            column=info:name, timestamp=1523018359429, value=jerry3
 rk0002            column=info:name, timestamp=1523018350603, value=jerry2
 rk0002            column=info:name, timestamp=1523018339854, value=jerry
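The raw scan lists both the deleted value of data:score and a DeleteColumn marker with the same timestamp, which suggests that a delete is recorded as an extra entry rather than by removing the value on the spot. A minimal sketch of that idea follows, under the assumption that a marker hides every version of the same cell with a timestamp at or below its own. The list layout and helper names are invented for this illustration.

```python
cells = []  # each entry: (row, column, timestamp, kind, value)

def put(row, column, timestamp, value):
    cells.append((row, column, timestamp, "Put", value))

def delete_column(row, column, timestamp):
    """Deleting only appends a marker; the old value stays where it is."""
    cells.append((row, column, timestamp, "DeleteColumn", None))

def scan(raw=False):
    """Normal scan hides masked values and markers; raw scan returns everything."""
    if raw:
        return list(cells)
    visible = []
    for row, column, ts, kind, value in cells:
        if kind != "Put":
            continue
        masked = any(
            k == "DeleteColumn" and r == row and c == column and ts <= dts
            for r, c, dts, k, _ in cells
        )
        if not masked:
            visible.append((row, column, ts, value))
    return visible

put("rk0001", "info:name", 1523017557754, "tom")
put("rk0002", "data:score", 1523017667855, "99")
delete_column("rk0002", "data:score", 1523017667855)

print(scan())
print(scan(raw=True))
```

The normal scan returns only the rk0001 entry, while the raw scan still returns the score and its marker, mirroring the difference between scan 'student' and the RAW => true scan above.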

