Filtering Data in HBase

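HBase can hold tables with billions of rows and millions of columns, which makes it well suited to storing massive datasets. Sometimes you need to find one specific record in all that data to verify it, and that is where HBase filters come in. This article briefly walks through several commonly used filtering methods in the HBase shell.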

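A new HBase installation already includes the built-in default and hbase namespaces (a namespace is roughly HBase's counterpart of a database schema). Here we create a new namespace named test:
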
create_namespace 'test'

List the namespaces:

list_namespace

Create the student table with a single column family, infomation:

create 'test:student', 'infomation'

List all tables:

list

List the tables in a specific namespace:

list_namespace_tables 'test'

Insert sample data: five students, each with name_, age__, sex__ and class columns:

put 'test:student', '001','infomation:name_','Alex'
put 'test:student', '001','infomation:age__','13'
put 'test:student', '001','infomation:sex__','Male'
put 'test:student', '001','infomation:class','3.1'

put 'test:student', '002','infomation:name_','Bob'
put 'test:student', '002','infomation:age__','13'
put 'test:student', '002','infomation:sex__','Male'
put 'test:student', '002','infomation:class','3.2'

put 'test:student', '003','infomation:name_','Cindy'
put 'test:student', '003','infomation:age__','13'
put 'test:student', '003','infomation:sex__','Female'
put 'test:student', '003','infomation:class','3.3'

put 'test:student', '004','infomation:name_','Dama'
put 'test:student', '004','infomation:age__','13'
put 'test:student', '004','infomation:sex__','Female'
put 'test:student', '004','infomation:class','3.4'

put 'test:student', '005','infomation:name_','Ella'
put 'test:student', '005','infomation:age__','13'
put 'test:student', '005','infomation:sex__','Female'
put 'test:student', '005','infomation:class','3.5'
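Every filter below works on a table whose cells HBase keeps sorted by row key and then by column. The following Python sketch is a hypothetical in-memory stand-in for test:student, not the HBase client API. It assumes ASCII row keys and qualifiers (so Python's string ordering matches HBase's byte ordering), a single column family, and one version per cell.

```python
# Hypothetical in-memory stand-in for 'test:student' (column family 'infomation').
# Not the HBase client API; assumes ASCII keys and one version per cell.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]

# row key -> {qualifier -> value}, mirroring the put commands above
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def scan(table):
    """Yield (row, qualifier, value) in scan order: by row key, then by qualifier."""
    for row in sorted(table):
        for qualifier in sorted(table[row]):
            yield row, qualifier, table[row][qualifier]


cells = list(scan(TABLE))
print(len(cells))  # 20
print(cells[0])    # ('001', 'age__', '13')
```

The qualifier order age__, class, name_, sex__ produced by sorted() is the same order the shell prints in the scans that follow.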

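Filter by row key (RowFilter)

With the substring comparator, RowFilter keeps every row whose key contains 003; use binary:003 instead to require an exact match.
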
hbase:231:0> scan 'test:student', FILTER => "RowFilter(=,'substring:003')"
ROW                                                COLUMN CELL
 003                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.238, value=13
 003                                               column=infomation:class, timestamp=2022-03-13T14:45:00.259, value=3.3
 003                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.227, value=Cindy
 003                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.249, value=Female
1 row(s)
Took 0.0105 seconds
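The comparator after = decides how the row key is matched. This Python sketch uses a hypothetical in-memory copy of the table to contrast substring matching with exact (binary) matching; the helper names row_filter, substring and binary are illustrative only. HBase's SubstringComparator is case-insensitive, which the sketch imitates by lower-casing both sides.

```python
# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def row_filter(table, matches):
    """Return (row, qualifier, value) cells for rows whose key satisfies `matches`."""
    return [
        (row, q, table[row][q])
        for row in sorted(table)
        if matches(row)
        for q in sorted(table[row])
    ]


def substring(needle):
    # Like 'substring:...': case-insensitive containment test on the row key.
    return lambda key: needle.lower() in key.lower()


def binary(expected):
    # Like 'binary:...' with '=': the whole row key must be equal.
    return lambda key: key == expected


print(len(row_filter(TABLE, substring("003"))))  # 4 cells, all from row 003
print(len(row_filter(TABLE, substring("0"))))    # 20: every key contains "0"
print(len(row_filter(TABLE, binary("00"))))      # 0: no key is exactly "00"
```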

Filter by row key prefix (PrefixFilter)

All five row keys begin with 00, so every row is returned.

hbase:233:0> scan 'test:student',{FILTER=>"PrefixFilter('00')"}
ROW                                                COLUMN CELL
 001                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.118, value=13
 001                                               column=infomation:class, timestamp=2022-03-13T14:45:00.149, value=3.1
 001                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.106, value=Alex
 001                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.132, value=Male
 002                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.186, value=13
 002                                               column=infomation:class, timestamp=2022-03-13T14:45:00.210, value=3.2
 002                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.171, value=Bob
 002                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.197, value=Male
 003                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.238, value=13
 003                                               column=infomation:class, timestamp=2022-03-13T14:45:00.259, value=3.3
 003                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.227, value=Cindy
 003                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.249, value=Female
 004                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.285, value=13
 004                                               column=infomation:class, timestamp=2022-03-13T14:45:00.309, value=3.5
 004                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.275, value=Dama
 004                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.298, value=Female
 005                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.336, value=13
 005                                               column=infomation:class, timestamp=2022-03-13T14:45:01.882, value=3.5
 005                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.324, value=Ella
 005                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.349, value=Female
5 row(s)
Took 0.0110 seconds
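PrefixFilter keeps whole rows whose key starts with the given bytes. The sketch below models that on a hypothetical in-memory copy of the table (prefix_filter is an illustrative name, not an HBase API); a longer prefix such as 001 narrows the result to a single row.

```python
# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def prefix_filter(table, prefix):
    """Keep every cell of rows whose key starts with `prefix` (like PrefixFilter)."""
    return [
        (row, q, table[row][q])
        for row in sorted(table)
        if row.startswith(prefix)
        for q in sorted(table[row])
    ]


rows = sorted({row for row, _, _ in prefix_filter(TABLE, "00")})
print(rows)                              # ['001', '002', '003', '004', '005']
print(len(prefix_filter(TABLE, "001")))  # 4
```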

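Filter by column prefix (ColumnPrefixFilter)

The prefix is matched against the column qualifier, not the column family. Only age__ starts with a, so each row returns a single cell.
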
hbase:235:0> scan 'test:student',{FILTER=>"ColumnPrefixFilter('a')"}
ROW                                                COLUMN CELL
 001                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.118, value=13
 002                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.186, value=13
 003                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.238, value=13
 004                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.285, value=13
 005                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.336, value=13
5 row(s)
Took 0.0100 seconds
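ColumnPrefixFilter works per cell rather than per row: it drops every cell whose qualifier does not start with the prefix, and a row with no surviving cells disappears from the result. A minimal Python model, assuming the same hypothetical in-memory table (column_prefix_filter is an illustrative name):

```python
# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def column_prefix_filter(table, prefix):
    """Keep only cells whose qualifier starts with `prefix` (like ColumnPrefixFilter).

    The prefix is tested against the qualifier (e.g. 'age__'), not 'family:qualifier'.
    """
    return [
        (row, q, table[row][q])
        for row in sorted(table)
        for q in sorted(table[row])
        if q.startswith(prefix)
    ]


age_cells = column_prefix_filter(TABLE, "a")
print(len(age_cells))                    # 5, one age__ cell per row
print({q for _, q, _ in age_cells})      # {'age__'}
```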

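Filter by row key range

STARTROW is inclusive and STOPROW is exclusive, so rows 001 and 002 are returned but 003 is not. Unlike RowFilter, which is evaluated against every row the scan reads, a row range bounds the scan itself.
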
hbase:236:0> scan 'test:student',{STARTROW=>'001',STOPROW=>'003'}
ROW                                                COLUMN CELL
 001                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.118, value=13
 001                                               column=infomation:class, timestamp=2022-03-13T14:45:00.149, value=3.1
 001                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.106, value=Alex
 001                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.132, value=Male
 002                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.186, value=13
 002                                               column=infomation:class, timestamp=2022-03-13T14:45:00.210, value=3.2
 002                                               column=infomation:name_, timestamp=2022-03-13T14:45:00.171, value=Bob
 002                                               column=infomation:sex__, timestamp=2022-03-13T14:45:00.197, value=Male
2 row(s)
Took 0.0082 seconds
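The half-open range is the key detail: a row is included when STARTROW <= key < STOPROW. The Python sketch below models it on a hypothetical in-memory table; range_scan and its parameters are illustrative names, and omitting either bound leaves that side of the range open.

```python
# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def range_scan(table, start_row=None, stop_row=None):
    """Scan rows with start_row <= key < stop_row (STARTROW inclusive, STOPROW exclusive)."""
    return [
        (row, q, table[row][q])
        for row in sorted(table)
        if (start_row is None or row >= start_row)
        and (stop_row is None or row < stop_row)
        for q in sorted(table[row])
    ]


print(sorted({r for r, _, _ in range_scan(TABLE, "001", "003")}))  # ['001', '002']
print(len(range_scan(TABLE, start_row="002")))                     # 16: rows 002-005
```

To include row 003 as well, the stop row has to be a key that sorts after it, for example 004.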

Filter by row key range and column prefix

hbase:237:0> scan 'test:student',{STARTROW=>'001',STOPROW=>'003',FILTER=>"ColumnPrefixFilter('a')"}
ROW                                                COLUMN CELL
 001                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.118, value=13
 002                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.186, value=13
2 row(s)
Took 0.0075 seconds
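Combining a row range with a filter means the range first limits which rows are visited and the filter then decides which cells in those rows survive. A minimal Python model of that combination, assuming the same hypothetical in-memory table (the scan function and its keyword names are illustrative, not the HBase shell options):

```python
# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def scan(table, start_row=None, stop_row=None, column_prefix=None):
    """Row range (STARTROW/STOPROW) plus an optional ColumnPrefixFilter-style check."""
    cells = []
    for row in sorted(table):
        if start_row is not None and row < start_row:
            continue
        if stop_row is not None and row >= stop_row:
            break  # rows are sorted, so no later row can be in range
        for q in sorted(table[row]):
            if column_prefix is None or q.startswith(column_prefix):
                cells.append((row, q, table[row][q]))
    return cells


print(scan(TABLE, "001", "003", column_prefix="a"))
# [('001', 'age__', '13'), ('002', 'age__', '13')]
```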

Filter by row key and column

Two filters joined with AND must both pass, so only the age__ cell of row 003 remains.

hbase:253:0> scan 'test:student', {FILTER => "(RowFilter(=,'substring:003')) AND (ColumnPrefixFilter('age'))" }
ROW                                                COLUMN CELL
 003                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.238, value=13
1 row(s)
Took 0.0090 seconds
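In the filter language, AND builds a filter list in which every member must pass (FilterList with MUST_PASS_ALL). The Python sketch below models each filter as a predicate over (row, qualifier, value) and combines them with all(); the helper names are hypothetical and the table is an in-memory stand-in.

```python
# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def row_substring(needle):
    # Models RowFilter(=,'substring:...'): case-insensitive containment on the row key.
    return lambda row, q, v: needle.lower() in row.lower()


def column_prefix(prefix):
    # Models ColumnPrefixFilter('...'): the qualifier starts with the prefix.
    return lambda row, q, v: q.startswith(prefix)


def all_of(*preds):
    # Models AND (FilterList MUST_PASS_ALL): a cell survives only if every filter passes.
    return lambda row, q, v: all(p(row, q, v) for p in preds)


def scan(table, pred):
    return [
        (row, q, table[row][q])
        for row in sorted(table)
        for q in sorted(table[row])
        if pred(row, q, table[row][q])
    ]


print(scan(TABLE, all_of(row_substring("003"), column_prefix("age"))))
# [('003', 'age__', '13')]
```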

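Filter by row key range and cell value (ValueFilter)

ValueFilter tests the value of every cell in every column; within rows 001 and 002, only the age__ cells hold 13.
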
hbase:254:0> scan 'test:student',{STARTROW=>'001',STOPROW=>'003',FILTER=>"ValueFilter(=,'binary:13')"}
ROW                                                COLUMN CELL
 001                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.118, value=13
 002                                               column=infomation:age__, timestamp=2022-03-13T14:45:00.186, value=13
2 row(s)
Took 0.0433 seconds
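The binary comparator compares raw bytes, and the shell stored every value as a string, so comparisons are lexicographic rather than numeric. The Python sketch below models ValueFilter on a hypothetical in-memory table, using string comparison as a stand-in for byte comparison; value_filter and its parameters are illustrative names. The second call shows why an ordering comparison on these strings can match far more than intended.

```python
import operator

# Hypothetical in-memory stand-in for 'test:student'; not the HBase client API.
NAMES = ["Alex", "Bob", "Cindy", "Dama", "Ella"]
SEXES = ["Male", "Male", "Female", "Female", "Female"]
TABLE = {
    f"{i:03d}": {"name_": name, "age__": "13", "sex__": sex, "class": f"3.{i}"}
    for i, (name, sex) in enumerate(zip(NAMES, SEXES), start=1)
}


def value_filter(table, op, operand, start_row=None, stop_row=None):
    """Keep cells where op(value, operand) holds, within an optional half-open row range.

    Values are compared as strings, i.e. lexicographically, like a byte comparator.
    """
    return [
        (row, q, table[row][q])
        for row in sorted(table)
        if (start_row is None or row >= start_row)
        and (stop_row is None or row < stop_row)
        for q in sorted(table[row])
        if op(table[row][q], operand)
    ]


print(value_filter(TABLE, operator.eq, "13", "001", "003"))
# [('001', 'age__', '13'), ('002', 'age__', '13')]

# In this model, "greater than 13" also matches names, sexes and classes,
# because "A" and "3" sort after "1" byte by byte.
print(len(value_filter(TABLE, operator.gt, "13")))  # 15
```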

The methods above cover most everyday HBase filtering needs. If you run into a case they do not cover, feel free to leave a comment.