1. Obtain the fsimage. For a very small cluster, or one with few files, it can be fetched directly with a command:
[root@whx nn]# su - hdfs
[hdfs@whx ~]$ hdfs dfsadmin -fetchImage /data01/dfs/nn/
21/03/10 14:53:32 INFO namenode.TransferFsImage: Opening connection to http://b10.whx.com:50070/imagetransfer?getimage=1&txid=latest
21/03/10 14:53:32 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
21/03/10 14:53:32 INFO namenode.TransferFsImage: Transfer took 0.04s at 885.71 KB/s
For larger production clusters the method above is not practical; instead, take the newest fsimage file from the current directory under the NameNode data directory. The drawback is that it is only as fresh as the last checkpoint.
-rw-r--r-- 1 hdfs hdfs 95271 Mar 11 06:58 fsimage_0000000000000123318
-rw-r--r-- 1 hdfs hdfs 62 Mar 11 06:58 fsimage_0000000000000123318.md5
-rw-r--r-- 1 hdfs hdfs 101002 Mar 11 07:58 fsimage_0000000000000124989
-rw-r--r-- 1 hdfs hdfs 62 Mar 11 07:58 fsimage_0000000000000124989.md5
-rw-r--r-- 1 hdfs hdfs 7 Mar 11 08:22 seen_txid
-rw-r--r-- 1 hdfs hdfs 172 Mar 3 14:39 VERSION
[root@whx current]# pwd
/data01/dfs/nn/current
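Since the checkpoint filename encodes the transaction id, the newest fsimage is simply the one with the highest-numbered name. A small sketch of picking it automatically (the filenames mirror the listing above; this demo uses a temporary directory rather than a real NameNode dir):

```shell
# Demo: pick the fsimage with the highest txid from a checkpoint directory.
# Sample filenames taken from the listing above; on a real cluster point
# nn_dir at /data01/dfs/nn/current instead.
nn_dir=$(mktemp -d)
touch "$nn_dir/fsimage_0000000000000123318" \
      "$nn_dir/fsimage_0000000000000124989" \
      "$nn_dir/seen_txid"
# Zero-padded txids sort lexicographically, so the last match is the newest.
latest=$(ls "$nn_dir" | grep -E '^fsimage_[0-9]+$' | sort | tail -n 1)
echo "$latest"
rm -rf "$nn_dir"
```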
2. Convert the fsimage into a human-readable file with the offline image viewer (oiv):
[hdfs@whx nn]$ hdfs oiv -i /data01/dfs/nn/fsimage_0000000000000095032 -o /data01/dfs/nn/fsimage.csv -p Delimited -delimiter ","
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Loading string table
21/03/10 14:56:04 INFO offlineImageViewer.FSImageHandler: Loading 5 strings
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Loading inode references
21/03/10 14:56:04 INFO offlineImageViewer.FSImageHandler: Loading inode references
21/03/10 14:56:04 INFO offlineImageViewer.FSImageHandler: Loaded 0 inode references
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Loading directories
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Loading directories in INode section.
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Found 46 directories in INode section.
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Finished loading directories in 22ms
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Loading INode directory section.
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Scanned 38 INode directories to build namespace.
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Finished loading INode directory section in 3ms
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Found 441 INodes in the INode section
21/03/10 14:56:04 INFO offlineImageViewer.PBImageTextWriter: Outputted 441 INodes.
The step above is fine for a small fsimage, but with a large one the Hadoop client will run into full GC while parsing, and HADOOP_HEAPSIZE needs to be raised. Run the following before the command above:
[root@whx current]# export HADOOP_HEAPSIZE=15240
This enlarges the heap available to the hadoop command; the unit is MB.
3. Prepare the fsimage.csv file by deleting its header row; the header line lists the column names:
[hdfs@whx nn]$ ls
current fsimage_0000000000000095032 fsimage.csv in_use.lock
[hdfs@whx nn]$ sed -i -e "1d" /data01/dfs/nn/fsimage.csv
[root@whx nn]# head -n1 fsimage.csv
Path,Replication,ModificationTime,AccessTime,PreferredBlockSize,BlocksCount,FileSize,NSQUOTA,DSQUOTA,Permission,UserName,GroupName
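Before loading the CSV into Hive, it can help to spot-check that fields line up with the header. A quick awk sketch, using a fabricated sample row (column positions follow the header above: Path is field 1, FileSize field 7, UserName field 11):

```shell
# A fabricated row in the Delimited oiv output format (see header above).
sample='/user/hive/warehouse/t1/f1,3,2021-03-10 14:53,2021-03-10 14:53,134217728,1,2048,0,0,-rw-r--r--,hdfs,hdfs'
# Pull out the path (field 1), file size (field 7) and owner (field 11).
check=$(echo "$sample" | awk -F',' '{ printf "path=%s size=%s owner=%s", $1, $7, $11 }')
echo "$check"
```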
4. Enter Hive and create a table to hold the parsed fsimage:
[hdfs@whx nn]$ hive
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/jars/hive-common-1.1.0-cdh5.12.1.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show databases;
OK
default
Time taken: 1.71 seconds, Fetched: 1 row(s)
hive> create database mydb;
OK
Time taken: 0.132 seconds
hive> use mydb;
OK
Time taken: 0.029 seconds
hive> CREATE TABLE `fsimage_info_csv`(
> `path` string,
> `replication` int,
> `modificationtime` string,
> `accesstime` string,
> `preferredblocksize` bigint,
> `blockscount` int,
> `filesize` bigint,
> `nsquota` string,
> `dsquota` string,
> `permission` string,
> `username` string,
> `groupname` string)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
> 'field.delim'=',',
> 'serialization.format'=',')
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
> 'hdfs://whx/user/hive/warehouse/fsimage_info_csv';
OK
Time taken: 0.177 seconds
The CREATE TABLE statement in full (note the table name, fsimage_info_csv):
CREATE TABLE `fsimage_info_csv`(
`path` string,
`replication` int,
`modificationtime` string,
`accesstime` string,
`preferredblocksize` bigint,
`blockscount` int,
`filesize` bigint,
`nsquota` string,
`dsquota` string,
`permission` string,
`username` string,
`groupname` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://whx/user/hive/warehouse/fsimage_info_csv'
Upload the fsimage CSV into the table location. A permission error came up and was fixed by changing ownership:
[hdfs@whx ~]$ hdfs dfs -put /data01/dfs/nn/fsimage.csv /user/hive/warehouse/fsimage_info_csv01/
put: /data01/dfs/nn/fsimage.csv (Permission denied)
[hdfs@whx ~]$ logout
You have new mail in /var/spool/mail/root
[root@whx nn]# chown -R hdfs:hdfs /data01/
[root@whx nn]# su - hdfs
Last login: Thu Mar 11 08:46:47 CST 2021 on pts/0
[hdfs@whx ~]$ hdfs dfs -put /data01/dfs/nn/fsimage.csv /user/hive/warehouse/fsimage_info_csv01/
[hdfs@whx ~]$
The following statement queries HDFS usage down to the third directory level; it can be run in Hive — just adjust the table name to match yours.
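The query itself was not captured here. Its logic is to split `path` on `/` and sum `filesize` over the first three components; in HiveQL that is typically done with `split(path, '/')` plus `concat` and a `GROUP BY` on fsimage_info_csv. The same aggregation, sketched in awk on fabricated `path,filesize` rows:

```shell
# Demo: aggregate file size by the first three path components — the same
# grouping the Hive query over fsimage_info_csv performs with split()/GROUP BY.
# Rows are fabricated for illustration.
result=$(printf '%s\n' \
  '/user/hive/warehouse/db1/t1/f1,100' \
  '/user/hive/warehouse/db1/t1/f2,200' \
  '/user/app/logs/x,50' |
awk -F',' '{
  split($1, p, "/")                 # p[1] is empty due to the leading slash
  key = "/" p[2] "/" p[3] "/" p[4]  # first three directory levels
  sum[key] += $2                    # accumulate filesize per 3-level dir
}
END { for (k in sum) print k, sum[k] }' | sort)
echo "$result"
```

In Hive, the corresponding key would be built with something like `concat('/', split(path, '/')[1], '/', split(path, '/')[2], '/', split(path, '/')[3])` (Hive's `split` is 0-indexed, and index 0 is empty because of the leading slash).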