HDFS Java API in Practice

2021-09-06 10:15:22

Table of Contents

    • 1. Start the Hadoop Cluster
    • 2. Using the HDFS Shell
    • 3. Using the HDFS Web UI
    • 4. Install the Eclipse IDE
      • 4.1 Upload a File
      • 4.2 Query File Block Locations
      • 4.3 Create a Directory
      • 4.4 Read File Contents
      • 4.5 Write to a File

1. Start the Hadoop Cluster

Cluster installation guide: https://cloud.tencent.com/developer/article/1872854

Startup commands:

```bash
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
# The third command above is deprecated; use the following instead
mapred --daemon start historyserver
```

2. Using the HDFS Shell

  • Create a local folder and two files
```bash
[dnn@master ~]$ mkdir /opt/hadoop-3.3.0/HelloHadoop
[dnn@master ~]$ vim /opt/hadoop-3.3.0/HelloHadoop/file1.txt
```

Contents of file1.txt:

```text
hello hadoop
i am Michael
```

```bash
[dnn@master ~]$ vim /opt/hadoop-3.3.0/HelloHadoop/file2.txt
```

Contents of file2.txt:

```text
learning BigData
very cool
```

  • Create an HDFS directory with hadoop fs -mkdir -p /InputData (-p creates intermediate directories as needed)
  • Verify that it was created
```bash
[dnn@master ~]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - dnn supergroup          0 2021-03-13 06:50 /InputData
drwxr-xr-x   - dnn supergroup          0 2021-03-12 06:53 /InputDataTest
drwxr-xr-x   - dnn supergroup          0 2021-03-12 07:12 /OutputDataTest
drwxrwx---   - dnn supergroup          0 2021-03-12 06:19 /tmp
```

  • Upload the files and view them
```bash
[dnn@master ~]$ hadoop fs -put /opt/hadoop-3.3.0/HelloHadoop/* /InputData
[dnn@master ~]$ hadoop fs -cat /InputData/file1.txt
hello hadoop
i am Michael
[dnn@master ~]$ hadoop fs -cat /InputData/file2.txt
learning BigData
very cool
```

  • View overall cluster status with hdfs dfsadmin -report
```bash
[dnn@master ~]$ hdfs dfsadmin -report
Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 23138791499 (21.55 GB)
DFS Remaining: 23136948224 (21.55 GB)
DFS Used: 1843275 (1.76 MB)
DFS Used%: 0.01%
Replicated Blocks:
	Under replicated blocks: 12
	Blocks with corrupt replicas: 0
	Missing blocks: 0
	Missing blocks (with replication factor 1): 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0
Erasure Coded Block Groups: 
	Low redundancy block groups: 0
	Block groups with corrupt internal blocks: 0
	Missing block groups: 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.253.128:9866 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 929792 (908 KB)
Non DFS Used: 6669701120 (6.21 GB)
DFS Remaining: 11568300032 (10.77 GB)
DFS Used%: 0.01%
DFS Remaining%: 63.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Mar 13 06:57:09 CST 2021
Last Block Report: Sat Mar 13 06:49:24 CST 2021
Num of Blocks: 12


Name: 192.168.253.129:9866 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 913483 (892.07 KB)
Non DFS Used: 6669369269 (6.21 GB)
DFS Remaining: 11568648192 (10.77 GB)
DFS Used%: 0.01%
DFS Remaining%: 63.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Mar 13 06:57:09 CST 2021
Last Block Report: Sat Mar 13 06:45:42 CST 2021
Num of Blocks: 12
```
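
The same figures are also available programmatically. Below is a minimal sketch, assuming the hdfs://master:9000 address used elsewhere in this article; the class name ClusterReport is hypothetical. FsStatus covers the summary numbers, and the DistributedFileSystem subclass exposes per-datanode statistics (getDataNodeStats may require running as the cluster's superuser, as the report command does):

```java
package com.michael.hdfs;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ClusterReport {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), conf);

		// Summary figures, matching the header of `hdfs dfsadmin -report`
		FsStatus status = fs.getStatus();
		System.out.println("Configured Capacity: " + status.getCapacity());
		System.out.println("DFS Used: " + status.getUsed());
		System.out.println("DFS Remaining: " + status.getRemaining());

		// Per-datanode figures require the HDFS-specific subclass
		if (fs instanceof DistributedFileSystem) {
			for (DatanodeInfo node : ((DistributedFileSystem) fs).getDataNodeStats()) {
				System.out.println(node.getHostName()
						+ " used: " + node.getDfsUsed()
						+ " remaining: " + node.getRemaining());
			}
		}
		fs.close();
	}
}
```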

3. Using the HDFS Web UI

The Web UI shows that the replication factor is 3 and the block size is 128 MB.
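
Both defaults can also be queried through the Java API. A quick sketch (the class name ShowDefaults is hypothetical; the Path argument exists because some filesystems have per-path defaults):

```java
package com.michael.hdfs;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowDefaults {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), conf);
		Path p = new Path("/");
		// Expected on this cluster: replication 3, block size 134217728 (128 MB)
		System.out.println("default replication: " + fs.getDefaultReplication(p));
		System.out.println("default block size: " + fs.getDefaultBlockSize(p));
		fs.close();
	}
}
```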

4. Install the Eclipse IDE

  • Download
  • Installation guide

4.1 Upload a File

Code to upload a file:

```java
package com.michael.hdfs;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * @author dnn
 *
 */
public class UploadFile {


	/**
	 * @param args
	 */
	public static void main(String[] args) throws IOException {
		Configuration conf = new Configuration();
		FileSystem hdfs = FileSystem.get(conf);
		Path src = new Path("/opt/hadoop-3.3.0/HelloHadoop/file1.txt");
		Path dest = new Path("file1.txt");
		hdfs.copyFromLocalFile(src, dest);
		System.out.println("Upload to " + conf.get("fs.defaultFS"));
		FileStatus files[] = hdfs.listStatus(dest);
		for(FileStatus file : files) {
			System.out.println(file.getPath());
		}
	}
}
```

Run it. The file is not copied into HDFS:

```text
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Upload to file:///
file:/home/dnn/eclipse-workspace/HDFS_example/file1.txt
```

Because fs.defaultFS was never set, the Configuration falls back to the local filesystem (file:///), so the file was copied into the local Eclipse workspace instead. Listing HDFS confirms there is no file1.txt:

```bash
[dnn@master ~]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - dnn supergroup          0 2021-03-13 06:54 /InputData
drwxr-xr-x   - dnn supergroup          0 2021-03-12 06:53 /InputDataTest
drwxr-xr-x   - dnn supergroup          0 2021-03-12 07:12 /OutputDataTest
drwxrwx---   - dnn supergroup          0 2021-03-12 06:19 /tmp
```

Fix: set the default filesystem address explicitly.

```java
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.253.130:9000"); // add this line
		FileSystem hdfs = FileSystem.get(conf);
```

Output: correct this time; the file lands in HDFS. Note that the relative path file1.txt resolves to the user's HDFS home directory, /user/dnn:

```text
Upload to hdfs://192.168.253.130:9000
hdfs://192.168.253.130:9000/user/dnn/file1.txt
```

```bash
[dnn@master Desktop]$ hadoop fs -ls -R /user
drwxr-xr-x   - dnn supergroup          0 2021-03-16 07:43 /user/dnn
-rw-r--r--   3 dnn supergroup         26 2021-03-16 07:43 /user/dnn/file1.txt
```

  • To run on the cluster: 1. Export the project as a jar file

2. Run it from the shell:

```bash
[dnn@master Desktop]$ hadoop jar /home/dnn/eclipse-workspace/HDFS_example/hdfs_uploadfile.jar com.michael.hdfs.UploadFile
Upload to hdfs://192.168.253.130:9000
hdfs://192.168.253.130:9000/user/dnn/file1.txt
[dnn@master Desktop]$ hadoop fs -ls -R /user
drwxr-xr-x   - dnn supergroup          0 2021-03-16 07:59 /user/dnn
-rw-r--r--   3 dnn supergroup         26 2021-03-16 07:59 /user/dnn/file1.txt
```
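
Hard-coding the NameNode address ties the code to one cluster. An alternative sketch, mirroring the three-line snippet above: load the cluster's own configuration file instead (the path below assumes the /opt/hadoop-3.3.0 install location used in section 2). When the jar is launched with hadoop jar, core-site.xml on the classpath is picked up automatically anyway:

```java
		Configuration conf = new Configuration();
		// Assumption: Hadoop is installed under /opt/hadoop-3.3.0 as in section 2
		conf.addResource(new Path("/opt/hadoop-3.3.0/etc/hadoop/core-site.xml"));
		FileSystem hdfs = FileSystem.get(conf);
```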

4.2 Query File Block Locations

```java
package com.michael.hdfs;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileLoc {

	public static void main(String[] args) throws IOException {
		String uri = "hdfs://master:9000/user/dnn/file1.txt";
		Configuration conf = new Configuration();
		try {
			FileSystem fs = FileSystem.get(URI.create(uri), conf);
			Path fpath = new Path(uri);
			FileStatus filestatus = fs.getFileStatus(fpath);
			BlockLocation [] blklocations = fs.getFileBlockLocations(filestatus, 0, filestatus.getLen());
			int blockLen = blklocations.length;
			for (int i = 0; i < blockLen; ++i) {
				String[] hosts = blklocations[i].getHosts();
				System.out.println("block" + i + "_location:" + hosts[0]);
			}
		}
		catch(IOException e) {
			e.printStackTrace();
		}
	}
}
```

```bash
[dnn@master Desktop]$ hadoop jar /home/dnn/eclipse-workspace/HDFS_example/hdfs_filelocation.jar com.michael.hdfs.FileLoc
block0_location:slave2
```
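
getHosts() returns every datanode holding a replica, but the loop above only prints the first one. Since the replication factor here is 3, a small variation on the same loop (a sketch, not run here) would list all replicas of each block:

```java
			for (int i = 0; i < blklocations.length; ++i) {
				// Each block may live on several datanodes (replication factor 3 here)
				for (String host : blklocations[i].getHosts()) {
					System.out.println("block" + i + "_replica_on:" + host);
				}
			}
```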

4.3 Create a Directory

```java
package com.michael.hdfs;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;


public class CreatDir {

	public static void main(String[] args) {
		String uri = "hdfs://master:9000";
		Configuration conf = new Configuration();
		try {
			FileSystem fs = FileSystem.get(URI.create(uri), conf);
			Path dfs = new Path("/test");
			boolean flag = fs.mkdirs(dfs);
			System.out.println(flag ? "create success" : "create failure");
		}
		catch(IOException e) {
			e.printStackTrace();
		}
	}
}
```

```bash
[dnn@master Desktop]$ hadoop jar /home/dnn/eclipse-workspace/HDFS_example/hdfs_mkdir.jar com.michael.hdfs.CreatDir
create success
```
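
Note that mkdirs returns true even when /test already exists, so "create success" does not necessarily mean a new directory was made. A sketch of a slightly more careful version, which also shows fs.delete for cleanup (the boolean enables recursive deletion):

```java
			Path dfs = new Path("/test");
			if (fs.exists(dfs)) {
				System.out.println("/test already exists");
			} else {
				System.out.println(fs.mkdirs(dfs) ? "create success" : "create failure");
			}
			// Cleanup when no longer needed:
			// fs.delete(dfs, true);
```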

4.4 Read File Contents

```java
package com.michael.hdfs;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;


public class ReadFile {

	public static void main(String[] args) {
		try {
			Configuration conf = new Configuration();
			conf.set("fs.defaultFS", "hdfs://master:9000");
			conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
			FileSystem fs = FileSystem.get(conf);
			Path file = new Path("file1.txt");
			FSDataInputStream getIt = fs.open(file);
			BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
			String content = d.readLine();
			System.out.println(content);
			d.close();
			fs.close();
		}
		catch(IOException e) {
			e.printStackTrace();
		}
	}
}
```

```bash
[dnn@master Desktop]$ hadoop jar /home/dnn/eclipse-workspace/HDFS_example/hdfs_readfile.jar com.michael.hdfs.ReadFile
hello hadoop
```
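
readLine only prints the first line of the file. To dump the whole file, Hadoop ships a helper in org.apache.hadoop.io.IOUtils; a minimal sketch of a replacement for the BufferedReader logic above:

```java
			// Copy the whole stream to stdout instead of reading one line
			FSDataInputStream in = fs.open(new Path("file1.txt"));
			try {
				// 4096-byte buffer; 'false' leaves closing the stream to us
				org.apache.hadoop.io.IOUtils.copyBytes(in, System.out, 4096, false);
			} finally {
				org.apache.hadoop.io.IOUtils.closeStream(in);
			}
```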

4.5 Write to a File

```java
package com.michael.hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class WriteFile {

	public static void main(String[] args) {
		try {
			Configuration conf = new Configuration();
			conf.set("fs.defaultFS", "hdfs://master:9000");
			conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
			FileSystem fs = FileSystem.get(conf);
			byte[] buffer = "hello Michael !!!".getBytes();
			String filename = "test_file.txt";
			FSDataOutputStream os = fs.create(new Path(filename));
			os.write(buffer, 0, buffer.length);
			System.out.println("create: " + filename);
			os.close();
			fs.close();
			ReadFile r = new ReadFile();
			r.read(filename);
		}
		catch(IOException e) {
			e.printStackTrace();
		}
	}
}
```

The ReadFile class, updated with the read(String) helper called above:

```java
package com.michael.hdfs;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;


public class ReadFile {

	public static void main(String[] args) {
		try {
			Configuration conf = new Configuration();
			conf.set("fs.defaultFS", "hdfs://master:9000");
			conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
			FileSystem fs = FileSystem.get(conf);
			Path file = new Path("test_file.txt");
			FSDataInputStream getIt = fs.open(file);
			BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
			String content = d.readLine();
			System.out.println(content);
			d.close();
			fs.close();
		}
		catch(IOException e) {
			e.printStackTrace();
		}
	}

	public void read(String filename) {
		try {
			Configuration conf = new Configuration();
			conf.set("fs.defaultFS", "hdfs://master:9000");
			conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
			FileSystem fs = FileSystem.get(conf);
			Path file = new Path(filename);
			FSDataInputStream getIt = fs.open(file);
			BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
			String content = d.readLine();
			System.out.println(content);
			d.close();
			fs.close();
		}
		catch(IOException e) {
			e.printStackTrace();
		}
	}
}
```

```bash
[dnn@master Desktop]$ hadoop jar /home/dnn/eclipse-workspace/HDFS_example/hdfs_writefile.jar com.michael.hdfs.WriteFile
create: test_file.txt
hello Michael !!!
```
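
One caveat on this pattern: WriteFile and ReadFile close the FileSystem and streams by hand, which leaks them if an exception fires first. Below is a sketch of the same write-then-read with try-with-resources (FileSystem, FSDataOutputStream, and FSDataInputStream are all Closeable, so they close automatically even on errors); the class name WriteReadFile is hypothetical:

```java
package com.michael.hdfs;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteReadFile {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://master:9000");
		Path file = new Path("test_file.txt");
		try (FileSystem fs = FileSystem.get(conf)) {
			// Write: create() overwrites an existing file by default
			try (FSDataOutputStream os = fs.create(file)) {
				os.write("hello Michael !!!".getBytes(StandardCharsets.UTF_8));
			}
			// Read the first line back
			try (FSDataInputStream is = fs.open(file);
			     BufferedReader reader = new BufferedReader(
			             new InputStreamReader(is, StandardCharsets.UTF_8))) {
				System.out.println(reader.readLine());
			}
		}
	}
}
```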
