How to Write a Weibo Scraper with HttpClient and Java

2023-10-17 14:30:45

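Weibo is a social platform many of us use every day. Besides supporting all kinds of social interaction, it is timely enough that we often learn about major news there first. In this post we walk through writing a Weibo content scraper with HttpClient and Java, with sample code along the way.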

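```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Uses the JDK's built-in java.net.http.HttpClient (Java 11+).
public class WeiboCrawler {
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36";
    static final String PROXY_URL = "https://www.duoip.cn/get_proxy";
    static final Duration TIMEOUT = Duration.ofSeconds(5);

    public static void main(String[] args) throws InterruptedException {
        List<String> weiboUrls = new ArrayList<>();
        // Add the Weibo URLs to crawl
        weiboUrls.add("https://www.weibo.com/u/6722282128");
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        for (String url : weiboUrls) {
            executorService.submit(new CrawlerTask(url));
        }
        executorService.shutdown();
        // Wait for the submitted tasks to finish before exiting
        executorService.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Fetch a proxy server from https://www.duoip.cn/get_proxy.
    // Assumption: the service answers with plain text in "host:port" form.
    static String getProxyIp() {
        HttpClient client = HttpClient.newBuilder().connectTimeout(TIMEOUT).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(PROXY_URL))
                .timeout(TIMEOUT)
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body().trim();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null;
    }
}

class CrawlerTask implements Runnable {
    private final String url;

    public CrawlerTask(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        try {
            // Get a proxy server
            String proxyIp = WeiboCrawler.getProxyIp();
            System.out.println("Using proxy: " + proxyIp);
            HttpClient.Builder builder = HttpClient.newBuilder().connectTimeout(WeiboCrawler.TIMEOUT);
            // Route requests through the proxy if one was obtained
            if (proxyIp != null && proxyIp.contains(":")) {
                int colon = proxyIp.lastIndexOf(':');
                String host = proxyIp.substring(0, colon);
                int port = Integer.parseInt(proxyIp.substring(colon + 1));
                builder.proxy(ProxySelector.of(new InetSocketAddress(host, port)));
            }
            HttpClient httpClient = builder.build();
            // Build the request with a User-Agent header and a response timeout
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("User-Agent", WeiboCrawler.USER_AGENT)
                    .timeout(WeiboCrawler.TIMEOUT)
                    .GET()
                    .build();
            // Send the request and read the body as a string
            HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
            String responseContent = response.body();
            System.out.println("Status " + response.statusCode() + ", " + responseContent.length() + " chars");
            // Process the response content (e.g. parse JSON or HTML)
            // ...
        } catch (IOException | NumberFormatException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```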
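The crawler above assumes the proxy service returns a bare `host:port` string, but such services sometimes prefix a scheme (for example `http://1.2.3.4:8080`). The following is a minimal sketch of a more tolerant parser; `ProxyParser` and `parseProxy` are hypothetical names, and the accepted input formats are assumptions rather than anything documented for the proxy service. It leans on `java.net.URI` to split host and port and returns `null` for anything it cannot interpret.

```java
import java.net.InetSocketAddress;
import java.net.URI;

public class ProxyParser {
    // Hypothetical helper: parses "1.2.3.4:8080" or "http://1.2.3.4:8080".
    // Returns null when the input is blank, malformed, or has no explicit port.
    static InetSocketAddress parseProxy(String raw) {
        if (raw == null || raw.isBlank()) {
            return null;
        }
        String s = raw.trim();
        // Prepend a scheme so java.net.URI can separate host and port
        if (!s.contains("://")) {
            s = "http://" + s;
        }
        try {
            URI uri = URI.create(s);
            if (uri.getHost() == null || uri.getPort() == -1) {
                return null;
            }
            return new InetSocketAddress(uri.getHost(), uri.getPort());
        } catch (IllegalArgumentException e) {
            // URI.create wraps URISyntaxException in IllegalArgumentException
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseProxy("1.2.3.4:8080"));
        System.out.println(parseProxy("http://5.6.7.8:3128"));
        System.out.println(parseProxy("not a proxy"));
    }
}
```

A non-null result can be handed to `ProxySelector.of(...)` when building the `HttpClient`; a `null` result means the task should fall back to a direct connection or fetch another proxy.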

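The code above looks fairly simple, but in a real project you will still need to adapt the details to your own situation, such as how responses are parsed and how failures are handled, before it works well. I hope this article helps you in learning Java.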
