I. Setting the User-Agent request header to mimic a browser
1. If you use the code from Part 1 to request Tuicool (推酷), the server returns the following:
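Page content: <!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>The system has detected that you are not behaving like a real person. Because of limited system resources, we can only reject your request. If you have questions, you can contact us through Weibo at http://weibo.com/tuicool2012/ .</p>
</body>
</html>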
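This happens because the site restricts crawlers. One way around it is to set the User-Agent request header so that the request looks like it comes from a browser. The code is as follows: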
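import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class UserAgentDemo { // class name chosen for illustration

    /**
     * Fetches a web page with a GET request.
     * @param args
     * @throws IOException
     * @throws ClientProtocolException
     */
    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Create the HttpGet instance
        HttpGet httpGet = new HttpGet("http://www.tuicool.com");
        // Send a browser-like User-Agent instead of the library default
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Read the page content
                String result = EntityUtils.toString(entity, "UTF-8");
                System.out.println("Page content: " + result);
            }
        }
    }
}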
Setting the header on the HttpGet object is enough to make the request look like it comes from a browser.
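If you want to confirm that the header really ends up on the request without touching the network, the sketch below may help. Note that it swaps in the JDK's built-in java.net.http API (Java 11 or later) rather than Apache HttpClient, purely so it runs with no extra dependencies. The class name UserAgentRequestSketch and the helper browserLikeGet are hypothetical names made up for this illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical class name; uses java.net.http (JDK 11+), not Apache HttpClient.
class UserAgentRequestSketch {
    // Same Firefox User-Agent string as in the example above.
    static final String FIREFOX_UA =
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0";

    // Builds a GET request that carries a browser-like User-Agent.
    // The request is only built here; nothing is sent.
    static HttpRequest browserLikeGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", FIREFOX_UA)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = browserLikeGet("http://www.tuicool.com");
        // Print the header value attached to the request, if any.
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));
    }
}
```

The idea is the same in either library: the server only sees the header value, so whatever string you set is what it will use to decide whether the client looks like a browser.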
II. Getting the response Content-Type
Use entity.getContentType().getValue() to read the Content-Type of the response. The code is as follows:
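import java.io.IOException;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ContentTypeDemo { // class name chosen for illustration

    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Create the HttpGet instance
        HttpGet httpGet = new HttpGet("http://www.tuicool.com");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            // The entity, or its Content-Type header, may be missing
            Header contentType = (entity != null) ? entity.getContentType() : null;
            if (contentType != null) {
                // Get the Content-Type
                System.out.println("Content-Type: " + contentType.getValue());
            }
        }
    }
}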
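The Content-Type value is useful for more than printing. A value such as text/html; charset=utf-8 also tells you which charset to decode the body with, instead of hard-coding "UTF-8". Below is a minimal sketch of pulling the charset parameter out of that string using only the JDK. The class name ContentTypeCharsetSketch and the method charsetOf are hypothetical, and the parsing is deliberately simple (it does not handle every edge case of the header grammar).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: extracts the charset parameter from a Content-Type value.
class ContentTypeCharsetSketch {

    // Returns the charset named in the header value, or the fallback if the
    // value is null, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // parts[0] is the media type; the rest are "name=value" parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                // Strip optional surrounding quotes from the value.
                String name = param.substring(eq + 1).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

In a crawler you could pass the string from getValue() into a helper like this and hand the resulting charset name to EntityUtils.toString, falling back to a default when the server does not declare one.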
III. Getting the response status
200 -- OK
403 -- Forbidden (request refused)
500 -- Internal server error
404 -- Page not found
Use response.getStatusLine().getStatusCode() to get the response status code. The code is as follows:
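import java.io.IOException;

import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class StatusCodeDemo { // class name chosen for illustration

    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Create the HttpGet instance
        HttpGet httpGet = new HttpGet("http://www.tuicool.com");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Read the numeric status code from the status line
            int state = response.getStatusLine().getStatusCode();
            System.out.println("Response status: " + state);
        }
    }
}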
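Once you have the status code as an int, a crawler usually wants to branch on it rather than just print it. Here is a rough sketch of one way to do that with plain Java. The class name StatusCodeSketch and the methods describe and isSuccess are hypothetical names for this illustration, and the descriptions only cover the codes listed in this section plus the general 2xx/4xx/5xx ranges.

```java
// Hypothetical helper for interpreting an HTTP status code.
class StatusCodeSketch {

    // True for any 2xx code, i.e. the request was handled successfully.
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    // Short human-readable description of a status code.
    static String describe(int statusCode) {
        switch (statusCode) {
            case 200: return "OK";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            case 500: return "Internal Server Error";
            default:
                if (isSuccess(statusCode)) return "Success";
                if (statusCode >= 400 && statusCode < 500) return "Client error";
                if (statusCode >= 500 && statusCode < 600) return "Server error";
                return "Other";
        }
    }

    public static void main(String[] args) {
        int[] samples = {200, 403, 404, 500};
        for (int code : samples) {
            System.out.println(code + " -- " + describe(code)
                    + (isSuccess(code) ? " (parse the body)" : " (skip or retry)"));
        }
    }
}
```

With something like this in place, the value returned by getStatusCode() can decide whether to read the entity at all, which avoids trying to parse an error page as if it were real content.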
IV. HttpClient learning resources