How to Scrape a Web Page's Source Code


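// Imports needed by getHtml; the method itself belongs inside a class.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

private static String getHtml(String urlInfo) throws Exception {
    // Open the target URL and fetch the page source
    URL url = new URL(urlInfo);
    HttpURLConnection httpUrl = (HttpURLConnection) url.openConnection();
    httpUrl.setConnectTimeout(30000); // timeout for connecting to the host (milliseconds)
    httpUrl.setReadTimeout(30000);    // timeout for reading data from the host (milliseconds)
    try {
        // Debug output: the Content-Encoding header, or null if the server sent none
        System.out.println(httpUrl.getContentEncoding());
        InputStream is = httpUrl.getInputStream();
        if ("gzip".equals(httpUrl.getContentEncoding())) {
            // The body is gzip-compressed, so decompress it while reading
            is = new GZIPInputStream(is);
        }
        StringBuilder sb = new StringBuilder();
        // The charset is hard-coded to gb2312; pages in another encoding need a different value
        try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "gb2312"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line terminator, so add it back
                sb.append(line).append('\n');
            }
        } // closing the reader also closes the underlying stream
        return sb.toString().trim();
    } finally {
        httpUrl.disconnect();
    }
}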
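
getHtml decodes every page as gb2312, which garbles pages served in another encoding such as UTF-8. One possible refinement, shown here only as a sketch, is to read the charset from the response's Content-Type header and fall back to a default when none is declared. The class name CharsetSniffer and the method charsetFrom are hypothetical names introduced for illustration and are not part of the original code.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper (not part of the original post): picks a charset
// from a Content-Type header value such as "text/html; charset=UTF-8".
final class CharsetSniffer {

    /**
     * Returns the charset named in the Content-Type value, or the given
     * fallback when the header is missing, has no charset parameter,
     * or names a charset this JVM does not know.
     */
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type and are separated by ';'
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop the "charset=" prefix and any surrounding quotes
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal and unsupported charset names both end up here
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

Inside getHtml, the reader could then be built with new InputStreamReader(is, CharsetSniffer.charsetFrom(httpUrl.getContentType(), Charset.forName("gb2312"))). This only covers the HTTP header; a page that declares its encoding solely in an HTML meta tag would still be decoded with the fallback.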

