A Simple Java Web Crawler Example

Reposted article, dated 2017-06-02 18:31. Practice demo.

At its core, a crawler opens a web page's source, searches it for text that matches a pattern, and collects the matches.
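The search-and-collect step can be tried without any network access. The following is a minimal sketch, assuming a hypothetical class name `MatchDemo`, a hypothetical helper `findAll`, and a made-up HTML string; the regular expression is the one used in this article's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchDemo {
    // Collect every substring of source that matches regex, in order of appearance.
    static List<String> findAll(String regex, String source) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical page source; a real crawler would download this text.
        String html = "<p>Contact alice@example.com or bob@mail.example.org</p>";
        // Prints [alice@example.com, bob@mail.example.org]
        System.out.println(findAll("\\w+@\\w+(\\.\\w+)+", html));
    }
}
```

Keeping the matching logic separate from the download makes it easy to check the pattern against small hand-written inputs before pointing it at a live page.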
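/*
* Retrieving matches with a regular expression:
* Wrap the regex rule in a Pattern object.
* Pattern p = Pattern.compile("a*b");
* // Associate the pattern with a string through matcher() to obtain a Matcher for that string.
* Matcher m = p.matcher("aaaaab");
* // Operate on the string through the Matcher's methods.
* boolean b = m.matches();
*/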
package com.js.ai.modules.pointwall.testxfz;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Spider {
	// Matches simple addresses such as name@host.tld. \w excludes '.' and '-',
	// so this is a rough filter, not a full email validator.
	private static final Pattern MAIL_PATTERN = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

	// Fetch a web page and return every email-like string found in its source.
	public static List<String> getMailsByWeb() throws IOException {
		// 1. Read the source.
		URL url = new URL("http://www.cnblogs.com/Renyi-Fan/p/6896901.html");
		// try-with-resources closes the stream even if reading fails.
		// Assumption: the page is UTF-8 encoded.
		try (BufferedReader bufr = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
			return extractMails(bufr);
		}
	}

	// Same extraction, but reading from a local file instead of a URL.
	public static List<String> getMails() throws IOException {
		// 1. Read the source file.
		try (BufferedReader bufr = new BufferedReader(new FileReader("c:\\mail.html"))) {
			return extractMails(bufr);
		}
	}

	private static List<String> extractMails(BufferedReader bufr) throws IOException {
		// 2. Match each line against the rule and pick out the data that fits.
		List<String> list = new ArrayList<String>();
		String line;
		while ((line = bufr.readLine()) != null) {
			Matcher m = MAIL_PATTERN.matcher(line);
			while (m.find()) {
				// 3. Store each match in the collection.
				list.add(m.group());
			}
		}
		return list;
	}

	public static void main(String[] args) throws IOException {
		// To read from the local file instead, use:
		// List<String> list = getMails();
		List<String> list = getMailsByWeb();
		for (String mail : list) {
			System.out.println(mail);
		}
	}
}
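The header comment of the listing calls matches(), while the Spider methods call find(). The two behave differently: matches() succeeds only when the entire input fits the pattern, whereas find() looks for the next matching substring anywhere in the input. A small sketch of the difference, using a hypothetical class `RegexBasics` with hypothetical helpers `wholeMatch` and `firstMatch`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBasics {
    // matches() is true only when the whole input fits the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() scans forward for the next substring that fits the pattern.
    // Returns the first such substring, or null when there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("a*b", "aaaaab"));     // true
        System.out.println(wholeMatch("a*b", "xxaaaaabxx")); // false
        System.out.println(firstMatch("a*b", "xxaaaaabxx")); // aaaaab
    }
}
```

This is why a crawler loops on find(): a line of HTML almost never consists of a single address, so matches() would reject nearly every line.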
