Java Team Course Project: A Search Engine for the College Website


Team name, member introductions, task assignments, and links to members' course-project blogs

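Name | Introduction | Tasks | Course-project blog
謝曉淞 (Xie Xiaosong, team leader) | The team's main workhorse | Crawler; web front-end design and its connection to the back end | Crawler: https://www.cnblogs.com/Rasang/p/12169420.html ; Front-end design: https://www.cnblogs.com/Rasang/p/12169449.html
康友煌 (Kang Youhuang) | The team's brains | Elasticsearch back-end functionality | https://www.cnblogs.com/xycm/p/12168554.html
閆栩寧 (Yan Xuning) | The team's good looks | GUI version of the search engine | https://www.cnblogs.com/20000519yxn/p/12169954.html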

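Project overview and technologies used

A search engine for the college website. It supports full-text search over the site's articles, search restricted to a time range, and keyword suggestions while the user types.
Predecessor project: https://www.cnblogs.com/dycsy/p/8351584.html (we rebuilt the course project of the seniors from the 2016 cohort, using it as a reference)

Technologies used: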

  • Jsoup
  • HttpClient
  • HTML+CSS
  • Javascript/Jquery
  • Elasticsearch
  • IK analyzer
  • Java Swing
  • JSP
  • Multithreading
  • Web
  • Linux

Project Git repository

https://github.com/rasang/searchEngine

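Screenshot of the project's Git commit history

Preliminary survey

Search home page

Search results page

Project functional architecture diagram and main-feature flowchart

Object-oriented design class diagrams

Crawler class diagram

Elasticsearch class diagram

GUI class diagram

Project screenshots or screen recording

Key project code: module name, description, key code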

Crawler module

SingleCrawler.java: crawls a single page and saves the extracted links through linksWriter

// Links starting with "http" are absolute and are skipped; only site-internal links are followed
Pattern absolutePattern = Pattern.compile("^http");
for (Element link : links) {
    String newHref = link.attr("href");
    if (newHref.isEmpty() || absolutePattern.matcher(newHref).find()) {
        continue;
    }
    String newUrl;
    /*
     * Decide whether href is a query string ("?..."), a root-relative path ("/...")
     * or a relative path, and build the absolute URL accordingly.
     */
    if (newHref.charAt(0) == '?') {
        // Replace the current page's query string (if it has one) with the new one;
        // indexOf returns -1 when there is no '?', so guard before calling substring
        int q = this.url.indexOf('?');
        String base = q >= 0 ? this.url.substring(0, q) : this.url;
        newUrl = base + newHref;
    } else {
        // httpRegexPattern (a field not shown here) extracts the site root from the current URL
        Matcher matcher = httpRegexPattern.matcher(this.url);
        if (!matcher.find()) {
            continue;
        }
        String rootUrl = matcher.group(0);
        // rootUrl presumably ends with '/', so a root-relative href drops its leading '/'
        newUrl = newHref.charAt(0) == '/' ? rootUrl + newHref.substring(1) : rootUrl + newHref;
    }
    this.linksWriter.write(newUrl, doc);
}
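The hand-written branches above can also be expressed with java.net.URI, which implements standard relative-reference resolution. The following is a minimal sketch. It assumes the crawler should keep skipping absolute links as the original does. LinkResolver and resolveLink are hypothetical names. There is one behavioural difference: URI.resolve joins a plain relative href to the current page's directory, while the code above always joins it to the site root. The two agree for pages that sit in the root directory, such as the list pages. Query-only hrefs are still handled by hand as in the original, because java.net.URI follows RFC 2396, which resolves a query-only reference against the directory instead of the current page.

```java
import java.net.URI;

public class LinkResolver {

    /**
     * Resolves an href found on pageUrl into an absolute URL (hypothetical helper).
     * Returns null for links the crawler should skip: empty hrefs, fragments,
     * absolute links (the same rule as the "^http" check) and malformed hrefs.
     */
    public static String resolveLink(String pageUrl, String href) {
        if (href == null || href.isEmpty() || href.startsWith("#")) {
            return null;
        }
        try {
            URI link = URI.create(href);
            if (link.isAbsolute()) {
                // "http://...", "javascript:..." and other links with a scheme are skipped
                return null;
            }
            if (href.startsWith("?")) {
                // Query-only href: keep the current page path and swap the query string
                int q = pageUrl.indexOf('?');
                return (q >= 0 ? pageUrl.substring(0, q) : pageUrl) + href;
            }
            // Root-relative ("/x") and relative ("x") paths are resolved against the page URL
            return URI.create(pageUrl).resolve(link).toString();
        } catch (IllegalArgumentException e) {
            // URI.create rejects hrefs with spaces or other illegal characters
            return null;
        }
    }

    public static void main(String[] args) {
        String page = "http://cec.jmu.edu.cn/list.jsp?urltype=tree.TreeTempUrl&wbtreeid=1001";
        System.out.println(resolveLink(page, "info/1034/5678.htm"));
        System.out.println(resolveLink(page, "/index.jsp"));
        System.out.println(resolveLink(page, "?a3c=1000000"));
        System.out.println(resolveLink(page, "http://www.jmu.edu.cn/"));
    }
}
```

The URL and ids in main are illustrative only.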

UrlCollector.java :

Crawling the menu URLs

String url = "http://cec.jmu.edu.cn/";
String cssSelector = "a[href~=\\.jsp\\?urltype=tree\\.TreeTempUrl&wbtreeid=[0-9]+]";
List<String> menu = null;
List<String> list = null;
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
LinksListWriter tempListWriter = new LinksListWriter();
/**
 * 1. The first level is the menu: crawl the menu hrefs first
 */
SingleCrawler menuCrawler = new SingleCrawler(url, cssSelector, httpClient, tempListWriter);
menuCrawler.start();
try {
    menuCrawler.join();
} catch (InterruptedException e) {
    e.printStackTrace();
}
menu = new ArrayList<String>(tempListWriter.getLinks());

Crawling the article URLs

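SingleCrawler[] listCrawler = new SingleCrawler[menu.size()];
for (int i = 0; i < listCrawler.length; i++) {
    String menuUrl = menu.get(i);
    if (menuUrl.contains("?")) {
        // The extra a3c/a2c parameters presumably enlarge the list page so that all article links appear on one page
        listCrawler[i] = new SingleCrawler(menuUrl + "&a3c=1000000&a2c=10000000",
                "a[href~=^info/[0-9]+/[0-9]+\\.htm]", httpClient, tempListWriter);
        listCrawler[i].start();
    }
    else {
        listCrawler[i] = null;
    }
}
// Wait for every list crawler to finish before the collected article URLs are read
for (SingleCrawler crawler : listCrawler) {
    if (crawler != null) {
        try {
            crawler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}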

Crawling the article content

SingleCrawler[] documentCrawler = new SingleCrawler[list.size()];
for (int i = 0; i < documentCrawler.length; i++) {
    documentCrawler[i] = new SingleCrawler(list.get(i), "", httpClient, resultWriter);
    documentCrawler[i].start();
}
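Each step above starts one thread per URL, and the last step creates one SingleCrawler per article. That can mean hundreds or thousands of threads at once. A common alternative is a fixed-size pool from java.util.concurrent. The following is a generic sketch. crawlAll and its fetch parameter are hypothetical names, and fetch stands in for whatever SingleCrawler does with one URL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PooledCrawl {

    /**
     * Runs fetch on every URL with at most `threads` concurrent workers
     * and returns the results in the same order as the input list.
     */
    public static <R> List<R> crawlAll(List<String> urls, int threads, Function<String, R> fetch)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetch.apply(url));
            }
            // invokeAll blocks until every task has finished, much like join() on each thread
            List<Future<R>> futures = pool.invokeAll(tasks);
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // rethrows a task's exception wrapped in ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = Arrays.asList("http://cec.jmu.edu.cn/info/1/1.htm", "http://cec.jmu.edu.cn/info/1/2.htm");
        // Stand-in fetch function; a real one would download and parse the page
        System.out.println(crawlAll(urls, 4, String::length));
    }
}
```

The pool size bounds both the number of threads and the number of simultaneous connections to the college server. With the shared PoolingHttpClientConnectionManager, it would make sense to keep the two limits in line.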

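Web front end

Autocomplete JS: listens for input in the .search-input box. As the user types, it sends an asynchronous request to suggest.jsp and shows the returned suggestions with the jQuery UI autocomplete widget.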

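$(function() {
    $(".search-input").keyup(function(event) {
        $.ajax({
            type : "get",
            // Encode the term so that Chinese characters and symbols survive the query string
            url : "suggest.jsp?term=" + encodeURIComponent(document.getElementById("input").value),
            dataType : "json", // jQuery parses the JSON response itself (the option name is case-sensitive)
            async : true,
            error : function() {
                console.error("Failed to load suggestion data!");
            },
            success : function(data) {
                // Hand the suggestion array to the jQuery UI autocomplete widget
                $(".search-input").autocomplete({
                    source : data
                });
            }
        });
    });
});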
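The script passes the parsed response straight to autocomplete's source option, which accepts an array of strings. suggest.jsp is not shown here, so this assumes it writes a JSON array of strings such as ["集美大學","計算機"]. The following is a minimal sketch of producing that output safely from Java without a JSON library. SuggestJson and toJsonArray are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class SuggestJson {

    /** Serializes suggestion strings as a JSON array of strings. */
    public static String toJsonArray(List<String> suggestions) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < suggestions.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"');
            for (char c : suggestions.get(i).toCharArray()) {
                switch (c) {
                    case '"':  sb.append("\\\""); break;
                    case '\\': sb.append("\\\\"); break;
                    case '\n': sb.append("\\n"); break;
                    case '\r': sb.append("\\r"); break;
                    case '\t': sb.append("\\t"); break;
                    default:
                        if (c < 0x20) {
                            // Remaining control characters are written as four-digit unicode escapes
                            sb.append(String.format("\\u%04x", (int) c));
                        } else {
                            sb.append(c); // Chinese characters can be written as-is in UTF-8 output
                        }
                }
            }
            sb.append('"');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray(Arrays.asList("集美大學", "say \"hi\"")));
    }
}
```

In the JSP, the result would be written with out.print after setting the response content type to application/json with a UTF-8 charset.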

Pagination in JS: take the clicked element and decide the target page from its innerHTML:

function turnPage(e) {
  var page = e.innerHTML;
  // Current page number from the URL query string, or null when there is no page parameter
  var currentPage = GetQueryString("page");
  var target;
  if (page == "«") {
    // Previous page, never going below page 1
    target = currentPage == null ? 1 : Math.max(parseInt(currentPage) - 1, 1);
  } else if (page == "»") {
    // Next page; without a page parameter we are on page 1, so go to page 2
    target = currentPage == null ? 2 : parseInt(currentPage) + 1;
  } else {
    // A numbered page link
    target = parseInt(page);
  }
  if (currentPage == null) {
    AddParamVal("page", target);     // no page parameter yet: append one
  } else {
    replaceParamVal("page", target); // page parameter exists: replace its value
  }
}
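The script keeps "«" at page 1 or above, but "»" has no upper bound, so clicking past the last page leads to an empty result page. One option is a server-side clamp based on the hit total (resultNum, taken from getTotal() in the Elasticsearch module). The following is a minimal sketch. PageClamp, totalPages and clampPage are hypothetical names, and the page size is passed in rather than hard-coded.

```java
public class PageClamp {

    /** Number of pages needed for totalHits results, at least 1. */
    public static int totalPages(long totalHits, int pageSize) {
        long pages = (totalHits + pageSize - 1) / pageSize; // ceiling division
        return (int) Math.max(1, pages);
    }

    /** Parses the "page" query parameter and clamps it into [1, totalPages]. */
    public static int clampPage(String pageParam, long totalHits, int pageSize) {
        int page;
        try {
            page = pageParam == null ? 1 : Integer.parseInt(pageParam.trim());
        } catch (NumberFormatException e) {
            page = 1; // non-numeric values fall back to the first page
        }
        return Math.min(Math.max(page, 1), totalPages(totalHits, pageSize));
    }

    public static void main(String[] args) {
        System.out.println(clampPage("7", 45, 10));
        System.out.println(clampPage(null, 45, 10));
        System.out.println(clampPage("abc", 45, 10));
    }
}
```

The same totalPages value could also be printed into the page so the script can hide "»" on the last page.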

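Date-range picker: if the URL already carries a timeLimit parameter, show it as the placeholder of the date input (id test6)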

$(function(){
    if(GetQueryString("timeLimit")!=null){ 
        var option=GetQueryString("timeLimit");
        document.getElementById("test6").setAttribute("placeholder",option);
    }
})

Reading the keyword and running the search under the given conditions (JSP)

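String keyword = request.getParameter("keyword");
if(keyword!=null){
    // Default to page 1 when no page parameter is given
    String pageParam = request.getParameter("page");
    int pageCount = pageParam==null?1:Integer.parseInt(pageParam);
    search = new EsSearch();
    // inseartSearch (not shown) presumably stores the keyword for the autocomplete suggestions
    search.inseartSearch(keyword);
    String timeLimit = request.getParameter("timeLimit");
    // timeLimit has the form "start - end"; a missing or empty value means no time range
    String[] time = timeLimit==null ? new String[0] : timeLimit.split(" - ");
    if(time.length==2){
        result = search.rangeSerch(keyword, pageCount, time[0], time[1]);
    }
    else{
        result = search.fullTextSerch(keyword,pageCount);
    }
}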
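The dates in timeLimit go straight into the Elasticsearch range query. A malformed value would then surface as a search error. Validating the two dates first is cheap. The following is a minimal sketch. It assumes the picker produces ISO dates such as "2019-09-01 - 2019-12-31". If it also emits times, this sketch rejects the value. TimeLimitParser and parseRange are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class TimeLimitParser {

    /**
     * Splits a "start - end" range and validates both dates (assumed format yyyy-MM-dd).
     * Returns null when the value is missing or malformed, so the caller can fall back
     * to a plain full-text search.
     */
    public static String[] parseRange(String timeLimit) {
        if (timeLimit == null) {
            return null;
        }
        String[] parts = timeLimit.split(" - ");
        if (parts.length != 2) {
            return null;
        }
        try {
            LocalDate start = LocalDate.parse(parts[0].trim()); // ISO_LOCAL_DATE, e.g. 2019-09-01
            LocalDate end = LocalDate.parse(parts[1].trim());
            if (end.isBefore(start)) {
                // Swap a reversed range instead of sending a query that can never match
                LocalDate tmp = start;
                start = end;
                end = tmp;
            }
            return new String[] {start.toString(), end.toString()};
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] range = parseRange("2019-12-31 - 2019-09-01");
        System.out.println(range[0] + " to " + range[1]);
    }
}
```

With this helper, the JSP would call rangeSerch only when parseRange returns a non-null array.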

Printing the search results

if(result!=null){
    for(SearchResultEntry entry : result){
        out.println("<div class=\"result-container\">");
        out.println("<a href=\""+entry.getUrl()+"\" target=\"_blank\" class=\"title\">"+entry.getTitle()+"</a>");
        // getText() already contains the highlighter's <span> tags, so it is written as HTML
        out.println("<div class=\"text\">"+entry.getText()+"</div>");
        out.println("<div style=\"float: left;\" class=\"url\">"+entry.getUrl()+"</div>");
        out.println("<div style=\"float: left;color: grey;margin-left: 30px;margin-top: 4px;\">"+entry.getTime()+"</div>");
        out.println("</div>");
        out.println("<div class=\"clear\"></div>");
    }
}
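The fields are concatenated into the HTML unescaped. A crawled title or URL containing characters such as & or " can therefore break the markup. The URL and time fields are plain data and can be escaped before printing. The text field cannot be escaped after highlighting, because that would show the highlight spans literally. The following is a minimal sketch of such an escape helper. HtmlEscape and escapeHtml are hypothetical names. A library such as Apache Commons Text offers equivalents.

```java
public class HtmlEscape {

    /** Escapes the five characters that are significant in HTML text and attribute values. */
    public static String escapeHtml(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;"); break;
                case '<':  sb.append("&lt;"); break;
                case '>':  sb.append("&gt;"); break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://cec.jmu.edu.cn/list.jsp?a=1&b=2";
        System.out.println("<a href=\"" + escapeHtml(url) + "\">");
    }
}
```

In the loop above, this would wrap entry.getUrl() and entry.getTime(), for example escapeHtml(entry.getUrl()).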

Elasticsearch module

Connecting to Elasticsearch with the Jest client

// Lazily creates one shared JestClient; synchronized so that concurrent requests cannot create two clients
public static synchronized JestClient getJestClient() {
	if(jestClient==null) {
		JestClientFactory factory = new JestClientFactory();
		factory.setHttpClientConfig(new HttpClientConfig
			.Builder("http://127.0.0.1:9200")
			//.gson(new GsonBuilder().setDateFormat("yyyy-MM-dd HH:mm").create())
			.multiThreaded(true)
			.readTimeout(10000) // milliseconds
			.build());
		jestClient=factory.getObject();
	}
	return jestClient;
}

Creating the indexes and writing their mappings

public EsCreatIndex() {
	jestClient =EsClient.getJestClient();
	try {
		jestClient.execute(new CreateIndex.Builder(EsClient.indexName).build());
		jestClient.execute(new CreateIndex.Builder(EsClient.suggestName).build());
	} catch (IOException e) {
		e.printStackTrace();
	}
	this.createIndexMapping();
}


private void createIndexMapping() {
	String sourceIndex="{\"" + EsClient.typeName + "\":{\"properties\":{"
		+"\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\""
		+ ",\"fields\":{\"suggest\":{\"type\":\"completion\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}}}"
		+",\"text\":{\"type\":\"text\",\"index\":\"true\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}"
		+",\"url\":{\"type\":\"keyword\"}"
		+",\"time\":{\"type\":\"date\"}"
		+ "}}}";
	PutMapping putMappingIndex=new PutMapping.Builder(EsClient.indexName, EsClient.typeName, sourceIndex).build();
		
	String sourceSuggest="{\"" + EsClient.typeName + "\":{\"properties\":{"
		+"\"text\":{\"type\":\"completion\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}"
		+ "}}}";
	PutMapping putMappingSuggest=new PutMapping.Builder(EsClient.suggestName, EsClient.typeName, sourceSuggest).build();
		
	try {
		jestClient.execute(putMappingIndex);
		jestClient.execute(putMappingSuggest);
	} catch (IOException e) {
		e.printStackTrace();
	}
}

Inserting data

public void bulkIndex(List<SearchResultEntry> list) throws IOException {
	Bulk.Builder bulk=new Bulk.Builder();
	for(SearchResultEntry e:list) {
		Index index=new Index.Builder(e).index(EsClient.indexName).type(EsClient.typeName).build();
		bulk.addAction(index);
	}
	jestClient.execute(bulk.build());
}

public void insertIndex(SearchResultEntry webpage) throws IOException {
	Index index=new Index.Builder(webpage).index(EsClient.indexName).type(EsClient.typeName).build();
	jestClient.execute(index);
}
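bulkIndex sends the whole list in a single bulk request. With thousands of crawled articles, that request can become large. Elasticsearch caps the HTTP request body through http.max_content_length, which is 100 MB by default. For this reason, callers often split the list first. The following is a generic sketch. Batches and partition are hypothetical names, and the batch size is left to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Batches {

    /** Splits list into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> list, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < list.size(); from += batchSize) {
            int to = Math.min(from + batchSize, list.size());
            // Copy the subList view so later changes to list do not affect the batch
            batches.add(new ArrayList<>(list.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

For example, `for (List<SearchResultEntry> batch : Batches.partition(entries, 500)) { bulkIndex(batch); }`, where 500 is an arbitrary choice.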

Deleting data (drops the whole article index)

public void deleteIndex() throws IOException {
	jestClient.execute(new DeleteIndex.Builder(EsClient.indexName).build());
}

Full-text search with pagination

public List<SearchResultEntry> fullTextSerch(String queryString,int page) {
	// Build the search request body
	SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();

	BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
	boolQueryBuilder.must(QueryBuilders.queryStringQuery(queryString));

	if(this.isDateRangeQuery) {
		// Range query on "time": startDate <= time <= closingDate (gte/lte are inclusive)
		QueryBuilder queryBuilder = QueryBuilders
			.rangeQuery("time")
			.gte(this.startDate)
			.lte(this.closingDate);
		// As a filter clause it restricts the hits without affecting relevance scores
		boolQueryBuilder=boolQueryBuilder.filter(queryBuilder);
	}

	searchSourceBuilder.query(boolQueryBuilder);

	// Highlight matches in title and text with a red <span>
	HighlightBuilder highlightBuilder = new HighlightBuilder();
	highlightBuilder.field("title");
	highlightBuilder.field("text");
	highlightBuilder.preTags("<span style=\"color:red;\">").postTags("</span>");
	highlightBuilder.fragmentSize(200);
	searchSourceBuilder.highlighter(highlightBuilder);

	// Pagination: 10 hits per page, page is 1-based
	searchSourceBuilder.from((page-1)*10);
	searchSourceBuilder.size(10);

	// Build the Jest Search action
	Search search=new Search.Builder(searchSourceBuilder.toString())
		.addIndex(EsClient.indexName)
		.addType(EsClient.typeName)
		.build();
	SearchResult searchResult;
	try {
		searchResult = jestClient.execute(search);
	} catch (IOException e) {
		e.printStackTrace();
		// Returning here avoids a NullPointerException on searchResult below
		return new ArrayList<>();
	}
	if (!searchResult.isSucceeded()) {
		// e.g. a query_string syntax error caused by special characters in the keyword
		return new ArrayList<>();
	}
	resultNum=searchResult.getTotal();
	return this.storageList(searchResult);
}
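Deep pages have a limit worth knowing about. Elasticsearch rejects a from/size search whose from + size exceeds the index setting index.max_result_window, which is 10,000 by default. The following is a minimal sketch of computing the offset for a 1-based page and failing early instead of sending a request that will be rejected. PageWindow and fromOffset are hypothetical names, and the constant assumes the default setting has not been changed.

```java
public class PageWindow {

    /** Default value of the index.max_result_window index setting (assumed unchanged). */
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Returns the "from" offset for a 1-based page, or throws if the window would be exceeded. */
    public static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            // Elasticsearch would reject this request, so fail before sending it
            throw new IllegalArgumentException("page " + page + " is beyond the result window");
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(3, 10));
    }
}
```

With 10 hits per page, as in fullTextSerch, this allows pages 1 to 1000.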

Search within a date range

public List<SearchResultEntry> rangeSerch(String queryString,int page,String startDate,String closingDate){
	this.isDateRangeQuery=true;
	this.startDate=startDate;
	this.closingDate=closingDate;
	try {
		// fullTextSerch adds the range filter on "time" while isDateRangeQuery is set
		return this.fullTextSerch(queryString,page);
	} finally {
		// Reset the flag even if the search throws, so later searches are not range-limited
		this.isDateRangeQuery=false;
	}
}

GUI module

Refreshing the result panel for the current page

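// Shows the results of currentPage; each page holds two result panels
private void displayResult(){
    resultJpanel.removeAll();
    resultJpanel.setLayout(new GridLayout(2, 1));
    // First result on this page: index currentPage*2-2
    resultJpanel.add(resultList.get(currentPage*2-2));
    // Second result only if it exists, i.e. index currentPage*2-1 < resultNum
    if(currentPage*2 <= resultNum){
        resultJpanel.add(resultList.get(currentPage*2-1));
    }
    resultJpanel.revalidate();
    resultJpanel.repaint();
    page.setText(currentPage+"/"+pageNum);
}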
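displayResult hard-codes two results per page through the indexes currentPage*2-2 and currentPage*2-1. The following generic sketch derives the items for any page size from List.subList instead. PageSlice, itemsOnPage and pageCount are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PageSlice {

    /** Returns the items on 1-based page `page` when `perPage` items fit on one page. */
    public static <T> List<T> itemsOnPage(List<T> all, int page, int perPage) {
        int from = (page - 1) * perPage;
        if (page < 1 || from >= all.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(from + perPage, all.size());
        return all.subList(from, to); // a view backed by the original list
    }

    /** Number of pages for `total` items, using ceiling division. */
    public static int pageCount(int total, int perPage) {
        return (total + perPage - 1) / perPage;
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("r1", "r2", "r3");
        System.out.println(itemsOnPage(results, 2, 2) + " of " + pageCount(results.size(), 2) + " pages");
    }
}
```

In displayResult, this would become a loop such as `for (JPanel p : PageSlice.itemsOnPage(resultList, currentPage, 2)) resultJpanel.add(p);`, with pageNum taken from pageCount.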

Once the results are shown, clicking the jump button opens the original web page in the default browser

public void actionPerformed(ActionEvent e) {
    if(Desktop.isDesktopSupported()){
        try {
            URI uri=URI.create(url);
            Desktop dp=Desktop.getDesktop();
            if(dp.isSupported(Desktop.Action.BROWSE)){
                dp.browse(uri);
            }
        } catch (Exception o) {
            o.printStackTrace();
        }
    }
}

Building one result panel (SearchLook) per search result and storing them in a List

private List<JPanel> getJpanelList(List<SearchResultEntry> list) {
    List<JPanel> resultList = new ArrayList<>();
    for(SearchResultEntry e:list){
        JPanel jPanel=new SearchLook(e);
        resultList.add(jPanel);
    }
    return resultList;
}

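Code scan results and fixes

Scan results

1. Add complete braces to if-else statements. Before:

After:

2. Missing author information. Before:

After:

3. Overriding methods lacked the @Override annotation. Before:

After: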

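Project summary

Another year, another course project. We learned a great deal from it: Elasticsearch, JavaScript and GUI programming all strengthened our programming skills. Unfortunately, the fuzzy keyword suggestion feature is still not very useful, and we did not manage to improve it. The web pages are not adapted to mobile phones, and the GUI also has some shortcomings. We hope that any juniors who continue this project in the future can address these gaps.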

