Team name, member introductions, task assignments, and links to members' course-design blogs
Name | Introduction | Tasks | Course-design blog |
---|---|---|---|
謝曉淞 (team leader) | The team's main contributor | Crawler; Web front end and its connection to the back end | Crawler: https://www.cnblogs.com/Rasang/p/12169420.html; Front-end design: https://www.cnblogs.com/Rasang/p/12169449.html |
康友煌 | The team's brains | Elasticsearch back-end functionality | https://www.cnblogs.com/xycm/p/12168554.html |
閆栩寧 | The team's good looks | GUI version of the search engine | https://www.cnblogs.com/20000519yxn/p/12169954.html |
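Project overview and technologies used
A search engine for the college website: it supports full-text search and time-range search over the site's articles, plus keyword suggestions (autocompletion).
Predecessor project: https://www.cnblogs.com/dycsy/p/8351584.html (we rebuilt it, drawing on the course project of the seniors from the class of 2016)
Technologies used: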
- Jsoup
- HttpClient
- HTML+CSS
- JavaScript/jQuery
- Elasticsearch
- IK Analyzer
- Java Swing
- JSP
- Multithreading
- Web
- Linux
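Project Git repository
https://github.com/rasang/searchEngine
Screenshot of the project's Git commit history
Preliminary research
Search home page
Search results page
Project functional architecture diagram and main function flowchart
Object-oriented design class diagrams
Crawler class diagram
Elasticsearch class diagram
GUI class diagram
Project screenshots or screen recording
Key project code: module name, description, key code
Crawler module
SingleCrawler.java: crawls a single page and saves the data with linksWriter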
for (Element link : links) {
    String newHref = link.attr("href");
    // Skip empty hrefs and hrefs that already start with "http" (absolute, usually external links)
    if (newHref.isEmpty() || newHref.startsWith("http")) {
        continue;
    }
    String newUrl;
    /**
     * Decide whether the href is a query string, an absolute path or a relative path
     */
    if (newHref.charAt(0) == '?') {
        // Query string only: keep the current path and replace the query.
        // indexOf returns -1 when this.url has no '?', so guard against substring(0, -1)
        int q = this.url.indexOf('?');
        newUrl = (q >= 0 ? this.url.substring(0, q) : this.url) + newHref;
    }
    else {
        // httpRegexPattern is defined outside this excerpt; judging from the code below,
        // it matches the site root including its trailing '/'
        Matcher matcher = httpRegexPattern.matcher(this.url);
        if (!matcher.find()) {
            continue;
        }
        String rootUrl = matcher.group(0);
        // Absolute path: drop the leading '/' that rootUrl already ends with;
        // relative path: appended to the site root as well
        newUrl = newHref.charAt(0) == '/' ? rootUrl + newHref.substring(1) : rootUrl + newHref;
    }
    this.linksWriter.write(newUrl, doc);
}
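For comparison, the standard library can do most of this href resolution. The sketch below is not project code: `HrefResolver` is a hypothetical helper built on `java.net.URI.resolve`. Unlike the loop above, it keeps same-site absolute links and drops other hosts and non-http(s) schemes such as `javascript:`, and it resolves a relative path like `2.htm` against the current directory instead of the site root. Query-only hrefs are still handled by hand, because `URI.resolve` follows the older RFC 2396 rules, which drop the file name for a reference like `?page=2`.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: turns an href found on pageUrl into an absolute same-site URL. */
final class HrefResolver {
    private HrefResolver() {}

    /** Returns the absolute URL, or null for empty, unparsable, non-http(s) or off-site links. */
    static String resolve(String pageUrl, String href) {
        if (href == null || href.isEmpty()) {
            return null;
        }
        try {
            URI base = new URI(pageUrl);
            URI target;
            if (href.startsWith("?")) {
                // Query-only href: keep the page's path and swap the query by hand,
                // because URI.resolve (RFC 2396 rules) would drop the file name here.
                int q = pageUrl.indexOf('?');
                target = new URI((q >= 0 ? pageUrl.substring(0, q) : pageUrl) + href);
            } else {
                // Handles "/abs/path", "rel/path", "../up" and absolute URLs alike.
                target = base.resolve(href);
            }
            String scheme = target.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            boolean sameHost = base.getHost() != null && base.getHost().equalsIgnoreCase(target.getHost());
            return isHttp && sameHost ? target.toString() : null;
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Malformed href (e.g. containing spaces): skip it, as the crawler loop does.
            return null;
        }
    }
}
```

Since the project already uses Jsoup, `link.absUrl("href")` is another option, provided the document was loaded with a base URI (as `Jsoup.connect(url).get()` does).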
UrlCollector.java:
Crawling the menu URLs
String url = "http://cec.jmu.edu.cn/";
String cssSelector = "a[href~=\\.jsp\\?urltype=tree\\.TreeTempUrl&wbtreeid=[0-9]+]";
List<String> menu = null;
List<String> list = null;
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
LinksListWriter tempListWriter = new LinksListWriter();
/**
 * 1. The first level is the menu: crawl the menu hrefs first
 */
SingleCrawler menuCrawler = new SingleCrawler(url, cssSelector, httpClient, tempListWriter);
menuCrawler.start();
try {
menuCrawler.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
menu = new ArrayList<String>(tempListWriter.getLinks());
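Crawling the article URLs
SingleCrawler[] listCrawler = new SingleCrawler[menu.size()];
for (int i = 0; i < listCrawler.length; i++) {
    String menuUrl = menu.get(i);
    if(menuUrl.contains("?")) {
        listCrawler[i] = new SingleCrawler(menuUrl+"&a3c=1000000&a2c=10000000", "a[href~=^info/[0-9]+/[0-9]+\\.htm]", httpClient, tempListWriter);
        listCrawler[i].start();
    }
    else {
        listCrawler[i] = null;
    }
}
// Wait for every list crawler before the collected article URLs are read
for (SingleCrawler crawler : listCrawler) {
    if (crawler == null) {
        continue;
    }
    try {
        crawler.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}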
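Jsoup's `[attr~=regex]` selector keeps elements whose attribute value matches a regular expression (in current Jsoup versions as a partial match, like `Matcher.find`), so the two selectors in UrlCollector are really regex filters on `href`. The hypothetical `LinkPatterns` class below (not project code) replays the same two patterns with `java.util.regex`, which makes it easy to check them against sample hrefs without crawling; the sample hrefs in the checks are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical: the regexes from UrlCollector's two CSS selectors, usable outside Jsoup. */
final class LinkPatterns {
    private LinkPatterns() {}

    // From the menu selector a[href~=\.jsp\?urltype=tree\.TreeTempUrl&wbtreeid=[0-9]+]
    static final Pattern MENU = Pattern.compile("\\.jsp\\?urltype=tree\\.TreeTempUrl&wbtreeid=[0-9]+");
    // From the article selector a[href~=^info/[0-9]+/[0-9]+\.htm]
    static final Pattern ARTICLE = Pattern.compile("^info/[0-9]+/[0-9]+\\.htm");

    /** Keeps the hrefs in which the pattern is found (a partial match, like Matcher.find). */
    static List<String> filter(List<String> hrefs, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String href : hrefs) {
            if (pattern.matcher(href).find()) {
                kept.add(href);
            }
        }
        return kept;
    }
}
```

Because the article pattern has no `$` anchor, it would also accept an href ending in `.html`; adding `$` would make it stricter if that matters.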
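Crawling the article content
// list: the article URLs collected in the previous step; resultWriter: the writer that stores each article
SingleCrawler[] documentCrawler = new SingleCrawler[list.size()];
for (int i = 0; i < documentCrawler.length; i++) {
    documentCrawler[i] = new SingleCrawler(list.get(i), "", httpClient, resultWriter);
    documentCrawler[i].start();
}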
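Starting one thread per URL creates one thread per article, and all of them share a single PoolingHttpClientConnectionManager, which in HttpClient 4.x allows only 2 connections per route by default, so most threads simply wait. A fixed-size pool is one alternative. `BoundedCrawler` below is a hypothetical sketch: the `fetcher` function stands in for whatever SingleCrawler does with one URL, and the thread count is an assumed parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: crawl many URLs with a fixed number of worker threads. */
final class BoundedCrawler {
    private BoundedCrawler() {}

    /**
     * Applies fetcher to every URL on a pool of `threads` workers and waits for all of them.
     * Results keep the input order; URLs whose fetch throws are left out.
     */
    static Map<String, String> crawlAll(List<String> urls, Function<String, String> fetcher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetcher.apply(url)));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.put(urls.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    // One failed page should not abort the whole crawl.
                    System.err.println("Failed: " + urls.get(i) + " (" + e.getCause() + ")");
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and return what has been collected so far.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

If the shared client is kept, raising `cm.setDefaultMaxPerRoute(...)` to the pool size lets the workers actually fetch in parallel.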
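Web front end
Autocomplete JS: listens for keyup events on the .search-input field; as the user types, it asynchronously requests suggest.jsp and shows the returned suggestions through the autocomplete widget.
$(function() {
    $(".search-input").keyup(function(event) {
        $.ajax({
            type : "get",
            // encode the term so that Chinese characters, spaces and '&' survive the query string
            url : "suggest.jsp?term=" + encodeURIComponent(document.getElementById("input").value),
            // jQuery's option is dataType (camelCase); with "json" the response arrives
            // already parsed, so it must not go through JSON.parse again
            dataType : "json",
            async : true,
            error : function() {
                console.error("Failed to load suggestion data!");
            },
            success : function(data) {
                $(".search-input").autocomplete({
                    source : data
                });
            }
        });
    });
});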
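The script hands the response of suggest.jsp straight to the autocomplete widget's `source`, which accepts an array of strings. suggest.jsp itself is not shown in this post; the sketch below is a hypothetical helper (`SuggestJson`, not project code) that such a page could use to serialize its suggestions into a JSON array, escaping quotes, backslashes and control characters so that a title such as `say "hi"` does not break the JSON.

```java
import java.util.List;

/** Hypothetical helper for suggest.jsp: serializes suggestion strings as a JSON array. */
final class SuggestJson {
    private SuggestJson() {}

    /** Returns e.g. ["a","b"] for the list [a, b]. */
    static String toJsonArray(List<String> suggestions) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < suggestions.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(suggestions.get(i))).append('"');
        }
        return sb.append(']').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

In the JSP, the page would then set its content type to `application/json` and print `SuggestJson.toJsonArray(suggestions)`.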
Pagination in JS: take the element that was clicked and branch on its innerHTML:
function turnPage(e) {
    var page = e.innerHTML;   // declared with var so it does not leak into the global scope
    var currentPage = GetQueryString("page");
    if (page == "«") {
        // no page parameter yet
        if (currentPage == null) {
            AddParamVal("page", 1);
        }
        // page parameter present: go back one page, never below 1
        else {
            currentPage = parseInt(currentPage, 10) - 1;
            if (currentPage <= 0) currentPage = 1;
            replaceParamVal("page", currentPage);
        }
    } else if (page == "»") {
        // no page parameter yet: we are on page 1, so go to page 2
        if (currentPage == null) {
            AddParamVal("page", 2);
        }
        // page parameter present: go forward one page (no upper bound is checked)
        else {
            currentPage = parseInt(currentPage, 10) + 1;
            replaceParamVal("page", currentPage);
        }
    } else {
        // a numbered page link
        if (currentPage == null) {
            AddParamVal("page", parseInt(page, 10));
        }
        else {
            replaceParamVal("page", parseInt(page, 10));
        }
    }
}
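The » branch never checks an upper bound, so the URL can ask for a page past the last result. Once the search has run and the total hit count is known (EsSearch keeps it in resultNum), the JSP page can bound the page links. The hypothetical `Pagination` helper below does that arithmetic, assuming 10 hits per page as in fullTextSerch and Elasticsearch's default `index.max_result_window` of 10,000, beyond which a `from` + `size` request is rejected.

```java
/** Hypothetical helper for bounding the page parameter on the JSP side. */
final class Pagination {
    static final int PAGE_SIZE = 10;               // fullTextSerch returns 10 hits per page
    static final int MAX_RESULT_WINDOW = 10_000;   // Elasticsearch's default index.max_result_window

    private Pagination() {}

    /** Pages needed for totalHits, capped so that from + size stays inside the result window. */
    static int totalPages(long totalHits) {
        long reachable = Math.min(totalHits, MAX_RESULT_WINDOW);
        return (int) Math.max(1, (reachable + PAGE_SIZE - 1) / PAGE_SIZE);
    }

    /** Parses the raw "page" parameter and clamps it to [1, totalPages(totalHits)]. */
    static int clampPage(String rawPage, long totalHits) {
        int page;
        try {
            page = rawPage == null ? 1 : Integer.parseInt(rawPage.trim());
        } catch (NumberFormatException e) {
            page = 1;
        }
        return Math.max(1, Math.min(page, totalPages(totalHits)));
    }

    /** Offset for searchSourceBuilder.from(...), matching (page-1)*10 in fullTextSerch. */
    static int from(int page) {
        return (page - 1) * PAGE_SIZE;
    }
}
```

With `totalPages` available, the page could render the » link as disabled when the current page equals the last page.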
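Date picker setup: if the URL already carries a timeLimit, show it as the placeholder of the date input (id test6)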
$(function(){
if(GetQueryString("timeLimit")!=null){
var option=GetQueryString("timeLimit");
document.getElementById("test6").setAttribute("placeholder",option);
}
})
Reading the keyword and searching with the given conditions
String keyword = request.getParameter("keyword");
if(keyword!=null){
    int pageCount = request.getParameter("page")==null?1:Integer.parseInt(request.getParameter("page"));
    search = new EsSearch();
    search.inseartSearch(keyword);
    String timeLimit = request.getParameter("timeLimit");
    // timeLimit has the form "start - end"; an empty or malformed value falls back to a plain full-text search
    String[] time = timeLimit == null ? null : timeLimit.split(" - ");
    if(time == null || time.length != 2){
        result = search.fullTextSerch(keyword,pageCount);
    }
    else{
        result = search.rangeSerch(keyword, pageCount, time[0], time[1]);
    }
}
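Splitting on `" - "` alone passes any two strings on to rangeSerch. A hypothetical `TimeLimit` parser (not project code) that validates both ends first might look like this; it assumes the date picker sends ISO dates such as `2019-09-01 - 2019-12-31`, which the post does not state explicitly.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical value class for the "timeLimit" parameter, e.g. "2019-09-01 - 2019-12-31". */
final class TimeLimit {
    final LocalDate start;
    final LocalDate end;

    private TimeLimit(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    /** Returns null when the parameter is missing, empty or not two ISO dates joined by " - ". */
    static TimeLimit parse(String raw) {
        if (raw == null) {
            return null;
        }
        String[] parts = raw.split(" - ");
        if (parts.length != 2) {
            return null;
        }
        try {
            LocalDate a = LocalDate.parse(parts[0].trim());
            LocalDate b = LocalDate.parse(parts[1].trim());
            // Accept a range entered in either order.
            return a.isAfter(b) ? new TimeLimit(b, a) : new TimeLimit(a, b);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

In the JSP, `TimeLimit t = TimeLimit.parse(request.getParameter("timeLimit"));` would then choose between `fullTextSerch(keyword, pageCount)` when `t` is null and `rangeSerch(keyword, pageCount, t.start.toString(), t.end.toString())` otherwise.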
Printing the search results
if(result!=null){
for(int i=0;i<result.size();i++){
out.println("<div class=\"result-container\">");
out.println("<a href=\""+result.get(i).getUrl()+"\" target=\"_blank\" class=\"title\">"+result.get(i).getTitle()+"</a>");
out.println("<div class=\"text\">"+result.get(i).getText()+"</div>");
out.println("<div style=\"float: left;\" class=\"url\">"+result.get(i).getUrl()+"</div>");
out.println("<div style=\"float: left;color: grey;margin-left: 30px;margin-top: 4px;\">"+result.get(i).getTime()+"</div>");
out.println("</div>");
out.println("<div class=\"clear\"></div>");
}
}
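The loop prints the crawled title and URL into the page unescaped. The text field has to stay raw because it carries the highlight `<span>` tags, but fields without highlight markup can be escaped. The hypothetical `Html.escape` below is a minimal sketch of that; whether it is safe for the title depends on whether storageList (not shown) replaces the title with its highlighted version. Elasticsearch's highlighter also offers an `html` encoder that escapes the fragment text around the tags.

```java
/** Hypothetical helper: escapes text for safe output inside HTML elements and attributes. */
final class Html {
    private Html() {}

    /** Replaces the five HTML-sensitive characters; null becomes an empty string. */
    static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, the URL line would become `out.println("<div style=\"float: left;\" class=\"url\">" + Html.escape(result.get(i).getUrl()) + "</div>");`.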
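Elasticsearch module
Connecting to Elasticsearch through the Jest API
// synchronized so that two concurrent requests cannot each create their own client
public static synchronized JestClient getJestClient() {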
if(jestClient==null) {
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig
.Builder("http://127.0.0.1:9200")
//.gson(new GsonBuilder().setDateFormat("yyyy-MM-dd HH:mm").create())
.multiThreaded(true)
.readTimeout(10000)
.build());
jestClient=factory.getObject();
}
return jestClient;
}
Creating the indices and writing the mappings
public EsCreatIndex() {
jestClient =EsClient.getJestClient();
try {
jestClient.execute(new CreateIndex.Builder(EsClient.indexName).build());
jestClient.execute(new CreateIndex.Builder(EsClient.suggestName).build());
} catch (IOException e) {
e.printStackTrace();
}
this.createIndexMapping();
}
private void createIndexMapping() {
String sourceIndex="{\"" + EsClient.typeName + "\":{\"properties\":{"
+"\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\""
+ ",\"fields\":{\"suggest\":{\"type\":\"completion\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}}}"
+",\"text\":{\"type\":\"text\",\"index\":\"true\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}"
+",\"url\":{\"type\":\"keyword\"}"
+",\"time\":{\"type\":\"date\"}"
+ "}}}";
PutMapping putMappingIndex=new PutMapping.Builder(EsClient.indexName, EsClient.typeName, sourceIndex).build();
String sourceSuggest="{\"" + EsClient.typeName + "\":{\"properties\":{"
+"\"text\":{\"type\":\"completion\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"}"
+ "}}}";
PutMapping putMappingSuggest=new PutMapping.Builder(EsClient.suggestName, EsClient.typeName, sourceSuggest).build();
try {
jestClient.execute(putMappingIndex);
jestClient.execute(putMappingSuggest);
} catch (IOException e) {
e.printStackTrace();
}
}
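The suggest index maps `text` as a `completion` field, which is queried through Elasticsearch's completion suggester. `CompletionQuery` below is a hypothetical helper, not project code, that builds such a request body; the suggestion name `text-suggest` and the size are arbitrary choices for this sketch. With Jest, the body could be passed to `new Search.Builder(body).addIndex(EsClient.suggestName).build()`, and the options read from the `suggest` object of the response JSON.

```java
/** Hypothetical builder for a completion-suggester request against the suggest index. */
final class CompletionQuery {
    private CompletionQuery() {}

    /** Request body asking the "text" completion field for up to `size` suggestions starting with prefix. */
    static String build(String prefix, int size) {
        return "{\"suggest\":{\"text-suggest\":{"
                + "\"prefix\":\"" + jsonEscape(prefix) + "\","
                + "\"completion\":{\"field\":\"text\",\"size\":" + size + "}}}}";
    }

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    private static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```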
Inserting data
public void bulkIndex(List<SearchResultEntry> list) throws IOException {
Bulk.Builder bulk=new Bulk.Builder();
for(SearchResultEntry e:list) {
Index index=new Index.Builder(e).index(EsClient.indexName).type(EsClient.typeName).build();
bulk.addAction(index);
}
jestClient.execute(bulk.build());
}
public void insertIndex(SearchResultEntry webpage) throws IOException {
Index index=new Index.Builder(webpage).index(EsClient.indexName).type(EsClient.typeName).build();
jestClient.execute(index);
}
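bulkIndex sends the whole list in a single bulk request, which grows large when thousands of articles are crawled. Splitting it into fixed-size batches is a common remedy; `Batches.partition` below is a hypothetical helper, and the batch size of 500 in the usage line is an assumption, not a measured value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list into consecutive batches. */
final class Batches {
    private Batches() {}

    /** Returns sublist views of at most batchSize elements each, in order. */
    static <T> List<List<T>> partition(List<T> list, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += batchSize) {
            batches.add(list.subList(i, Math.min(i + batchSize, list.size())));
        }
        return batches;
    }
}
```

Usage: `for (List<SearchResultEntry> batch : Batches.partition(entries, 500)) { bulkIndex(batch); }`. The batches are views, so the source list must not be modified while they are in use.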
Deleting the index
public void deleteIndex() throws IOException {
jestClient.execute(new DeleteIndex.Builder(EsClient.indexName).build());
}
Full-text search by page
public List<SearchResultEntry> fullTextSerch(String queryString,int page) {
// Declare a search request body
SearchSourceBuilder searchSourceBuilder=new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder=QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.queryStringQuery(queryString));
if(this.isDateRangeQuery) {
QueryBuilder queryBuilder = QueryBuilders
.rangeQuery("time")
.gte(this.startDate)
.lte(this.closingDate)
.includeLower(true)
.includeUpper(true);
/** Range filter on the time field */
boolQueryBuilder=boolQueryBuilder.filter(queryBuilder);
}
searchSourceBuilder.query(boolQueryBuilder);
// Set the highlighted fields
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title");
highlightBuilder.field("text");
highlightBuilder.preTags("<span style=\"color:red;\">").postTags("</span>");
highlightBuilder.fragmentSize(200);
searchSourceBuilder.highlighter(highlightBuilder);
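// Pagination: 10 hits per page
searchSourceBuilder.from((page-1)*10);
searchSourceBuilder.size(10);
// Build the Search object
Search search=new Search.Builder(searchSourceBuilder.toString())
.addIndex(EsClient.indexName)
.addType(EsClient.typeName)
.build();
SearchResult searchResult = null;
try {
searchResult = jestClient.execute(search);
} catch (IOException e) {
e.printStackTrace();
}
// Avoid a NullPointerException when the request failed (assumes java.util.ArrayList is imported)
if (searchResult == null || !searchResult.isSucceeded()) {
return new ArrayList<SearchResultEntry>();
}
resultNum=searchResult.getTotal();
return this.storageList(searchResult);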
}
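fullTextSerch passes the raw keyword to `queryStringQuery`, so input such as `c++ (` or `http://` is parsed as query_string syntax and can make the request fail. The Elasticsearch documentation lists the reserved characters `+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /`, of which `<` and `>` cannot be escaped and have to be removed. The hypothetical escaper below applies that rule; the other common option is `QueryBuilders.simpleQueryStringQuery`, which ignores invalid syntax instead of failing.

```java
/** Hypothetical escaper for user input passed to queryStringQuery. */
final class QueryStringEscaper {
    private QueryStringEscaper() {}

    // Reserved by the query_string syntax; each occurrence gets a leading backslash.
    private static final String RESERVED = "\\+-=&|!(){}[]^\"~*?:/";

    /** Escapes reserved characters and removes '<' and '>', which cannot be escaped. */
    static String escape(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length());
        for (char c : userInput.toCharArray()) {
            if (c == '<' || c == '>') {
                continue;
            }
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Usage: `boolQueryBuilder.must(QueryBuilders.queryStringQuery(QueryStringEscaper.escape(queryString)));`. Escaping also disables deliberate query syntax such as quoted phrases and wildcards.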
Search within a date range
public List<SearchResultEntry> rangeSerch(String queryString,int page,String startDate,String closingDate){
this.isDateRangeQuery=true;
this.startDate=startDate;
this.closingDate=closingDate;
try {
return this.fullTextSerch(queryString,page);
} finally {
// Reset the flag even if the search throws
this.isDateRangeQuery=false;
}
}
GUI module
Refreshing the result panel for the current page (two results per page)
private void displayResult(){
resultJpanel.removeAll();
resultJpanel.setLayout(new GridLayout(2, 1));
resultJpanel.add(resultList.get(currentPage*2-2));
if(currentPage*2 <= resultNum){ // show a second result only if this page has one
resultJpanel.add(resultList.get(currentPage*2-1));
}
resultJpanel.revalidate();
resultJpanel.repaint();
page.setText(currentPage+"/"+pageNum);
}
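displayResult computes the indices by hand and calls `resultList.get(currentPage*2-2)` even when there are no results, which throws. A hypothetical generic helper (`PageSlice`, not project code) based on `List.subList` keeps the arithmetic in one place and returns an empty page instead of throwing.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: fixed-size pages over a list, 1-based page numbers. */
final class PageSlice {
    private PageSlice() {}

    /** Number of pages needed to show `total` items, perPage at a time (at least 1). */
    static int pageCount(int total, int perPage) {
        return Math.max(1, (total + perPage - 1) / perPage);
    }

    /** Items on the given page; empty when the page is out of range. */
    static <T> List<T> slice(List<T> items, int page, int perPage) {
        int from = (page - 1) * perPage;
        if (page < 1 || from >= items.size()) {
            return Collections.emptyList();
        }
        return items.subList(from, Math.min(from + perPage, items.size()));
    }
}
```

In displayResult this would become `for (JPanel p : PageSlice.slice(resultList, currentPage, 2)) { resultJpanel.add(p); }`, with `pageNum = PageSlice.pageCount(resultList.size(), 2)`.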
Once the results are shown, clicking the jump button opens the original page in the default browser
public void actionPerformed(ActionEvent e) {
if(Desktop.isDesktopSupported()){
try {
URI uri=URI.create(url);
Desktop dp=Desktop.getDesktop();
if(dp.isSupported(Desktop.Action.BROWSE)){
dp.browse(uri);
}
} catch (Exception o) {
o.printStackTrace();
}
}
}
Keeping the result panels in a List
private List<JPanel> getJpanelList(List<SearchResultEntry> list) {
List<JPanel> resultList = new ArrayList<>();
for(SearchResultEntry e:list){
JPanel jPanel=new SearchLook(e);
resultList.add(jPanel);
}
return resultList;
}
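Code scan results and fixes
Scan results
1. Added complete braces to if-else statements. Before:
After:
2. Classes had no author information. Before:
After:
3. Overriding methods were missing the @Override annotation. Before:
After: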
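Project summary
Another year, another course project. We learned a great deal from it: Elasticsearch, JavaScript and the GUI work all strengthened our programming skills. Unfortunately, the fuzzy keyword suggestion feature is still not very usable, and we did not manage to optimize it. The web pages are not adapted for mobile devices, and the GUI also falls a little short. We hope that any juniors who continue this project will fix these shortcomings.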