Add Site Search to Your Website: A Beginner's Tutorial on Compass
syxChina(syxchina.cnblogs.com)
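1 Preface
Lately I have been learning some new things to add substance to my graduation project. After a long stretch of SSH (Struts + Spring + Hibernate) projects, I also wanted to try something new and round out what I already knew, and search is unquestionably important. As a Java developer, you have probably heard of Lucene, a top-level Apache open-source project. I have just finished learning it and know the basics well enough to use it. I have to say, though, that the official documentation and examples are not very helpful; fortunately there is plenty of material online. While researching Lucene I came across Compass and Nutch, both built on Lucene. Lucene indexes and searches whatever content you give it. To search a local database or web pages, however, you have to pull the data out, index it, and then search it. So you start to wonder whether you could search the database directly, using its content as the index and having the index update along with the database's CRUD operations. That is where Compass comes in. It is very convenient for site search, and it officially supports Spring and Hibernate, which makes it even easier. Nutch, also built on Lucene, is for searching web pages. If there is interest, I will share my notes on Lucene and Nutch as a quick start; for everything else, there are the docs and Google.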
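The idea of an index that follows the database's CRUD operations can be made concrete with the toy, in-memory sketch below. It is plain Java, not Lucene or Compass. Every create, update or delete of a record also updates an inverted index from term to record ids, so a search never has to scan the records themselves. The class name `SyncedIndex` and its lower-cased whitespace tokenizer are illustrative assumptions, standing in for a real analyzer and index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy "database + index" pair: every write to the records also updates the inverted index. */
final class SyncedIndex {
    private final Map<String, String> records = new HashMap<>();      // id -> text (the "database")
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> ids (the "index")

    /** Create or update a record; the index is updated in the same call. */
    void save(String id, String text) {
        delete(id); // drop the postings of the old version, if any
        records.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(id);
        }
    }

    /** Delete a record together with its postings. */
    void delete(String id) {
        String old = records.remove(id);
        if (old == null) {
            return;
        }
        for (String term : tokenize(old)) {
            Set<String> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    /** Look up a single term via the index, without scanning the records. */
    Set<String> search(String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }

    // Assumption: lower-cased whitespace splitting stands in for a real analyzer.
    private static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        terms.remove("");
        return terms;
    }
}
```

Compass does this bookkeeping for you. With the Hibernate GPS device configured in section 4, the database writes go through Hibernate and the index is updated to match.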
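I should mention that Compass appears to have stopped being updated in 2009, and reportedly it only supports Lucene versions below 3.0. It is a good project, and I don't know why development stopped. I tried analyzers written for Lucene 3.0 and later, and they cannot be used with it. For Chinese I use JE-Analyzer.jar. My environment:
Spring 3.1.0 + Hibernate 3.6.6 + Compass 2.2.0.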
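2 Introduction to Compass
Compass is a powerful, transactional, high-performance object/search engine mapping (OSEM) and Java persistence framework. Compass includes:
* a search engine abstraction layer (using the Lucene search engine),
* OSEM (Object/Search Engine Mapping) support,
* transaction management,
* a simple, Google-like keyword query language,
* an extensible, modular framework,
* a simple API.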
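Compass accepts Lucene-style query strings. The toy sketch below shows only the simplest flavour of such a keyword language: `+term` must match, `-term` must not match, and bare terms are optional. If there is no `+` term, at least one bare term has to match. This follows the required/prohibited/optional clause idea of Lucene's query syntax in spirit. It is a simplification written for illustration, not Compass's parser, and the class name `KeywordQuery` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy matcher for "+must -mustNot should" keyword queries over a document's term set. */
final class KeywordQuery {
    private final List<String> must = new ArrayList<>();
    private final List<String> mustNot = new ArrayList<>();
    private final List<String> should = new ArrayList<>();

    KeywordQuery(String query) {
        for (String token : query.trim().toLowerCase().split("\\s+")) {
            if (token.startsWith("+") && token.length() > 1) {
                must.add(token.substring(1));
            } else if (token.startsWith("-") && token.length() > 1) {
                mustNot.add(token.substring(1));
            } else if (!token.isEmpty()) {
                should.add(token);
            }
        }
    }

    /** All "+" terms present, no "-" term present, and, when there are
     *  no "+" terms, at least one bare term present. */
    boolean matches(Set<String> docTerms) {
        for (String t : must) {
            if (!docTerms.contains(t)) return false;
        }
        for (String t : mustNot) {
            if (docTerms.contains(t)) return false;
        }
        if (!must.isEmpty()) return true;
        for (String t : should) {
            if (docTerms.contains(t)) return true;
        }
        return false; // includes purely negative queries, which match nothing
    }
}
```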
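Official website: just Google it.
3 Using Compass on Its Own
Compass can be used without integrating it into Hibernate and Spring. The following example was taken from the web, so straight to the code: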
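import org.compass.annotations.Index;
import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.annotations.Store;

@Searchable
public class Book {
private String id;// ID
private String title;// title
private String author;// author
private float price;// price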
public Book() {
}
public Book(String id, String title, String author, float price) {
super();
this.id = id;
this.title = title;
this.author = author;
this.price = price;
}
@SearchableId
public String getId() {
return id;
}
@SearchableProperty(boost = 2.0F, index = Index.TOKENIZED, store = Store.YES)
public String getTitle() {
return title;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public String getAuthor() {
return author;
}
@SearchableProperty(index = Index.NO, store = Store.YES)
public float getPrice() {
return price;
}
public void setId(String id) {
this.id = id;
}
public void setTitle(String title) {
this.title = title;
}
public void setAuthor(String author) {
this.author = author;
}
public void setPrice(float price) {
this.price = price;
}
@Override
public String toString() {
return "[" + id + "] " + title + " - " + author + " $ " + price;
}
}
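`@SearchableProperty(boost = 2.0F)` on `getTitle()` makes a match in the title count for more than a match in the author. The price field is `Index.NO`, so it is stored but not searchable at all. Lucene's real scoring (TF-IDF, length norms and so on) is more involved. The toy sketch below shows only the effect of a per-field multiplier. The weights 2.0 and 1.0 come from the annotations above; the class name `BoostedScore` and the exact-word matching are simplifying assumptions.

```java
import java.util.Map;

/** Toy field-weighted score: each matching word adds its field's boost. Not Lucene's formula. */
final class BoostedScore {
    static double score(Map<String, String> fields, Map<String, Double> boosts, String term) {
        double score = 0.0;
        String needle = term.toLowerCase();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String word : field.getValue().toLowerCase().split("\\s+")) {
                if (word.equals(needle)) {
                    // fields without an explicit boost count with the default weight of 1.0
                    score += boosts.getOrDefault(field.getKey(), 1.0);
                }
            }
        }
        return score;
    }
}
```

With this weighting, a book whose title contains the keyword ranks above a book where only the author field does. That is the intent of `boost = 2.0F`.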
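import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.compass.annotations.config.CompassAnnotationsConfiguration;
import org.compass.core.Compass;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTransaction;

public class Searcher {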
protected Compass compass;
public Searcher() {
}
public Searcher(String path) {
compass = new CompassAnnotationsConfiguration()//
.setConnection(path).addClass(Book.class)//
.setSetting("compass.engine.highlighter.default.formatter.simple.pre", "<font color='red'>")//
.setSetting("compass.engine.highlighter.default.formatter.simple.post", "</font>")//
.buildCompass();//
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() {
compass.close();
}
});
}
/**
* Adds a book to the index.
* @param book
*/
public void index(Book book) {
CompassSession session = null;
CompassTransaction tx = null;
try {
session = compass.openSession();
tx = session.beginTransaction();
session.create(book);
tx.commit();
} catch (RuntimeException e) {
if (tx != null)
tx.rollback();
throw e;
} finally {
if (session != null) {
session.close();
}
}
}
/**
* Removes a book from the index.
* @param book
*/
public void unIndex(Book book) {
CompassSession session = null;
CompassTransaction tx = null;
try {
session = compass.openSession();
tx = session.beginTransaction();
session.delete(book);
tx.commit();
} catch (RuntimeException e) {
if (tx != null)
tx.rollback();
throw e;
} finally {
if (session != null) {
session.close();
}
}
}
/**
* Re-indexes a book (remove, then add).
* @param book
*/
public void reIndex(Book book) {
unIndex(book);
index(book);
}
/**
* Searches the index.
* @param queryString
* @return
*/
public List<Book> search(String queryString) {
CompassSession session = null;
CompassTransaction tx = null;
try {
session = compass.openSession();
tx = session.beginTransaction();
CompassHits hits = session.find(queryString);
int n = hits.length();
if (0 == n) {
return Collections.emptyList();
}
List<Book> books = new ArrayList<Book>();
for (int i = 0; i < n; i++) {
books.add((Book) hits.data(i));
}
hits.close();
tx.commit();
return books;
} catch (RuntimeException e) {
if (tx != null)
tx.rollback();
throw e;
} finally {
if (session != null) {
session.close();
}
}
}
}
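import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.UUID;

public class Main {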
static List<Book> db = new ArrayList<Book>();
static Searcher searcher = new Searcher("index");
public static void main(String[] args) {
add(new Book(UUID.randomUUID().toString(), "Thinking in Java", "Bruce", 109.0f));
add(new Book(UUID.randomUUID().toString(), "Effective Java", "Joshua", 12.4f));
add(new Book(UUID.randomUUID().toString(), "Java Thread Programing", "Paul", 25.8f));
long begin = System.currentTimeMillis();
int count = 30;
for(int i=1; i<count; i++) {
if(i%10 == 0) {
long end = System.currentTimeMillis();
System.err.println(String.format("Indexed [%d], remaining [%d], elapsed [%ds], estimated time left [%ds].", i,count-i,(end-begin)/1000, (int)((count-i)*((end-begin)/(i*1000.0))) ));
}
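String id = UUID.randomUUID().toString(); // unique ID: new Date().toString() repeats within the same second
String text = new Date().toString(); // filler text for title and author
add(new Book(id, text.substring(0, text.length()/2), text.substring(text.length()/2), (float)Math.random()*100));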
}
while (true) {
int n = displaySelection();
switch (n) {
case 1:
listBooks();
break;
case 2:
addBook();
break;
case 3:
deleteBook();
break;
case 4:
searchBook();
break;
case 5:
return;
default:
System.out.println("Please enter a number from 1 to 5.");
}
}
}
static int displaySelection() {
System.out.println("\n==select==");
System.out.println("1. List all books");
System.out.println("2. Add book");
System.out.println("3. Delete book");
System.out.println("4. Search book");
System.out.println("5. Exit");
int n = readKey();
if (n >= 1 && n <= 5)
return n;
return 0;
}
/**
* Adds a book to the database and the index.
*
* @param book
*/
private static void add(Book book) {
db.add(book);
searcher.index(book);
}
/**
* Prints all books in the database.
*/
public static void listBooks() {
System.out.println("==Database==");
int n = 1;
for (Book book : db) {
System.out.println(n + ")" + book);
n++;
}
}
/**
* Reads a book from user input and adds it to the database and the index.
*/
public static void addBook() {
String title = readLine(" Title: ");
String author = readLine(" Author: ");
String price = readLine(" Price: ");
Book book = new Book(UUID.randomUUID().toString(), title, author, Float.valueOf(price));
add(book);
}
/**
* Deletes a book from both the database and the index.
*/
public static void deleteBook() {
listBooks();
System.out.println("Book index: ");
int n = readKey();
if (n < 1 || n > db.size()) {
System.out.println("No such book.");
return;
}
Book book = db.remove(n - 1);
searcher.unIndex(book);
}
/**
* Searches books by the keyword the user enters.
*/
public static void searchBook() {
String queryString = readLine(" Enter keyword: ");
List<Book> books = searcher.search(queryString);
System.out.println(" ====search results:" + books.size() + "====");
for (Book book : books) {
System.out.println(book);
}
}
// One shared reader: creating a new BufferedReader on System.in for every call
// can lose input that an earlier reader has already buffered.
private static final BufferedReader IN = new BufferedReader(new InputStreamReader(System.in));
public static int readKey() {
try {
String line = IN.readLine();
if (line == null) {
throw new IllegalStateException("end of input");
}
return Integer.parseInt(line.trim());
} catch (NumberFormatException e) {
return 0; // not a number: treated as an invalid choice
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public static String readLine(String prompt) {
System.out.println(prompt);
try {
return IN.readLine();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
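Every method of `Searcher` repeats the same sequence: open a session, begin a transaction, commit or roll back, and close. One way to factor that out is a small template method that takes the work as a callback. This is similar in spirit to the `CompassTemplate.execute(CompassCallback)` call used in section 4. The sketch below uses hypothetical `Session`/`Tx` interfaces so that it runs without Compass on the classpath. With Compass, a `CompassSession` and `CompassTransaction` would take their place. `TxTemplate` is likewise a made-up name.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for CompassTransaction, so the sketch runs standalone. */
interface Tx {
    void commit();
    void rollback();
}

/** Hypothetical stand-in for CompassSession. */
interface Session extends AutoCloseable {
    Tx beginTransaction();

    @Override
    void close(); // narrowed: no checked exception
}

/** Runs work inside a transaction: commit on success, roll back on failure, always close. */
final class TxTemplate {
    private final Supplier<Session> sessions;

    TxTemplate(Supplier<Session> sessions) {
        this.sessions = sessions;
    }

    <T> T execute(Function<Session, T> work) {
        try (Session session = sessions.get()) {
            Tx tx = session.beginTransaction();
            try {
                T result = work.apply(session);
                tx.commit();
                return result;
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            }
        }
    }
}
```

With such a helper, `index(book)` reduces to a single `execute(...)` call whose callback creates the book in the session. The error handling then lives in one place instead of four.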
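Inserting data and building the index this way is slow; the approach below is faster. Note that no analyzer is configured above, so the default analyzer is used, which splits Chinese text into single characters.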
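The difference between one-character-per-token splitting and a dictionary-based Chinese analyzer is easiest to see side by side. As far as I know, JE-Analyzer's `MMAnalyzer` is based on maximum matching against a word dictionary. The sketch below is a generic forward maximum matching (FMM) segmenter, not JE-Analyzer's actual code. The dictionary in the usage is a tiny hypothetical one, and the class name `ChineseSegmenter` is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Contrasts per-character splitting with forward maximum matching (FMM) segmentation. */
final class ChineseSegmenter {
    /** One token per character, as described above for the default analyzer. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Greedy FMM: at each position take the longest dictionary word (up to maxLen chars),
     *  falling back to a single character. Assumes BMP characters (one char each). */
    static List<String> forwardMaxMatch(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }
}
```

With a dictionary containing 全文 and 搜索引擎, the keyword 全文搜索引擎 ("full-text search engine") used in the tests later becomes two words under FMM. The per-character split shown here produces six tokens instead.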
4 Integrating Compass with Spring + Hibernate
4-1 JAR files
4-2 Configuration files
beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"
xmlns:aop="http://www.springframework.org/schema/aop" xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/context
http://www.springframework.org/schema/context/spring-context-3.0.xsd
http://www.springframework.org/schema/tx
http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
http://www.springframework.org/schema/aop
http://www.springframework.org/schema/aop/spring-aop-3.0.xsd">
<context:annotation-config />
<context:component-scan base-package="com.syx.compass"></context:component-scan>
<aop:aspectj-autoproxy></aop:aspectj-autoproxy>
<import resource="hibernate-beans.xml"/>
<import resource="compass-beans.xml"/>
</beans>
compass-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">
<!-- Main Compass configuration -->
<bean id="compass" class="org.compass.spring.LocalCompassBean">
<property name="compassSettings">
<props>
<prop key="compass.engine.connection">file://compass</prop><!-- where the index files are stored -->
<prop key="compass.transaction.factory">
org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
<prop key="compass.engine.analyzer.default.type">
jeasy.analysis.MMAnalyzer</prop><!-- analyzer (JE-Analyzer) -->
<prop key="compass.engine.highlighter.default.formatter.simple.pre">
<![CDATA[<font color="red"><b>]]></prop>
<prop key="compass.engine.highlighter.default.formatter.simple.post">
<![CDATA[</b></font>]]></prop>
</props>
</property>
<property name="transactionManager">
<ref bean="txManager" />
</property>
<property name="compassConfiguration" ref="annotationConfiguration" />
<property name="classMappings">
<list>
<value>com.syx.compass.test1.Article</value>
</list>
</property>
</bean>
<bean id="annotationConfiguration"
class="org.compass.annotations.config.CompassAnnotationsConfiguration">
</bean>
<bean id="compassTemplate" class="org.compass.core.CompassTemplate">
<property name="compass" ref="compass" />
</bean>
<!-- Keep the index in sync: when data in the database changes, the index is updated -->
<bean id="hibernateGps" class="org.compass.gps.impl.SingleCompassGps"
init-method="start" destroy-method="stop">
<property name="compass">
<ref bean="compass" />
</property>
<property name="gpsDevices">
<list>
<ref bean="hibernateGpsDevice"/>
</list>
</property>
</bean>
<!-- Hibernate GPS device: connects Compass to Hibernate -->
<bean id="hibernateGpsDevice"
class="org.compass.spring.device.hibernate.dep.SpringHibernate3GpsDevice">
<property name="name">
<value>hibernateDevice</value>
</property>
<property name="sessionFactory">
<ref bean="sessionFactory" />
</property>
<property name="mirrorDataChanges">
<value>true</value>
</property>
</bean>
<!-- Rebuild the index on a schedule (via Quartz) or when the Spring ApplicationContext starts -->
<bean id="compassIndexBuilder"
class="com.syx.compass.test1.CompassIndexBuilder"
lazy-init="false">
<property name="compassGps" ref="hibernateGps" />
<property name="buildIndex" value="false" />
<property name="lazyTime" value="1" />
</bean>
<!-- Search service -->
<bean id="searchService" class="com.syx.compass.test1.SearchServiceBean">
<property name="compassTemplate">
<ref bean="compassTemplate" />
</property>
</bean>
</beans>
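The two `compass.engine.highlighter.default.formatter.simple.*` settings above are the strings the simple highlight formatter places before and after each matched term in a highlighted fragment. The toy sketch below only illustrates that wrapping. It works on ASCII words (`\w+`), whereas real highlighting works on analyzed tokens and also selects the best fragment, which is not shown. `SimpleHighlight` is a hypothetical name.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy version of a "simple" highlight formatter: wraps matched words in pre/post tags. */
final class SimpleHighlight {
    private static final Pattern WORD = Pattern.compile("\\w+"); // ASCII words only

    static String highlight(String text, Set<String> terms, String pre, String post) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy the text between words unchanged
            String word = m.group();
            if (terms.contains(word.toLowerCase())) {
                out.append(pre).append(word).append(post);
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }
}
```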
hibernate-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">
<!-- DataSource -->
<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource">
<property name="driverClass" value="${jdbc.driverClassName}" />
<property name="jdbcUrl" value="${jdbc.url}" />
<property name="user" value="${jdbc.username}" />
<property name="password" value="${jdbc.password}" />
<property name="autoCommitOnClose" value="true" />
<property name="checkoutTimeout" value="${cpool.checkoutTimeout}" />
<property name="initialPoolSize" value="${cpool.minPoolSize}" />
<property name="minPoolSize" value="${cpool.minPoolSize}" />
<property name="maxPoolSize" value="${cpool.maxPoolSize}" />
<property name="maxIdleTime" value="${cpool.maxIdleTime}" />
<property name="acquireIncrement" value="${cpool.acquireIncrement}" />
<!-- <property name="maxIdleTimeExcessConnections" value="${cpool.maxIdleTimeExcessConnections}"/> -->
</bean>
<bean
class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="locations">
<value>classpath:jdbc.properties</value>
</property>
</bean>
<!-- SessionFactory -->
<bean id="sessionFactory"
class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="annotatedClasses">
<list>
<value>com.syx.compass.model.Article</value>
<value>com.syx.compass.model.Author</value>
<value>com.syx.compass.test1.Article</value>
</list>
</property>
<property name="hibernateProperties">
<props>
<prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="javax.persistence.validation.mode">none</prop>
<prop key="hibernate.show_sql">true</prop>
<prop key="hibernate.format_sql">false</prop>
<prop key="hibernate.hbm2ddl.auto">update</prop>
</props>
</property>
</bean>
<bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">
<property name="sessionFactory" ref="sessionFactory"></property>
</bean>
<bean id="txManager"
class="org.springframework.orm.hibernate3.HibernateTransactionManager">
<property name="sessionFactory" ref="sessionFactory" />
</bean>
</beans>
jdbc.properties
jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.hostname=localhost
jdbc.url=jdbc:mysql://localhost:3306/compass
jdbc.username=root
jdbc.password=root
cpool.checkoutTimeout=5000
cpool.minPoolSize=1
cpool.maxPoolSize=4
cpool.maxIdleTime=25200
cpool.maxIdleTimeExcessConnections=1800
cpool.acquireIncrement=5
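Before the beans are created, `PropertyPlaceholderConfigurer` replaces each `${key}` in hibernate-beans.xml with the matching entry from jdbc.properties. Below is a minimal sketch of that substitution using `java.util.Properties` and a regular expression. The real configurer also handles nested placeholders, system properties and default values, which this sketch omits. The class name `Placeholders` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal ${key} substitution, in the spirit of Spring's PropertyPlaceholderConfigurer. */
final class Placeholders {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)}");

    static Properties load(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props;
    }

    static String resolve(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String replacement = props.getProperty(key);
            if (replacement == null) {
                throw new IllegalArgumentException("Could not resolve placeholder '" + key + "'");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```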
log4j.properties
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.rootLogger=error, stdout
4-3 Source code
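package com.syx.compass.test1;

import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.compass.annotations.Index;
import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.annotations.Store;

@Searchable(alias = "article")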
@Entity(name="_article")
public class Article {
private Long ID; // identifier
private String content; // body text
private String title; // article title
private Date createTime; // creation time
public Article(){}
public Article(Long iD, String content, String title, Date createTime) {
ID = iD;
this.content = content;
this.title = title;
this.createTime = createTime;
}
@Override
public String toString() {
// pass createTime directly: %s prints "null" instead of throwing when it is unset
return String.format("%d,%s,%s,%s", ID, title, content, createTime);
}
@SearchableId
@Id
@GeneratedValue
public Long getID() {
return ID;
}
public void setID(Long id) {
ID = id;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public Date getCreateTime() {
return createTime;
}
public void setCreateTime(Date createTime) {
this.createTime = createTime;
}
}
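package com.syx.compass.test1;

import org.compass.gps.CompassGps;
import org.springframework.beans.factory.InitializingBean;

public class CompassIndexBuilder implements InitializingBean {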
// Whether to build the index; set to false to disable this builder.
private boolean buildIndex = false;
// Delay before the indexing thread starts, in seconds
private int lazyTime = 10;
// Compass GPS wrapper
private CompassGps compassGps;
// Indexing thread
private Thread indexThread = new Thread() {
@Override
public void run() {
try {
Thread.sleep(lazyTime * 1000);
System.out.println("begin compass index...");
long beginTime = System.currentTimeMillis();
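// Rebuild the index.
// If the index for the Compass entities already exists, indexing first builds
// a temporary index and replaces the old one when it is finished.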
compassGps.index();
long costTime = System.currentTimeMillis() - beginTime;
System.out.println("compass index finished.");
System.out.println("costed " + costTime + " milliseconds");
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the interrupt flag and skip indexing
}
}
};
/**
* Implements <code>InitializingBean</code>: starts the indexing thread once all properties have been injected.
*/
public void afterPropertiesSet() throws Exception {
if (buildIndex) {
indexThread.setDaemon(true);
indexThread.setName("Compass Indexer");
indexThread.start();
}
}
public void setBuildIndex(boolean buildIndex) {
this.buildIndex = buildIndex;
}
public void setLazyTime(int lazyTime) {
this.lazyTime = lazyTime;
}
public void setCompassGps(CompassGps compassGps) {
this.compassGps = compassGps;
}
}
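`CompassIndexBuilder` delays the rebuild with `Thread.sleep` inside a daemon thread. An alternative is a single-threaded `ScheduledExecutorService` with a delay, which also gives you a handle for cancelling the rebuild and a clean shutdown. The sketch below is standalone, and `DelayedTask` is a hypothetical name. In the real bean the `Runnable` would presumably be `compassGps::index`; that wiring is an assumption, not code from the original.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Runs a task once after a delay on a daemon thread; can be cancelled or shut down. */
final class DelayedTask implements AutoCloseable {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "Compass Indexer");
        t.setDaemon(true); // like the original thread: does not keep the JVM alive
        return t;
    });

    ScheduledFuture<?> schedule(Runnable task, long delaySeconds) {
        return executor.schedule(task, delaySeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow(); // drops a rebuild that has not started yet
    }
}
```

In a Spring bean, `afterPropertiesSet()` could call `schedule(..., lazyTime)`, and a `destroy-method` could call `close()`.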
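package com.syx.compass.test1;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.compass.core.CompassCallback;
import org.compass.core.CompassException;
import org.compass.core.CompassHighlighter;
import org.compass.core.CompassHits;
import org.compass.core.CompassQuery;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;

public class SearchServiceBean {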
private CompassTemplate compassTemplate;
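/** Queries the index. Returns a map with "size" (total hits) and "result" (hits from start, inclusive, to end, exclusive). */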
public Map find(final String keywords, final String type, final int start, final int end) {
return compassTemplate.execute(new CompassCallback<Map>() {
public Map doInCompass(CompassSession session) throws CompassException {
List result = new ArrayList();
int totalSize = 0;
Map container = new HashMap();
CompassQuery query = session.queryBuilder().queryString(keywords).toQuery();
CompassHits hits = query.setAliases(type).hits();
totalSize = hits.length();
container.put("size", totalSize);
int max = Math.min(end, totalSize); // do not read past the last hit
if ("article".equals(type)) {
for (int i = start; i < max; i++) {
Article article = (Article) hits.data(i);
String title = hits.highlighter(i).fragment("title");
if (title != null) {
article.setTitle(title);
}
String content = hits.highlighter(i).setTextTokenizer(CompassHighlighter.TextTokenizer.AUTO).fragment("content");
if (content != null) {
article.setContent(content);
}
result.add(article);
}
}
container.put("result", result);
return container;
}
});
}
public CompassTemplate getCompassTemplate() {
return compassTemplate;
}
public void setCompassTemplate(CompassTemplate compassTemplate) {
this.compassTemplate = compassTemplate;
}
}
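`find(keywords, type, start, end)` returns hits in the half-open range `[start, min(end, totalHits))`. A page-based front end therefore has to turn a page number and page size into that pair. Below is a small hypothetical helper (`PageRange`) for the conversion, clamping to the hit count the same way `find` does.

```java
/** Converts a 1-based page number into the [start, end) range expected by SearchServiceBean.find. */
final class PageRange {
    final int start;
    final int end;

    private PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    static PageRange of(int page, int pageSize, int totalHits) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        long rawStart = (long) (page - 1) * pageSize;             // long: avoid int overflow
        int start = (int) Math.min(rawStart, totalHits);           // past the last page: empty range
        int end = (int) Math.min(rawStart + pageSize, totalHits);
        return new PageRange(start, end);
    }

    /** Number of pages needed for totalHits hits (ceiling division). */
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

For example, with 25 hits and 10 per page, page 3 maps to `find(keyword, "article", 20, 25)`.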
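import java.util.List;
import java.util.Map;

import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.orm.hibernate3.HibernateTemplate;

import com.syx.compass.test1.Article;
import com.syx.compass.test1.SearchServiceBean;

public class MainTest {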
public static ClassPathXmlApplicationContext applicationContext;
private static HibernateTemplate hibernateTemplate;
@BeforeClass
public static void init() {
System.out.println("spring init...");
applicationContext = new ClassPathXmlApplicationContext("beans.xml");
hibernateTemplate = applicationContext.getBean(HibernateTemplate.class);
System.out.println("spring ok");
}
@Test
public void addData() {
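System.out.println("addData");
//In compass-beans.xml, set buildIndex=true and lazyTime=1 on bean id="compassIndexBuilder";
//the index is then rebuilt automatically from the data in the database.
//The sleep below only keeps the JVM alive while the background indexer runs.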
try {
Thread.sleep(10000000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@Test
public void search() {
String keyword = "全文搜索引擎"; // "full-text search engine"
SearchServiceBean ssb = applicationContext.getBean(SearchServiceBean.class);
Map map = ssb.find(keyword, "article", 0, 100);// the first search loads the dictionary
long begin = System.currentTimeMillis();
map = ssb.find(keyword, "article", 0, 100);// the second search measures the actual search time
long end = System.currentTimeMillis();
System.out.println(String.format(
"Query: [%s], time (ms): %d, hits: %d", keyword, end-begin, map.get("size")));
List<Article> list = (List<Article>) map.get("result");
for(Article article : list) {
System.out.println(article);
}
}
}
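4-4 Notes
In compass-beans.xml you can set the index directory and the analyzer. For the tests, data is added directly to the database, the index is built at startup, and the speed is measured.
4-5 Testing
Using MySQL, I wrote a function that inserts test data: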
DELIMITER $$
CREATE
FUNCTION `compass`.`addDateSyx`(num int(8))
RETURNS varchar(32)
BEGIN
declare i int(8);
set i = 0;
while ( i < num) DO
insert into _article (title,content, createTime) values (i, num-i, now());
set i = i + 1;
end while;
return "OK";
END$$
DELIMITER ;
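The same test data could also be loaded from Java with a JDBC batch insert instead of a stored function. The sketch below assumes the `_article` table that Hibernate created from the `Article` entity, with the `title`, `content` and `createTime` columns used in the function above. It mirrors the function's values (`title = i`, `content = num - i`). The class name `TestDataLoader` and the batch size of 1000 are arbitrary choices for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Inserts num test rows into _article, mirroring the addDateSyx stored function. */
final class TestDataLoader {
    static final int BATCH_SIZE = 1000; // arbitrary: send the batch every 1000 rows

    /** The (title, content) values the stored function inserts for row i. */
    static String[] rowValues(int i, int num) {
        return new String[] {String.valueOf(i), String.valueOf(num - i)};
    }

    static void insert(Connection conn, int num) throws SQLException {
        String sql = "insert into _article (title, content, createTime) values (?, ?, ?)";
        boolean autoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction for the whole load
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            Timestamp now = new Timestamp(System.currentTimeMillis());
            for (int i = 0; i < num; i++) {
                String[] row = rowValues(i, num);
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setTimestamp(3, now);
                ps.addBatch();
                if ((i + 1) % BATCH_SIZE == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // remaining rows
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(autoCommit);
        }
    }
}
```

With MySQL Connector/J, adding `rewriteBatchedStatements=true` to the JDBC URL usually makes such batches considerably faster. I have not measured that against the figures below.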
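4-5-1 Test with 10,000 rows of duplicated Chinese text
For this test, change the insert statement in the function to the following (the Chinese text is intentional, to exercise the Chinese analyzer):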
insert into _article (title,content, createTime) values ('用compass實現站內全文搜索引擎(一)', 'Compass是一個強大的,事務的,高性能的對象/搜索引擎映射(OSEM:object/search engine mapping)與一個Java持久層框架.Compass包括:
* 搜索引擎抽象層(使用Lucene搜索引薦),
* OSEM (Object/Search Engine Mapping) 支持,
* 事務管理,
* 類似於Google的簡單關鍵字查詢語言,
* 可擴展與模塊化的框架,
* 簡單的API.
如果你需要做站內搜索引擎,而且項目里用到了hibernate,那用compass是你的最佳選擇。 ', now());
Insert the data:
select addDateSyx(10000); -- the _article table is created by Hibernate (hibernate.hbm2ddl.auto=update)
Build the index:
10,000 rows took 8045 ms, which is reasonably fast.
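As plain arithmetic on the figures above (a single run, so only a rough indication):

```latex
\frac{10\,000\ \text{rows}}{8.045\ \text{s}} \approx 1243\ \text{rows/s},
\qquad
\frac{8045\ \text{ms}}{10\,000\ \text{rows}} \approx 0.80\ \text{ms per row}
```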
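Index size:
Search:
The text was indeed segmented into words. With the default analyzer, every Chinese character becomes its own token and search is fairly fast; with JE-Analyzer, 116 ms is also acceptable.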
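4-5-2 Test with 100,000 rows of duplicated Chinese text
Insert the data:
MySQL took about 12 s for 100,000 rows.
Build the index:
The index size was about what I expected, but indexing took much longer than I thought it would, and I didn't want to try it again.
Search:
With 100,000 rows, a search took 243 ms, which is quite good; it seems that once the index is built, searching is easy.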
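5 Summary
Compass is quite handy and should meet basic needs. I don't know why such a good project stopped being updated; had it kept going, perhaps Hibernate Search would never have come about.
Because Compass is no longer updated, features from Lucene 3.0 onward cannot be used, which is a pity. Compass can build indexes automatically (and of course you can also handle the CRUD manually). Still, wrapping Lucene yourself to do what Compass does should yield a fairly good implementation, and I look forward to someone taking that on.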