java | 아이군의 블로그

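If I had to pick the most addictive thing about working in Java, it would without doubt be the huge number of open-source frameworks.

I knew nothing about search engines but had to implement one, so after wondering what to do I impulsively bought Lucene in Action.

Reading it, a lot did not make sense. The book kept focusing on console-based implementations that are far from the web, and the examples I found online were ones I could not follow either. Anyway, that is how I studied Lucene.

By now I understand Lucene fairly well, and it seems time to share what I know. Of course, what I know is still very basic, so this will only be a tutorial for beginners.

Most existing Lucene tutorials and materials target versions before 2.0, and those earlier versions have an Analyzer bug with Korean text.

So I have used 2.0 from the start. Probably because of that, a lot differed from what Lucene in Action shows, which made learning from the book a bit harder. Everything I write from here on targets 2.0, a version without the Korean problem.

First, here is a simple Lucene class to study with. Treat it only as a reference^^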

[CODE]package ke.pe.theeye.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SearchEngine {
    // volatile is needed for double-checked locking to be safe (Java 5+)
    private static volatile SearchEngine instance;
    private Analyzer standardAnalyzer;
    private Analyzer whitespaceAnalyzer;
    private Analyzer simpleAnalyzer;
    private Analyzer currentAnalyzer;
    private String directory;

    // Analyzer types accepted by setAnalyzer(); public so callers can use them
    public static final int STANDARD = 0;
    public static final int WHITESPACE = 1;
    public static final int SIMPLE = 2;

    public static SearchEngine getInstance() {
        if (instance == null) {
            synchronized (SearchEngine.class) {
                if (instance == null)
                    instance = new SearchEngine();
            }
        }

        return instance;
    }

    private SearchEngine() {
        // currentAnalyzer stays null until setAnalyzer() is called
        standardAnalyzer = new StandardAnalyzer();
        whitespaceAnalyzer = new WhitespaceAnalyzer();
        simpleAnalyzer = new SimpleAnalyzer();
    }

    // Remembers the index directory and creates an empty index if none exists yet.
    // Returns false if no analyzer has been selected.
    public boolean setIndexDirectory(String directory) throws IOException {
        if (this.currentAnalyzer == null) {
            return false;
        }

        this.directory = directory;

        Directory fsDir = null;
        IndexWriter writer = null;
        try {
            fsDir = FSDirectory.getDirectory(directory, false);
            if (!fsDir.fileExists("segments")) {
                // create = true: start a new, empty index
                writer = new IndexWriter(this.directory, currentAnalyzer, true);
            }
        } finally {
            if (writer != null)
                try {
                    writer.close();
                } catch (IOException ex) {
                    // ignore failures while closing
                }
            if (fsDir != null)
                try {
                    fsDir.close();
                } catch (IOException ex) {
                    // ignore failures while closing
                }
        }

        return true;
    }

    // Selects the analyzer used for both indexing and query parsing.
    public boolean setAnalyzer(int type) {
        if (type == STANDARD) {
            this.currentAnalyzer = this.standardAnalyzer;
        } else if (type == WHITESPACE) {
            this.currentAnalyzer = this.whitespaceAnalyzer;
        } else if (type == SIMPLE) {
            this.currentAnalyzer = this.simpleAnalyzer;
        } else {
            return false;
        }

        return true;
    }

    // Adds one document with a single stored, tokenized "word" field.
    public boolean makeIndex(String word) throws IOException {
        if (this.currentAnalyzer == null) {
            return false;
        }

        IndexWriter writer = null;

        try {
            // create = false: append to the existing index
            writer = new IndexWriter(this.directory, this.currentAnalyzer,
                    false);

            Document document = new Document();
            document.add(new Field("word", word, Store.YES, Index.TOKENIZED));
            writer.addDocument(document);
        } finally {
            if (writer != null)
                try {
                    writer.close();
                } catch (IOException ex) {
                    // ignore failures while closing
                }
        }
        return true;
    }

    // Parses the query against the "word" field and returns the stored values of all hits.
    public List<String> searchIndex(String queryString) throws IOException,
            ParseException {
        Directory fsDir = null;
        IndexSearcher is = null;

        try {
            fsDir = FSDirectory.getDirectory(this.directory, false);
            is = new IndexSearcher(fsDir);

            QueryParser parser = new QueryParser("word", this.currentAnalyzer);

            Query query = parser.parse(queryString);
            Hits hits = is.search(query);

            List<String> list = new ArrayList<String>();

            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                list.add(doc.get("word"));
            }

            return list;
        } finally {
            if (is != null)
                try {
                    is.close();
                } catch (IOException ex) {
                    // ignore failures while closing
                }
            if (fsDir != null)
                try {
                    fsDir.close();
                } catch (IOException ex) {
                    // ignore failures while closing
                }
        }
    }
}[/CODE]

Usage is as follows.

Initialization:
[CODE]SearchEngine engine = SearchEngine.getInstance();
// 0 : Standard Analyzer, 1 : Whitespace Analyzer, 2 : Simple Analyzer
engine.setAnalyzer(0);
engine.setIndexDirectory("C:\\LuceneIndex");[/CODE]

Creating the index:
[CODE]engine.makeIndex("아이군의 홈페이지 주소는 theeye.pe.kr이다");[/CODE]

Searching:
[CODE]// The matching results come back in a list; use them as you see fit
List<String> list = engine.searchIndex("아이군");[/CODE]

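That's it. Anyone reading this probably has some background in Lucene already, so I expect it was not hard to follow.

With an example like the one above (where each indexed entry was not a single String but a bean-style data object), I loaded 100,000 entries, and searches never took more than 0.1 seconds.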
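To give a feel for why a lookup over that many entries can stay fast, here is a minimal sketch of an inverted index in plain Java. It is not Lucene's actual implementation, and the class name `InvertedIndexSketch` and its methods are made up for illustration. The idea it shows is that each term maps directly to the documents containing it, so a search is a map lookup rather than a scan over every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; Lucene's real index format is far more elaborate.
class InvertedIndexSketch {
    private final List<String> documents = new ArrayList<String>();
    // term -> ids of the documents that contain the term
    private final Map<String, List<Integer>> postings = new HashMap<String, List<Integer>>();

    // Stores the text and records each whitespace-separated, lowercased term.
    void add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new ArrayList<Integer>();
                postings.put(term, ids);
            }
            // Skip the id if this document was already recorded for the term
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
    }

    // Returns the documents containing the term, in insertion order.
    List<String> search(String term) {
        List<String> result = new ArrayList<String>();
        List<Integer> ids = postings.get(term.toLowerCase());
        if (ids != null) {
            for (int id : ids) {
                result.add(documents.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene in Action");
        index.add("paging class for boards");
        index.add("lucene search class");
        System.out.println(index.search("lucene"));
    }
}
```

The cost of a search here depends on the number of matching documents, not on the total number indexed, which is roughly the property that makes a real search library feel instant at this scale.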

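I'm impressed that even someone with no background in search engine algorithms can get this far.

I'll keep writing, whenever I find time, about the differences between the Analyzers, the indexing and search options, and the various Term types.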
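As a rough preview of what the Analyzer choice changes, the sketch below imitates two of the three analyzers in plain Java, without using Lucene at all. It assumes that WhitespaceAnalyzer only splits on whitespace and that SimpleAnalyzer splits at non-letter characters and lowercases the result; StandardAnalyzer has more involved rules and is left out. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java approximation for illustration; not Lucene code.
class AnalyzerSketch {
    // Approximates WhitespaceAnalyzer: split on whitespace, keep tokens unchanged.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Approximates SimpleAnalyzer: a token is a run of letters, lowercased.
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Visit TheEye.pe.kr for Lucene-2.0 notes";
        System.out.println(whitespaceTokens(text)); // [Visit, TheEye.pe.kr, for, Lucene-2.0, notes]
        System.out.println(simpleTokens(text));     // [visit, theeye, pe, kr, for, lucene, notes]
    }
}
```

Because the same analyzer is applied to both the indexed text and the query, the tokens it produces decide which queries can match: with the whitespace variant, only the exact token `TheEye.pe.kr` is searchable, while the simple variant makes `theeye` findable on its own.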

I wrote a paging class that should be handy for bulletin boards. Please send feedback if you find a problem^^

—————————————————————————————
Paging class source code:
—————————————————————————————
[code]public class PageNavigation {

    private boolean isPrevPage;
    private boolean isNextPage;
    protected int nowPage;
    protected int rowTotal;
    protected int blockList;
    protected int blockPage;
    private int totalPage;
    private int startPage;
    private int endPage;
    private int startRow;
    private int endRow;

    // Constructor that computes all paging values
    public PageNavigation(int nowPage, int rowTotal, int blockList, int blockPage) {
        super();

        // Initialize the flags
        isPrevPage = false;
        isNextPage = false;

        // Compute the total number of pages from the total number of rows
        this.totalPage = (int) Math.ceil((double) rowTotal / (double) blockList);

        // If the current page is past the last page, force it to the last page
        if (nowPage > this.totalPage) {
            nowPage = this.totalPage;
        }

        // With zero rows totalPage is 0; keep the current page at 1 or above
        // so that startRow never becomes negative
        if (nowPage < 1) {
            nowPage = 1;
        }

        // Compute the first and last row positions for the DB query
        this.startRow = (nowPage - 1) * blockList;
        this.endRow = this.startRow + blockList - 1;

        // Compute the first and last page numbers of the current block
        this.startPage = ((nowPage - 1) / blockPage) * blockPage + 1;
        this.endPage = this.startPage + blockPage - 1;

        // If the last page of the block is past the total, force it to the total
        if (this.endPage > this.totalPage) {
            this.endPage = totalPage;
        }

        // If the block starts after page 1, a previous block exists
        if (this.startPage > 1) {
            this.isPrevPage = true;
        }

        // If the block ends before the last page, a next block exists
        if (this.endPage < this.totalPage) {
            this.isNextPage = true;
        }

        // Store the remaining values
        this.nowPage = nowPage;
        this.rowTotal = rowTotal;
        this.blockList = blockList;
        this.blockPage = blockPage;
    }

    public void Debug() {
        System.out.println("Total Page : " + this.totalPage + " / Start Page : " + this.startPage + " / End Page : " + this.endPage);
        System.out.println("Total Row : " + this.rowTotal + " / Start Row : " + this.startRow + " / End Row : " + this.endRow);
    }

    // Returns the total number of pages
    public int getTotalPage() {
        return totalPage;
    }

    // Returns the first row position
    public int getStartRow() {
        return startRow;
    }

    // Returns the last row position
    public int getEndRow() {
        return endRow;
    }

    // Returns the number of rows per page (the blockList value)
    public int getBlockSize() {
        return blockList;
    }

    // Returns the first page number of the current block
    public int getStartPage() {
        return startPage;
    }

    // Returns the last page number of the current block
    public int getEndPage() {
        return endPage;
    }

    // Returns whether a previous block exists
    public boolean isPrevPage() {
        return isPrevPage;
    }

    // Returns whether a next block exists
    public boolean isNextPage() {
        return isNextPage;
    }
}[/code]

—————————————————————————————
Servlet (Controller) source code:
—————————————————————————————

[code]// Work out which page of the list was requested
if (request.getParameter("page") == null)
{
    nowPage = 1;
}
else
{
    nowPage = Integer.parseInt(request.getParameter("page"));

    if (nowPage < 1)
    {
        nowPage = 1;
    }
}

// Create the object (current page, total number of posts, posts per page, page links per block)
PageNavigation pageNav = new PageNavigation(nowPage, rowTotal, 10, 5);

// Use this when debugging is needed. Optional.
pageNav.Debug();

// Build the query from the start row and the number of rows per page (LIMIT offset, count)
sql = "SELECT * FROM TableName ORDER BY no DESC LIMIT " + pageNav.getStartRow() + ", " + pageNav.getBlockSize();

// Set the values to pass to the view
request.setAttribute("pageIsPrev", pageNav.isPrevPage());   // whether a previous page block exists
request.setAttribute("pageIsNext", pageNav.isNextPage());   // whether a next page block exists
request.setAttribute("pageStart", pageNav.getStartPage());  // first page number of the block
request.setAttribute("pageEnd", pageNav.getEndPage());      // last page number of the block[/code]
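One thing the controller fragment does not cover is a `page` parameter that is not a number: `Integer.parseInt` throws `NumberFormatException` for input such as `page=abc`. A minimal sketch of a more forgiving parser follows. The helper name `parsePage` and the choice to fall back to page 1 are assumptions for illustration, not part of the original code.

```java
// Hypothetical helper; falls back to page 1 for missing, invalid, or non-positive input.
class PageParam {
    static int parsePage(String raw) {
        if (raw == null) {
            return 1;
        }
        try {
            int page = Integer.parseInt(raw.trim());
            return page < 1 ? 1 : page;
        } catch (NumberFormatException e) {
            // Not a number (or out of int range): treat as the first page
            return 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePage(null));   // 1
        System.out.println(parsePage("abc"));  // 1
        System.out.println(parsePage("-3"));   // 1
        System.out.println(parsePage("7"));    // 7
    }
}
```

In the servlet this could replace the whole if/else with `nowPage = PageParam.parsePage(request.getParameter("page"));`, if silently showing the first page is acceptable for bad input.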

—————————————————————————————
JSP (View) source code (EL notation, using JSTL):
—————————————————————————————
[code]<div>
<center>
<c:if test="${pageIsPrev}">
<a href="index.do?page=${pageStart - 1}">prev</a>
</c:if>
<c:forEach var="page" begin="${pageStart}" end="${pageEnd}">
<a href="index.do?page=${page}">[${page}] </a>
</c:forEach>
<c:if test="${pageIsNext}">
<a href="index.do?page=${pageEnd + 1}">next</a>
</c:if>
</center>
</div>[/code]

—————————————————————————————
Result:
—————————————————————————————
prev [11] [12] [13] [14] [15] next
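To trace where a result like this comes from, the sketch below repeats the block arithmetic in a standalone form. The inputs are assumptions, since the post does not say which values produced the output above: page 13, 200 rows in total, 10 rows per page, 5 page links per block. The class name `PagingArithmetic` is hypothetical, and the ceiling is computed with integer arithmetic rather than `Math.ceil`.

```java
// Standalone walk-through of the paging arithmetic; names and inputs are illustrative.
class PagingArithmetic {
    // Returns {totalPage, startPage, endPage, startRow}
    static int[] compute(int nowPage, int rowTotal, int blockList, int blockPage) {
        int totalPage = (rowTotal + blockList - 1) / blockList; // integer ceiling
        if (totalPage < 1) {
            totalPage = 1;
        }
        if (nowPage > totalPage) {
            nowPage = totalPage;
        }
        if (nowPage < 1) {
            nowPage = 1;
        }
        int startRow = (nowPage - 1) * blockList;
        // Integer division drops the remainder, snapping to the start of the block
        int startPage = ((nowPage - 1) / blockPage) * blockPage + 1;
        int endPage = Math.min(startPage + blockPage - 1, totalPage);
        return new int[] { totalPage, startPage, endPage, startRow };
    }

    // Builds the navigation line the way the JSP view does
    static String render(int nowPage, int rowTotal, int blockList, int blockPage) {
        int[] p = compute(nowPage, rowTotal, blockList, blockPage);
        StringBuilder sb = new StringBuilder();
        if (p[1] > 1) {
            sb.append("prev ");
        }
        for (int page = p[1]; page <= p[2]; page++) {
            sb.append('[').append(page).append("] ");
        }
        if (p[2] < p[0]) {
            sb.append("next");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // (13 - 1) / 5 = 2, 2 * 5 + 1 = 11, so the block runs from 11 to 15
        System.out.println(render(13, 200, 10, 5)); // prev [11] [12] [13] [14] [15] next
    }
}
```

With these inputs there are 20 pages in total, so both a previous block (pages 6 to 10) and a next block (pages 16 to 20) exist, and the query for page 13 starts at row offset 120.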

