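The example below is adapted from Lucene in Action (2010 edition). The book's examples target Lucene 3.x, while the latest Lucene release at the time of writing is 4.10.3. Because the API changed substantially from 2.x to 3.x and again from 3.x to 4.x, the book's examples do not run under 4.x as written.
The example builds an index from a set of text files and then searches that index.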
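To make the index-then-search idea concrete before bringing in Lucene, here is a minimal sketch of a toy inverted index in plain Java. The class name TinyInvertedIndex, its methods, and the sample texts are all hypothetical; the sketch only illustrates the concept and is not how Lucene stores, analyzes, or scores documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (hypothetical): illustrates the idea only, not Lucene's implementation.
public class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<String, TreeSet<Integer>>();
    // document id -> document name
    private final List<String> names = new ArrayList<String>();

    // Indexing step: split the text into lowercase terms and record the document id per term.
    public int add(String name, String text) {
        int docId = names.size();
        names.add(name);
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    // Search step: look the term up in the map instead of scanning every document.
    public List<String> search(String term) {
        List<String> result = new ArrayList<String>();
        TreeSet<Integer> docs = postings.get(term.toLowerCase());
        if (docs != null) {
            for (int docId : docs) {
                result.add(names.get(docId));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("a.xml", "Buffer analysis of a raster layer");
        index.add("b.xml", "Measure the length of a line");
        index.add("c.xml", "Geometry buffer process");
        System.out.println(index.search("buffer")); // prints [a.xml, c.xml]
    }
}
```

The work of splitting text into terms happens once, at indexing time, so each later search is a single map lookup. Lucene follows the same two-phase pattern, with far more capable analysis and ranking.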
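I use Lucene 4.10.2. The original source has main methods in Indexer and Searcher; I moved that logic into unit tests written with JUnit 4.
Add the following three JARs to your project: lucene-core-4.10.2.jar, lucene-analyzers-common-4.10.2.jar, and lucene-queryparser-4.10.2.jar.
First, build the index. The Indexer class handles this: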
package cn.tzy.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * It takes two arguments:
 * A path to a directory where we store the Lucene index
 * A path to a directory that contains the files we want to index
 * @author Zhenyu Tan
 */
public class Indexer {
    private IndexWriter writer;

    public Indexer(String indexDir) throws IOException, ParseException {
        Directory dir = FSDirectory.open(new File(indexDir));
        // Create Lucene IndexWriter
        IndexWriterConfig config = new IndexWriterConfig(Version.parse("4.0.0"), new StandardAnalyzer());
        writer = new IndexWriter(dir, config);
    }

    public void close() throws IOException {
        // Close IndexWriter
        writer.close();
    }

    public int index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles();
        if (files == null) {
            // listFiles() returns null when dataDir is not a readable directory
            throw new IOException(dataDir + " is not a readable directory");
        }
        for (File file : files) {
            // A null filter accepts every file
            if (!file.isDirectory() && !file.isHidden() && file.exists() && file.canRead()
                    && (filter == null || filter.accept(file))) {
                indexFile(file);
            }
        }
        // Return number of documents indexed
        return writer.numDocs();
    }

    private void indexFile(File file) throws Exception {
        System.out.println("Indexing " + file.getCanonicalPath());
        Document doc = getDocument(file);
        // Add document to Lucene index
        writer.addDocument(doc);
    }

    protected Document getDocument(File file) throws Exception {
        Document doc = new Document();
        // Index file content
        doc.add(new TextField("content", new FileReader(file)));
        // Index file name
        doc.add(new TextField("name", file.getName(), Field.Store.YES));
        // Index file path
        doc.add(new TextField("path", file.getCanonicalPath(), Field.Store.YES));
        return doc;
    }

    public static class TextFileFilter implements FileFilter {
        @Override
        public boolean accept(File pathname) {
            // Index .xml files only
            return pathname.getName().toLowerCase().endsWith(".xml");
        }
    }
}
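The file selection done in Indexer.index can be tried on its own, without Lucene. The following is a minimal sketch using only the JDK; the class XmlFileScan and its select method are hypothetical helpers, not part of the original code. It assumes that a null filter should accept every file, and it guards against listFiles() returning null, which happens when the path is not a readable directory.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original Indexer: it only selects files.
public class XmlFileScan {

    // Same rule as Indexer.TextFileFilter: accept names ending in ".xml", ignoring case.
    public static final FileFilter XML_ONLY = new FileFilter() {
        @Override
        public boolean accept(File pathname) {
            return pathname.getName().toLowerCase().endsWith(".xml");
        }
    };

    // Returns the readable, non-hidden regular files in dataDir that the filter accepts.
    // Assumption: a null filter accepts every file.
    public static List<File> select(File dataDir, FileFilter filter) {
        List<File> selected = new ArrayList<File>();
        File[] files = dataDir.listFiles();
        if (files == null) {
            // dataDir is not a directory or cannot be read
            return selected;
        }
        for (File file : files) {
            if (file.isFile() && !file.isHidden() && file.canRead()
                    && (filter == null || filter.accept(file))) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Defaults to the "document" folder used by the test code in this post
        File dataDir = new File(args.length > 0 ? args[0] : "document");
        for (File file : select(dataDir, XML_ONLY)) {
            System.out.println(file.getName());
        }
    }
}
```

Running the scan before indexing is a quick way to check which files the filter will pick up.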
Next, search the index. The Searcher class handles this:
package cn.tzy.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {
    public static void search(String indexDir, String squery) throws IOException, ParseException {
        Directory dir = FSDirectory.open(new File(indexDir));
        // Open index
        IndexReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Parse query against the "content" field
            QueryParser parser = new QueryParser("content", new StandardAnalyzer());
            Query query = parser.parse(squery);
            // Search index, keeping at most the top 10 hits
            long start = System.currentTimeMillis();
            TopDocs hits = searcher.search(query, 10);
            long end = System.currentTimeMillis();
            // Write search status
            System.out.println("Found " + hits.totalHits
                    + " document(s) (in " + (end - start)
                    + " milliseconds) that matched query '"
                    + squery + "':");
            // Retrieve matching documents
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = searcher.doc(scoreDoc.doc);
                System.out.println(doc.get("path"));
            }
        } finally {
            // Close the reader even if parsing or searching fails
            reader.close();
        }
    }
}
Here is the test code:
package cn.tzy.lucene.test;

import org.junit.Test;

import cn.tzy.lucene.Indexer;
import cn.tzy.lucene.Searcher;

public class LuceneTest {
    // Create Lucene index in this directory
    private String indexDir = "index";
    // Index *.xml files in this directory
    private String dataDir = "document";

    @Test
    public void indexerTest() throws Exception {
        long start = System.currentTimeMillis();
        Indexer indexer = new Indexer(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new Indexer.TextFileFilter());
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();
        System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
    }

    @Test
    public void searcherTest() throws Exception {
        String squery = "buffer";
        Searcher.search(indexDir, squery);
    }
}
The indexerTest method indexes the text files in the dataDir folder and writes the index files to the indexDir folder. It produces the following output:
Indexing E:\EclipseWorkSpace\HelloLucene\document\AngleService-angleBetween.xml
Indexing E:\EclipseWorkSpace\HelloLucene\document\AngleService-interiorAngle.xml
...
(middle part omitted)
...
Indexing E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-measureArea.xml
Indexing E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-measureLength.xml
Indexing 137 files took 777 milliseconds
The searcherTest method searches for files containing "buffer". It produces the following output:
Found 16 document(s) (in 15 milliseconds) that matched query 'buffer':
E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-bufferAnalysis.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterBufferProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\GeoBufferProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterGrowProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\GeoRandomProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterColorsProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterLakeProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterParamscaleProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterRandomProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterSurfcontourProcess.xml
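The output reports 16 matches but lists only 10 paths. That is expected: searcher.search(query, 10) returns at most the 10 highest-scoring hits, while totalHits counts every matching document. The selection idea can be sketched in plain Java with a bounded min-heap. The class TopNSketch, its topN method, and the scores are all made up for illustration; this is not Lucene's actual collector code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of "top N of M matches"; the scores below are invented.
public class TopNSketch {

    // Returns the n largest scores in descending order.
    public static List<Double> topN(double[] scores, int n) {
        // Min-heap: the lowest score kept so far sits at the head.
        PriorityQueue<Double> heap = new PriorityQueue<Double>();
        for (double score : scores) {
            heap.offer(score);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest score
            }
        }
        List<Double> best = new ArrayList<Double>(heap);
        Collections.sort(best, Collections.reverseOrder());
        return best;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.5, 0.7, 0.1};
        List<Double> best = topN(scores, 3);
        System.out.println("total matches: " + scores.length); // prints total matches: 5
        System.out.println("returned: " + best);               // prints returned: [0.9, 0.7, 0.5]
    }
}
```

To list all 16 documents from the search above, pass a larger limit as the second argument of searcher.search.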