A Lucene Getting-Started Example

2019-01-22 17:40:05

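The example below is adapted from Lucene in Action (2010 edition). The book's examples target Lucene 3.x, whereas the latest Lucene release at the time of writing is 4.10.3. The API changed considerably between 2.x and 3.x and again between 3.x and 4.x, so the book's examples do not run under 4.x as printed.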

The example builds an index from a set of text files and then searches that index.
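To make the "index first, then search" idea concrete before bringing in Lucene, here is a toy inverted index in plain Java. It is only a conceptual sketch: the class name ToyInvertedIndex and its methods are made up for illustration, and Lucene's real index format, analysis and scoring are far more elaborate.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, for illustration only; not how Lucene stores its index). */
final class ToyInvertedIndex {

    // term -> sorted names of the documents that contain the term
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    /** Indexing step: split the text into lowercase terms and record the document under each term. */
    public void add(String docName, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
            }
        }
    }

    /** Search step: look the term up in the map instead of rescanning every document. */
    public SortedSet<String> search(String term) {
        SortedSet<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<String>() : hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.xml", "Compute a buffer around a geometry");
        index.add("b.xml", "Measure the area of a polygon");
        index.add("c.xml", "Raster buffer analysis");
        System.out.println(index.search("buffer")); // prints [a.xml, c.xml]
        System.out.println(index.search("area"));   // prints [b.xml]
    }
}
```

The cost of reading every file is paid once, at indexing time; each search afterwards is a single map lookup. Lucene follows the same division of labor, with IndexWriter on the indexing side and IndexSearcher on the search side.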

I am using Lucene 4.10.2. I moved the main methods of the book's Indexer and Searcher classes into unit tests written with JUnit 4.

Your own project needs the following three JARs: lucene-core-4.10.2.jar, lucene-analyzers-common-4.10.2.jar, and lucene-queryparser-4.10.2.jar.

First, build the index. The Indexer class does this:

package cn.tzy.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * It takes two arguments:
 * A path to a directory where we store the Lucene index
 * A path to a directory that contains the files we want to index
 * @author Zhenyu Tan
 */
public class Indexer {
	
	private IndexWriter writer;
	
	public Indexer(String indexDir) throws IOException, ParseException {
		Directory dir = FSDirectory.open(new File(indexDir));
		// Create Lucene IndexWriter
		IndexWriterConfig config = new IndexWriterConfig(Version.parse("4.0.0"), new StandardAnalyzer());
		writer = new IndexWriter(dir, config);
	}
	
	public void close() throws IOException {
		// Close IndexWriter
		writer.close();
	}
	
	public int index(String dataDir, FileFilter filter) throws Exception {
		File[] files = new File(dataDir).listFiles();
		for (File file : files) {
			// A null filter means "accept every file"
			if (!file.isDirectory() && !file.isHidden() && file.exists() && file.canRead() && (filter == null || filter.accept(file))) {
				indexFile(file);
			}
		}
		// Return number of documents indexed
		return writer.numDocs();
	}
	
	private void indexFile(File file) throws Exception {
		System.out.println("Indexing " + file.getCanonicalPath());
		Document doc = getDocument(file);
		// Add document to Lucene index
		writer.addDocument(doc);
	}
	
	protected Document getDocument(File file) throws Exception {
		Document doc = new Document();
		// Index file content
		doc.add(new TextField("content", new FileReader(file)));
		// Index file name
		doc.add(new TextField("name", file.getName(), Field.Store.YES));
		// Index file path
		doc.add(new TextField("path", file.getCanonicalPath(), Field.Store.YES));
		return doc;
	}
	
	public static class TextFileFilter implements FileFilter {
		@Override
		public boolean accept(File pathname) {
			// Index .xml files only
			return pathname.getName().toLowerCase().endsWith(".xml");
		}
	}
}
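The file-selection step in index() can be tried out on its own, without Lucene. The sketch below is a variation rather than the code above: the class IndexableFiles and its select method are hypothetical names, a null filter is treated as "accept everything" (an assumption of this sketch), and it guards against File.listFiles() returning null, which happens when the path is not a directory or cannot be read.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that mirrors the file checks done before indexing. */
final class IndexableFiles {

    /** Returns the regular, readable, non-hidden files in dataDir that the filter accepts. */
    static List<File> select(File dataDir, FileFilter filter) {
        List<File> selected = new ArrayList<>();
        File[] files = dataDir.listFiles(); // null if dataDir is not a directory or is unreadable
        if (files == null) {
            return selected;
        }
        for (File file : files) {
            if (file.isFile() && !file.isHidden() && file.canRead()
                    && (filter == null || filter.accept(file))) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Lambda equivalent of TextFileFilter: accept .xml files only
        FileFilter xmlOnly = f -> f.getName().toLowerCase().endsWith(".xml");
        for (File file : select(new File("document"), xmlOnly)) {
            System.out.println(file.getName());
        }
    }
}
```

Separating selection from indexing like this makes it easy to check which files would be indexed before any index is written.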

Next, search the index. The Searcher class does this:

package cn.tzy.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {
	public static void search(String indexDir, String squery) throws IOException, ParseException {
		Directory dir = FSDirectory.open(new File(indexDir));
		// Open index
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
		
		QueryParser parser = new QueryParser("content", new StandardAnalyzer());
		Query query = parser.parse(squery);
		long start = System.currentTimeMillis();
		TopDocs hits = searcher.search(query, 10);
		long end = System.currentTimeMillis();
		
		// Write search status
		System.out.println("Found " + hits.totalHits +
				" document(s) (in " + (end - start) +
				" milliseconds) that matched query '" +
				squery + "':");
		
		// Retrieve matching document
		for(ScoreDoc scoreDoc : hits.scoreDocs) {
			Document doc = searcher.doc(scoreDoc.doc);
			System.out.println(doc.get("path"));
		}
		// Close the reader to release the index files
		reader.close();
	}
}

Here is the test code:

package cn.tzy.lucene.test;

import org.junit.Test;

import cn.tzy.lucene.Indexer;
import cn.tzy.lucene.Searcher;

public class LuceneTest {
	// Create Lucene index in this directory
	private String indexDir = "index";
	// Index *.xml files in this directory
	private String dataDir = "document";
	
	@Test
	public void indexerTest() throws Exception {
		long start = System.currentTimeMillis();
		Indexer indexer = new Indexer(indexDir);
		int numIndexed;
		try {
			numIndexed = indexer.index(dataDir, new Indexer.TextFileFilter());
		} finally {
			indexer.close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
	}
	
	@Test
	public void searcherTest() throws Exception {
		String squery = "buffer";
		Searcher.search(indexDir, squery);
	}
}

The indexerTest method indexes the .xml files in the dataDir folder and writes the index files to the indexDir folder. Its output is:

Indexing E:\EclipseWorkSpace\HelloLucene\document\AngleService-angleBetween.xml
Indexing E:\EclipseWorkSpace\HelloLucene\document\AngleService-interiorAngle.xml
...
(middle part omitted)
...
Indexing E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-measureArea.xml
Indexing E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-measureLength.xml
Indexing 137 files took 777 milliseconds

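The searcherTest method searches for files whose content contains buffer. Sixteen documents match, but only ten paths are listed because Searcher requests at most 10 hits with searcher.search(query, 10). Its output is: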

Found 16 document(s) (in 15 milliseconds) that matched query 'buffer':
E:\EclipseWorkSpace\HelloLucene\document\SpatialAnalysisServices-bufferAnalysis.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterBufferProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\GeoBufferProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterGrowProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\GeoRandomProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterColorsProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterLakeProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterParamscaleProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterRandomProcess.xml
E:\EclipseWorkSpace\HelloLucene\document\RasterSurfcontourProcess.xml
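The output reports 16 matching documents yet prints only 10 paths. That follows from the call searcher.search(query, 10): totalHits counts every match, while scoreDocs holds only the best n. The plain-Java sketch below illustrates that distinction with a bounded min-heap. The class TopNSketch, its Result holder and the scores are invented for illustration; this is not Lucene's collector code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: count all matches but keep only the n best scores. */
final class TopNSketch {

    /** Total number of matches plus the best n scores in descending order. */
    static final class Result {
        final int totalHits;
        final List<Double> topScores;

        Result(int totalHits, List<Double> topScores) {
            this.totalHits = totalHits;
            this.topScores = topScores;
        }
    }

    static Result topN(double[] scores, int n) {
        // Min-heap: the weakest score kept so far sits on top and is evicted first
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double score : scores) {
            heap.offer(score);
            if (heap.size() > n) {
                heap.poll(); // drop the weakest so that at most n scores remain
            }
        }
        List<Double> top = new ArrayList<>(heap);
        top.sort(Comparator.reverseOrder());
        return new Result(scores.length, top);
    }

    public static void main(String[] args) {
        double[] scores = new double[16]; // 16 made-up match scores
        for (int i = 0; i < scores.length; i++) {
            scores[i] = (i * 7 % 16) / 16.0;
        }
        Result result = topN(scores, 10);
        System.out.println("Found " + result.totalHits + " match(es), returning the top "
                + result.topScores.size());
        // prints: Found 16 match(es), returning the top 10
    }
}
```

To list all 16 paths in the Lucene example, pass a larger n to searcher.search.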
