In Lucene, documents are added through the IndexWriter.addDocument method. We start with sample code that adds a document:
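import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;

// dir is assumed to be an already-opened org.apache.lucene.store.Directory
IndexWriterConfig config = new IndexWriterConfig(new WhitespaceAnalyzer());
config.setUseCompoundFile(false); // write separate index files instead of one compound file
config.setMaxBufferedDocs(2);     // trigger a flush once 2 documents are buffered
IndexWriter writer = new IndexWriter(dir, config);

// Field type: stored, tokenized, full postings, plus term vectors with positions and offsets
FieldType type = new FieldType();
type.setStored(true);
type.setTokenized(true);
type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
type.setStoreTermVectors(true);
type.setStoreTermVectorPositions(true);
type.setStoreTermVectorOffsets(true);
type.freeze(); // make the field type immutable

// Build a single-field document and add it to the index
Document doc = new Document();
doc.add(new Field("content", "one", type));
writer.addDocument(doc);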
The previous article described how Lucene adds and updates documents. This article describes what happens after a document has been processed.
1. Source walkthrough of DocumentsWriter.postUpdate
private boolean postUpdate(DocumentsWriterPerThread flushingDWPT, boolean hasEvents) throws IOException, AbortingException {
    // Apply all pending delete information to the segments that already exist
    hasEvents |= applyAllDeletes(deleteQueue);
    if (flushingDWPT != null) {
        // This thread was handed a DWPT to flush, so flush it now
        hasEvents |= doFlush(flushingDWPT);
    } else if (config.checkPendingFlushOnUpdate) {
        // Otherwise help out by flushing the next pending DWPT, if there is one
        final DocumentsWriterPerThread nextPendingFlush = flushControl.nextPendingFlush();
        if (nextPendingFlush != null) {
            hasEvents |= doFlush(nextPendingFlush);
        }
    }
    return hasEvents;
}

private boolean applyAllDeletes(DocumentsWriterDeleteQueue deleteQueue) throws IOException {
    // The flushDeletes flag is set to true when the memory used by deleteQueue
    // exceeds IndexWriterConfig.getRAMBufferSizeMB()
    if (flushControl.getAndResetApplyAllDeletes()) {
        // Apply the global delete information
        if (deleteQueue != null) {
            ticketQueue.addDeletes(deleteQueue);
        }
        // Enqueue an event that applies the global deletes
        putEvent(ApplyDeletesEvent.INSTANCE); // apply deletes event forces a purge
        return true;
    }
    return false;
}
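The check in applyAllDeletes depends on a flag that is read and cleared in a single step, so a pending "apply all deletes" request is acted on only once. The following is a minimal sketch of that pattern, not Lucene's actual code: the class ApplyDeletesFlag and its demo are hypothetical, and only the method names setApplyAllDeletes and getAndResetApplyAllDeletes are borrowed from the listing above for readability.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified stand-in for the flag consulted by applyAllDeletes. */
final class ApplyDeletesFlag {
    private final AtomicBoolean flushDeletes = new AtomicBoolean(false);

    /** Assumed to be called when buffered deletes exceed the RAM budget. */
    void setApplyAllDeletes() {
        flushDeletes.set(true);
    }

    /** Atomically returns the current value and clears it, so only one caller sees true. */
    boolean getAndResetApplyAllDeletes() {
        return flushDeletes.getAndSet(false);
    }
}

final class ApplyDeletesFlagDemo {
    public static void main(String[] args) {
        ApplyDeletesFlag flag = new ApplyDeletesFlag();
        System.out.println(flag.getAndResetApplyAllDeletes()); // false: nothing requested yet
        flag.setApplyAllDeletes();
        System.out.println(flag.getAndResetApplyAllDeletes()); // true: request is consumed
        System.out.println(flag.getAndResetApplyAllDeletes()); // false: already reset
    }
}
```

Because the read and the reset happen in one atomic operation, two threads that call the method at the same time cannot both observe true, which is what keeps the delete event from being enqueued twice for one request.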
2. Processing events
// Process events, e.g. apply delete information to existing segments
// or to a segment that has just been flushed
private void processEvents(Queue<Event> queue, boolean triggerMerge, boolean forcePurge) throws IOException {
    if (tragedy == null) {
        Event event;
        // Take events off the queue one by one and process each of them
        while ((event = queue.poll()) != null) {
            event.process(this, triggerMerge, forcePurge);
        }
    }
}
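The drain loop above can be tried in isolation. The sketch below is a simplified model under stated assumptions: the Event interface, the drain method, and the string log are hypothetical stand-ins, and only the structure (skip everything if a fatal error was recorded, otherwise poll until the queue is empty) mirrors processEvents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class EventDrainSketch {
    /** Hypothetical stand-in for the internal event type. */
    interface Event {
        void process(List<String> log);
    }

    /**
     * Polls events in FIFO order until the queue is empty.
     * If a fatal error (tragedy) was recorded, nothing is processed.
     * Returns the number of events handled.
     */
    static int drain(Queue<Event> queue, Throwable tragedy, List<String> log) {
        int processed = 0;
        if (tragedy == null) {
            Event event;
            while ((event = queue.poll()) != null) {
                event.process(log);
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<Event> queue = new ConcurrentLinkedQueue<>();
        queue.add(log -> log.add("apply deletes"));
        queue.add(log -> log.add("flush segment"));
        List<String> log = new ArrayList<>();
        int n = drain(queue, null, log);
        System.out.println(n + " " + log); // prints "2 [apply deletes, flush segment]"
    }
}
```

Using poll() rather than iterating means each event is removed as it is handled, so events enqueued while the loop is running are also picked up before it exits.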
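This completes the overview of how Lucene adds and updates documents; the individual steps have not been examined in detail. Later articles will cover Lucene's file formats and how queries work.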