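MultiTermQuery covers the following queries:
FuzzyQuery, NumericRangeQuery, PrefixQuery, TermRangeQuery, WildcardQuery
FuzzyQuery: a fuzzy query that matches terms similar to the given one (similarity matching based on edit distance).
NumericRangeQuery: a range query over numeric values.
PrefixQuery: prefix search. A Query that matches documents containing terms with a specified prefix. A PrefixQuery is built by QueryParser for input like app*.
TermRangeQuery: mainly used for range searches over text terms.
WildcardQuery: wildcard search. * matches zero or more characters, and ? matches exactly one character.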
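Query query = new WildcardQuery(new Term("contents", "?ild*"));
// Hits was removed in Lucene 3.0; search(Query, int) returns the top n results as TopDocs (10 is an arbitrary limit)
TopDocs hits = searcher.search(query, 10);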
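QueryParser uses the same wildcard syntax as WildcardQuery. With QueryParser, however, the first character of a term cannot be a wildcard by default; call setAllowLeadingWildcard(true) on the parser to permit it.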
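To make the wildcard semantics concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class WildcardSketch and its matches method are hypothetical names used only for illustration. This is not how Lucene evaluates a WildcardQuery internally; it only mirrors the documented rules that * matches any run of characters (including none) and ? matches exactly one character.

```java
public class WildcardSketch {

    // Returns true if the whole term matches the pattern.
    // '*' matches any run of characters (including none); '?' matches exactly one.
    public static boolean matches(String pattern, String term) {
        return matches(pattern, 0, term, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            // Pattern exhausted: succeed only if the term is exhausted too.
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for the run that '*' absorbs.
            for (int next = ti; next <= t.length(); next++) {
                if (matches(p, pi + 1, t, next)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            // The pattern still needs a character but the term has none left.
            return false;
        }
        if (c == '?' || c == t.charAt(ti)) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("?ild*", "wild"));   // true
        System.out.println(matches("?ild*", "mildew")); // true
        System.out.println(matches("?ild*", "ild"));    // false: '?' needs one character
        System.out.println(matches("?ild*", "build"));  // false
    }
}
```

With the pattern ?ild* from the example above, "wild" and "mildew" match, while "ild" does not because ? cannot match an empty string.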
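SpanQuery: queries based on the distance between terms in the text, or on several terms that appear near one another.
SpanQuery comes in the following kinds:
SpanTermQuery: the building block for span queries. Its results are similar to TermQuery, but it also records the positions of the matched term.
SpanFirstQuery: matches spans that end within a given number of positions from the start of the field.
SpanNearQuery: matches several span queries whose matches lie within a given distance (slop) of one another.
SpanOrQuery: the union of several span queries.
SpanNotQuery: removes from one span query's matches those that overlap with the matches of another span query.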
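The position arithmetic behind SpanFirstQuery and SpanNearQuery can be sketched in plain Java. The class SpanSketch and its methods are hypothetical and do not use the Lucene API. The sketch assumes a simplified model: single-term spans, 0-based token positions, an unordered two-term near match, and slop counted as the number of positions between the two terms.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanSketch {

    // Collects the 0-based positions at which the term occurs.
    public static List<Integer> positions(String[] tokens, String term) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    // SpanFirstQuery-like check: a term at position p spans [p, p + 1),
    // and it matches when that span ends at or before `end`.
    public static boolean first(String[] tokens, String term, int end) {
        for (int p : positions(tokens, term)) {
            if (p + 1 <= end) {
                return true;
            }
        }
        return false;
    }

    // SpanNearQuery-like check for two terms in any order: the number of
    // positions lying between the two terms must not exceed `slop`.
    public static boolean near(String[] tokens, String a, String b, int slop) {
        for (int pa : positions(tokens, a)) {
            for (int pb : positions(tokens, b)) {
                if (pa != pb && Math.abs(pa - pb) - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = "the quick brown fox jumps over the lazy dog".split(" ");
        System.out.println(near(tokens, "quick", "fox", 0)); // false: "brown" sits in between
        System.out.println(near(tokens, "quick", "fox", 1)); // true
        System.out.println(first(tokens, "brown", 2));       // false: "brown" ends at position 3
        System.out.println(first(tokens, "brown", 3));       // true
    }
}
```

In this model adjacent terms need a slop of 0, and each extra word between them raises the required slop by one.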
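ConstantScoreQuery
A query that wraps a filter and simply returns a constant score equal to the query boost for every document in the filter.
I looked at this class's constructor, ConstantScoreQuery(Filter filter). My understanding is that the filter does the document filtering, and every document that satisfies the filter gets the same constant score, equal to the boost set on the query.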
2. Custom scoring, part one: score by file size, so that the larger the file, the lower its weight
- package util;
- import java.io.IOException;
- import org.apache.lucene.index.IndexReader;
- import org.apache.lucene.index.Term;
- import org.apache.lucene.search.IndexSearcher;
- import org.apache.lucene.search.Query;
- import org.apache.lucene.search.TermQuery;
- import org.apache.lucene.search.TopDocs;
- import org.apache.lucene.search.function.CustomScoreProvider;
- import org.apache.lucene.search.function.CustomScoreQuery;
- import org.apache.lucene.search.function.FieldScoreQuery;
- import org.apache.lucene.search.function.ValueSourceQuery;
- import org.apache.lucene.search.function.FieldScoreQuery.Type;
- public class MyScoreQuery1{
- public void searchByScoreQuery() throws Exception{
- IndexSearcher searcher = DocUtil.getSearcher();
- Query query = new TermQuery(new Term("content","java"));
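- // 1. Create the scoring field query. The Type (BYTE, SHORT, INT or FLOAT) must match the values held in the field.
- // The field must be numeric, should be indexed without norms, and must hold exactly one
- // token per document; Field.Index.NOT_ANALYZED_NO_NORMS is the usual choice when indexing it.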
- FieldScoreQuery fieldScoreQuery = new FieldScoreQuery("size",Type.INT);
- // 2. Build the custom Query from the scoring field query and the original Query.
- // query is the original query; fieldScoreQuery is used only for scoring.
- MyCustomScoreQuery customQuery = new MyCustomScoreQuery(query, fieldScoreQuery);
- TopDocs topdoc = searcher.search(customQuery, 100);
- DocUtil.printDocument(topdoc, searcher);
- searcher.close();
- }
- @SuppressWarnings("serial")
- private class MyCustomScoreQuery extends CustomScoreQuery{
- public MyCustomScoreQuery(Query subQuery, ValueSourceQuery valSrcQuery) {
- super(subQuery, valSrcQuery);
- }
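- /**
- * The reader here is a per-segment reader: if the index has more than one segment,
- * this method is called several times during a search. This matters because it lets
- * your scoring logic use the segment reader to look up values in the field cache
- * efficiently.
- */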
- @Override
- protected CustomScoreProvider getCustomScoreProvider(IndexReader reader)
- throws IOException {
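- // By default the final score is the original score multiplied by the score taken from the scoring field.
- // To score according to other requirements, define the scoring yourself.
- /**
- * Steps for custom scoring:
- * 1. Create a class that extends CustomScoreProvider.
- * 2. Override its customScore method.
- */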
- // return super.getCustomScoreProvider(reader);
- return new MyCustomScoreProvider(reader);
- }
- }
- private class MyCustomScoreProvider extends CustomScoreProvider{
- public MyCustomScoreProvider(IndexReader reader) {
- super(reader);
- }
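- /**
- * subQueryScore is the document's default score from the original query.
- * valSrcScore is the score taken from the scoring field.
- * The default implementation returns subQueryScore * valSrcScore.
- */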
- @Override
- public float customScore(int doc, float subQueryScore, float valSrcScore)throws IOException {
- System.out.println("Doc:"+doc);
- System.out.println("subQueryScore:"+subQueryScore);
- System.out.println("valSrcScore:"+valSrcScore);
- // return super.customScore(doc, subQueryScore, valSrcScore);
- return subQueryScore / valSrcScore;
- }
- }
- }
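The scoring rule in customScore above can be tried in isolation. The following is a minimal sketch in plain Java, without Lucene, where valSrcScore stands for the file size read from the size field. The class name SizeScoreSketch is hypothetical, and the guard against a zero size is an assumption added here: the original listing divides unconditionally, which yields Infinity for an empty file.

```java
public class SizeScoreSketch {

    // Divides the original score by the file size, so larger files rank lower.
    // The zero guard is an addition of this sketch, not part of the listing above.
    public static float customScore(float subQueryScore, float valSrcScore) {
        if (valSrcScore <= 0f) {
            // Leave the score unchanged rather than divide by zero.
            return subQueryScore;
        }
        return subQueryScore / valSrcScore;
    }

    public static void main(String[] args) {
        float small = customScore(1.0f, 100f);  // a 100-byte file
        float large = customScore(1.0f, 1000f); // a 1000-byte file
        System.out.println(small > large);      // true: the smaller file scores higher
        System.out.println(customScore(1.0f, 0f));
    }
}
```

Because sizes are raw byte counts, the resulting scores are very small numbers. Only their relative order matters for ranking.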
3. Score by specific file names: the selected file names get a higher weight
- package util;
- import java.io.IOException;
- import org.apache.lucene.index.IndexReader;
- import org.apache.lucene.index.Term;
- import org.apache.lucene.search.FieldCache;
- import org.apache.lucene.search.IndexSearcher;
- import org.apache.lucene.search.Query;
- import org.apache.lucene.search.TermQuery;
- import org.apache.lucene.search.TopDocs;
- import org.apache.lucene.search.function.CustomScoreProvider;
- import org.apache.lucene.search.function.CustomScoreQuery;
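- /**
- * This class boosts the score of documents with particular file names.
- * The same approach could boost books published in the last year or two in a book search.
- * @author user
- */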
- public class MyScoreQuery2 {
- public void searchByFileScoreQuery() throws Exception{
- IndexSearcher searcher = DocUtil.getSearcher();
- Query query = new TermQuery(new Term("content","java"));
- FilenameScoreQuery fieldScoreQuery = new FilenameScoreQuery(query);
- TopDocs topdoc = searcher.search(fieldScoreQuery, 100);
- DocUtil.printDocument(topdoc, searcher);
- searcher.close();
- }
- @SuppressWarnings("serial")
- private class FilenameScoreQuery extends CustomScoreQuery{
- public FilenameScoreQuery(Query subQuery) {
- super(subQuery);
- }
- @Override
- protected CustomScoreProvider getCustomScoreProvider(IndexReader reader)
- throws IOException {
- // return super.getCustomScoreProvider(reader);
- return new FilenameScoreProvider(reader);
- }
- }
- private class FilenameScoreProvider extends CustomScoreProvider{
- String[] filenames = null;
- public FilenameScoreProvider(IndexReader reader) {
- super(reader);
- try {
- filenames = FieldCache.DEFAULT.getStrings(reader, "filename");
- } catch (IOException e) {e.printStackTrace();}
- }
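- // How to get a field's value for a given doc:
- /*
- * Until the reader is closed, field values are kept in a field cache, which gives access
- * to a lot of useful information. FieldCache.DEFAULT.getStrings(reader, "filename")
- * returns the filename field values for all documents in the reader.
- */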
- @Override
- public float customScore(int doc, float subQueryScore, float valSrcScore)
- throws IOException {
- String fileName = filenames[doc];
- System.out.println(doc+":"+fileName);
- // return super.customScore(doc, subQueryScore, valSrcScore);
- if("9.txt".equals(fileName) || "4.txt".equals(fileName)) {
- return subQueryScore*1.5f;
- }
- return subQueryScore/1.5f;
- }
- }
- }
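The file-name rule can also be exercised without an index. In this minimal sketch a plain String array stands in for the per-document values that FieldCache returns, indexed by doc id. The class FilenameBoostSketch and the BOOSTED set are hypothetical names; the file names and the 1.5 factor are taken from the listing above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FilenameBoostSketch {

    // File names whose documents should rank higher (same names as the listing above).
    private static final Set<String> BOOSTED =
            new HashSet<String>(Arrays.asList("9.txt", "4.txt"));

    // filenames[doc] plays the role of the field cache lookup for that doc id.
    public static float customScore(String[] filenames, int doc, float subQueryScore) {
        String fileName = filenames[doc];
        if (BOOSTED.contains(fileName)) {
            return subQueryScore * 1.5f; // promote the selected files
        }
        return subQueryScore / 1.5f;     // demote everything else
    }

    public static void main(String[] args) {
        String[] filenames = {"1.txt", "9.txt", "4.txt"};
        for (int doc = 0; doc < filenames.length; doc++) {
            System.out.println(filenames[doc] + " -> " + customScore(filenames, doc, 3.0f));
        }
    }
}
```

Keeping the selected names in a Set makes it easy to add more names without growing the if condition.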
4. JUnit test
- package test;
- import org.junit.Test;
- import util.MyScoreQuery1;
- import util.MyScoreQuery2;
- public class TestCustomScore {
- @Test
- public void test01() throws Exception {
- MyScoreQuery1 msq = new MyScoreQuery1();
- msq.searchByScoreQuery();
- }
- @Test
- public void test02() throws Exception {
- MyScoreQuery2 msq = new MyScoreQuery2();
- msq.searchByFileScoreQuery();
- }
- }
5. Utility class for document operations
- package util;
- import java.io.File;
- import java.io.IOException;
- import java.text.SimpleDateFormat;
- import java.util.Date;
- import org.apache.lucene.document.Document;
- import org.apache.lucene.index.CorruptIndexException;
- import org.apache.lucene.index.IndexReader;
- import org.apache.lucene.search.IndexSearcher;
- import org.apache.lucene.search.ScoreDoc;
- import org.apache.lucene.search.TopDocs;
- import org.apache.lucene.store.Directory;
- import org.apache.lucene.store.FSDirectory;
- public class DocUtil {
- private static IndexReader reader;
- // Get an IndexSearcher object
- public static IndexSearcher getSearcher(){
- try {
- Directory directory = FSDirectory.open(new File("D:\\Workspaces\\customscore\\index"));
- reader = IndexReader.open(directory);
- } catch (CorruptIndexException e) {
- e.printStackTrace();
- } catch (IOException e) {
- e.printStackTrace();
- }
- IndexSearcher searcher = new IndexSearcher(reader);
- return searcher;
- }
- /**
- * Print document information
- * @param topdoc
- */
- public static void printDocument(TopDocs topdoc,IndexSearcher searcher){
- SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // HH = 24-hour clock; hh would print 12-hour times with no AM/PM marker
- for(ScoreDoc scoredoc : topdoc.scoreDocs){
- try {
- Document doc = searcher.doc(scoredoc.doc);
- System.out.println(scoredoc.doc+":("+scoredoc.score+")" +
- "["+doc.get("filename")+"【"+doc.get("path")+"】--->"+
- doc.get("size")+"-----"+sdf.format(new Date(Long.valueOf(doc.get("date"))))+"]");
- } catch (CorruptIndexException e) {
- e.printStackTrace();
- } catch (IOException e) {
- e.printStackTrace();
- }
- }
- }
- }
6. Building the index
- package index;
- import java.io.File;
- import java.io.IOException;
- import org.apache.commons.io.FileUtils;
- import org.apache.lucene.analysis.Analyzer;
- import org.apache.lucene.document.Document;
- import org.apache.lucene.document.Field;
- import org.apache.lucene.document.NumericField;
- import org.apache.lucene.index.CorruptIndexException;
- import org.apache.lucene.index.IndexWriter;
- import org.apache.lucene.index.IndexWriterConfig;
- import org.apache.lucene.store.Directory;
- import org.apache.lucene.store.FSDirectory;
- import org.apache.lucene.store.LockObtainFailedException;
- import org.apache.lucene.util.Version;
- import org.wltea.analyzer.lucene.IKAnalyzer;
- public class FileIndexUtils {
- private static Directory directory = null;
- private static Analyzer analyzer = new IKAnalyzer();
- public static void main(String[] args) {
- index(true);
- }
- static{
- try {
- directory = FSDirectory.open(new File("D:\\Workspaces\\customscore\\index"));
- } catch (IOException e) {
- e.printStackTrace();
- }
- }
- public static Directory getDirectory() {
- return directory;
- }
- public static void index(boolean hasNew) {
- IndexWriter writer = null;
- try {
- writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, analyzer));
- if(hasNew) {
- writer.deleteAll();
- }
- File file = new File("D:\\Workspaces\\customscore\\resource");
- Document doc = null;
- for(File f:file.listFiles()) {
- doc = new Document();
- doc.add(new Field("content",FileUtils.readFileToString(f),Field.Store.YES,Field.Index.ANALYZED));
- doc.add(new Field("filename",f.getName(),Field.Store.YES,Field.Index.ANALYZED));
- doc.add(new Field("classid","5312",Field.Store.YES,Field.Index.ANALYZED));
- doc.add(new Field("path",f.getAbsolutePath(),Field.Store.YES,Field.Index.ANALYZED));
- doc.add(new NumericField("date",Field.Store.YES,true).setLongValue(f.lastModified()));
- doc.add(new NumericField("size",Field.Store.YES,true).setIntValue((int)(f.length())));
- writer.addDocument(doc);
- }
- } catch (CorruptIndexException e) {
- e.printStackTrace();
- } catch (LockObtainFailedException e) {
- e.printStackTrace();
- } catch (IOException e) {
- e.printStackTrace();
- } finally {
- try {
- if(writer!=null) writer.close();
- } catch (CorruptIndexException e) {
- e.printStackTrace();
- } catch (IOException e) {
- e.printStackTrace();
- }
- }
- }
- }
Project download: http://download.csdn.net/detail/wxwzy738/5320772
http://blog.csdn.net/wxwzy738/article/details/8873094