
Distributed Search Engine ElasticSearch (4): Using Plugins

 
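    • First, many thanks to Medcl (blog, GitHub), a well-known expert in the Chinese community, for his contributions to elasticsearch.

    • Being lazy by nature, I simply use Medcl's ready-to-fly build (elasticsearch-rtf). It is a Chinese distribution of elasticsearch that bundles the plugins relevant to Chinese text, which makes it convenient for beginners like me to learn with, or to use directly in production.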

  • servicewrapper (by elasticsearch team)

    Manages elasticsearch as a service in a server environment.

    For installation and usage, see elasticsearch-servicewrapper.

  • analysis-smartcn (by elasticsearch team)

    The default Chinese analyzer from Lucene.

    For installation and usage, see elasticsearch-analysis-smartcn.

  • transport-thrift (by elasticsearch team)

    Uses Thrift as the data transport, which is fast.

    For installation and usage, see elasticsearch-transport-thrift.

  • mapper-attachments

  • analysis-ik (by Medcl)

    The IK analyzer, very well known in China and my favorite analyzer. Recommended.

    For installation and usage, see elasticsearch-analysis-ik.

    The Java API usage is pasted below:

    // url test: http://192.168.1.108:9200/twitter/_analyze?analyzer=ik&text=%E6%B5%8B%E8%AF%95elasticsearch%E5%88%86%E8%AF%8D%E5%99%A8%E7%9A%84%E6%95%88%E6%9E%9C&pretty=true
    // Assumes `client` is an already connected Client and `logger` is an SLF4J-style logger.

    // 1. create an index
    client.admin().indices().prepareCreate("index_ik").execute().actionGet();

    // 2. create a mapping: analyze both _all and the "content" field with ik
    XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject()
            .startObject("fulltext")
                .startObject("_all")
                    .field("indexAnalyzer", "ik")
                    .field("searchAnalyzer", "ik")
                    .field("term_vector", "no")
                    .field("store", "false")
                .endObject()
                .startObject("properties")
                    .startObject("content")
                        .field("type", "string")
                        .field("store", "no")
                        .field("term_vector", "with_positions_offsets")
                        .field("indexAnalyzer", "ik")
                        .field("searchAnalyzer", "ik")
                        .field("include_in_all", "true")
                        .field("boost", 8)
                    .endObject()
                .endObject()
            .endObject()
        .endObject();
    PutMappingRequest mappingRequest = Requests.putMappingRequest("index_ik").type("fulltext").source(mapping);
    client.admin().indices().putMapping(mappingRequest).actionGet();

    // 3. index some docs
    XContentBuilder builder1 = XContentFactory.jsonBuilder().startObject().field("content", "美国留给伊拉克的是个烂摊子吗").endObject();
    XContentBuilder builder2 = XContentFactory.jsonBuilder().startObject().field("content", "公安部:各地校车将享最高路权").endObject();
    XContentBuilder builder3 = XContentFactory.jsonBuilder().startObject().field("content", "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船").endObject();
    XContentBuilder builder4 = XContentFactory.jsonBuilder().startObject().field("content", "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首").endObject();
    XContentBuilder builder5 = XContentFactory.jsonBuilder().startObject().field("content", "我爱我的中国 我的中国也爱我").endObject();

    BulkRequestBuilder bulkRequest = client.prepareBulk();
    bulkRequest.add(client.prepareIndex("index_ik", "fulltext", "1").setSource(builder1))
               .add(client.prepareIndex("index_ik", "fulltext", "2").setSource(builder2))
               .add(client.prepareIndex("index_ik", "fulltext", "3").setSource(builder3))
               .add(client.prepareIndex("index_ik", "fulltext", "4").setSource(builder4))
               .add(client.prepareIndex("index_ik", "fulltext", "5").setSource(builder5));
    BulkResponse bulkResponse = bulkRequest.execute().actionGet();
    if (bulkResponse.hasFailures()) {
        // the original left this branch empty; report the failures instead of ignoring them
        logger.error("bulk failures: {}", bulkResponse.buildFailureMessage());
    }
    logger.info("bulk state : {}", bulkResponse.hashCode());

    // 4. query with highlighting
    SearchResponse searchResponse = client.prepareSearch("index_ik")
        .setTypes("fulltext")
        .setQuery(QueryBuilders.queryString("中国"))
        .addHighlightedField("content")
        //.setHighlighterPreTags("<tag1>", "<tag2>")
        //.setHighlighterPostTags("</tag1>", "</tag2>")
        .setFrom(0).setSize(10).setExplain(true) // paging
        .execute().actionGet();
    SearchHits hits = searchResponse.getHits();
    long total = hits.getTotalHits();
    logger.info("search result total:{}", total);
    for (SearchHit hit : hits) {
        Map<String, HighlightField> result = hit.highlightFields();
        logger.info("A map of highlighted fields:{}", result);
        HighlightField titleField = result.get("content");
        Text[] titleTexts = titleField.fragments();
        for (Text text : titleTexts) {
            logger.info("title text: :{}", text);
        }
    }

  • analysis-mmseg (by Medcl)

    For installation and usage, see elasticsearch-analysis-mmseg.

    For the Java API, see analysis-ik above. It is similar, so there is no need to paste it again.

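  • analysis-pinyin (by Medcl)

    A pinyin analyzer. It can be used to build SEO-friendly pinyin URLs (MongoDB ids are not friendly for this), to search Chinese text by pinyin, and to suggest Chinese text as pinyin is typed. Recommended.

    For installation and usage, see elasticsearch-analysis-pinyin.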

    // test url: http://192.168.1.108:9200/medcl/_analyze?text=%E8%B0%AD%E5%BB%BA%E7%BA%B3&analyzer=pinyin_analyzer
    AnalyzeResponse analyzeResponse = client.admin().indices()
        .prepareAnalyze("twitter", "阳光天使")
        .setAnalyzer("pinyin")
        .execute().actionGet();
    logger.info("size:{}", analyzeResponse.getTokens().size());
    List<AnalyzeToken> list = analyzeResponse.getTokens();
    for (AnalyzeToken token : list) {
        logger.info("Term:{}", token.getTerm());
    }

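  • analysis-stconvert (by Medcl)

    A Simplified/Traditional Chinese analyzer. Chinese is written in both scripts, and this analyzer converts between Simplified and Traditional Chinese.

    For installation and usage, see elasticsearch-analysis-stconvert.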

  • analysis-string2int (by Medcl)

    A string-to-integer conversion tool, used mainly with the facet feature.

    For installation and usage, see elasticsearch-analysis-string2int.

  • tools.carrot2

    For installation and usage, see elasticsearch-carrot2.

  • segmentspy

    For installation and usage, see elasticsearch-segmentspy.

    Access URL: http://192.168.1.108:9200/_plugin/segmentspy/#/

  • elasticsearch-hq

    For installation and usage, see elasticsearch-HQ.

    Access URL: http://192.168.1.108:9200/_plugin/elasticsearch-hq/

  • ...

  • http://ju.outofmemory.cn/entry/83746
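
The analysis-ik and analysis-pinyin entries above each quote an `_analyze` test URL whose `text` parameter is percent-encoded Chinese. As a minimal sketch of how such a URL can be assembled, the helper below uses only the JDK. The class name `AnalyzeUrl` and the method `build` are hypothetical names introduced for illustration, and the host, index and analyzer values are the ones from the comments above. The escaped string in `main` is the sample text 测试elasticsearch分词器的效果, written with Unicode escapes so the file compiles under any source encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of elasticsearch): builds an _analyze test URL.
class AnalyzeUrl {

    // host is "ip:port"; text is the raw, unencoded string to analyze.
    static String build(String host, String index, String analyzer, String text) {
        try {
            // URLEncoder percent-encodes the UTF-8 bytes of non-ASCII characters
            // and turns spaces into '+', which is valid inside a query string.
            String encoded = URLEncoder.encode(text, "UTF-8");
            return "http://" + host + "/" + index + "/_analyze"
                    + "?analyzer=" + analyzer
                    + "&text=" + encoded
                    + "&pretty=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample text from the analysis-ik test URL, as Unicode escapes.
        String text = "\u6d4b\u8bd5elasticsearch\u5206\u8bcd\u5668\u7684\u6548\u679c";
        System.out.println(build("192.168.1.108:9200", "twitter", "ik", text));
    }
}
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the tokens the named analyzer produces, provided that analyzer is installed on the node and the index exists.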