Introduction to templates in Elasticsearch

m635674608

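A template specifies how data is stored, the number of shards, and similar settings for the indices it matches. Here is an example of an Elasticsearch template: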

PUT http://192.168.1.215:9200/_template/content_tpl

{
  "template": "content_*",
  "settings": {
    "number_of_shards": 1,
    "index.mapper.dynamic": false
  },
  "mappings": {
    "doc": {
      "properties": {
        "id": {
          "type": "long",
          "store": "yes"
        },
        "url": {
          "type": "string",
          "index": "not_analyzed",
          "omit_norms": "true",
          "store": "yes"
        },
        "signature": {
          "type": "string",
          "index": "not_analyzed",
          "omit_norms": "true",
          "store": "yes"
        },
        "keyword": {
          "type": "string",
          "index_analyzer": "lezhi_keyword",
          "index_options": "positions",
          "omit_norms": "true",
          "store": "no"
        },
        "count": {
          "type": "integer",
          "index": "no",
          "store": "yes"
        },
        "lastModified": {
          "type": "long",
          "store": "yes"
        }
      }
    }
  }
}
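
As a rough illustration of how the request above could be issued from a script, the following Python sketch builds a shortened version of the template and prepares the PUT request with the standard library only. The helper names build_template and make_put_request are hypothetical, and the host is the one from the example; nothing is sent unless the commented lines at the end are enabled.

```python
import json
import urllib.request

ES_URL = "http://192.168.1.215:9200"  # host taken from the example above


def build_template():
    """Return a shortened version of the content_tpl body as a Python dict."""
    return {
        "template": "content_*",
        "settings": {"number_of_shards": 1, "index.mapper.dynamic": False},
        "mappings": {
            "doc": {
                "properties": {
                    "id": {"type": "long", "store": "yes"},
                    "url": {
                        "type": "string",
                        "index": "not_analyzed",
                        "omit_norms": "true",
                        "store": "yes",
                    },
                }
            }
        },
    }


def make_put_request(name, body):
    """Build (but do not send) the PUT request for /_template/<name>."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/_template/{name}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = make_put_request("content_tpl", build_template())
print(request.get_method(), request.full_url)

# To actually send it (requires a reachable Elasticsearch node):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Once the template is registered, any newly created index whose name matches content_* should pick up these settings and mappings.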

A template is broadly divided into two parts, settings and mappings:

1. settings holds index-level configuration such as the number of shards, the number of replicas, translog sync conditions, and the refresh interval.

2. mappings holds descriptive information about the documents, and is broadly divided into three parts: _all, _source, and properties:

(1) _all: this is the AllField. It can include one or more fields, so that a search with no field specified runs against several fields at once. It is enabled with "_all" : {"enabled" : true}

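(2) _source: this is the SourceField. Think of it as a copy of the original document that ES keeps in addition to the data in the index files. The _source field matters a great deal at search time: with {"enabled" : false}, a search returns only the ID by default, and you have to use the fields parameter to fetch values from the index, which is not very efficient. With enabled set to true, on the other hand, the index grows larger; in that case you can use compress to compress it, and includes and excludes to restrict, at the field level, which fields are kept in the stored source.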

(3) properties: this is the most important part. It holds the settings for the index structure and for individual fields.
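
To make the two parts more concrete, here is a minimal Python sketch that assembles a template with a settings block and the _all, _source, and properties sections of a mapping. The specific values (replica count, refresh interval, translog threshold, and the included and excluded field names) are assumptions chosen for illustration, and the layout follows the older Elasticsearch releases this article targets; later releases changed or dropped some of these options.

```python
import json

# Index-level settings: shard and replica counts, refresh interval,
# and a translog flush threshold (all values are illustrative).
settings = {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "index.translog.flush_threshold_size": "512mb",
}

# Mapping-level sections for the "doc" type used in the example template.
doc_mapping = {
    "_all": {"enabled": True},
    "_source": {
        "enabled": True,
        # Restrict which fields are kept in the stored source document.
        "includes": ["id", "url", "count"],
        "excludes": ["keyword"],
    },
    "properties": {
        "id": {"type": "long", "store": "yes"},
    },
}

template = {
    "template": "content_*",
    "settings": settings,
    "mappings": {"doc": doc_mapping},
}

print(json.dumps(template, indent=2))
```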

For more detailed explanations, see http://www.elasticsearch.org/guide/reference/mapping/

The rest of this article describes some of the settings used in properties in detail.

1. index_options

index_options (applicable to the string type) takes one of the following values:

(1) docs: only documents are indexed; term frequencies and positions are omitted.
(2) freqs: documents and term frequencies are indexed; positions are omitted.
(3) positions: documents, term frequencies, and positions are indexed.

see: https://github.com/elasticsearch/elasticsearch/issues/2346

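The docs option lets Lucene skip indexing the term frequencies and positions for the field. This saves some index space on disk and can speed up searching and filtering, but it silently blocks searches that need position information, such as PhraseQuery and SpanQuery.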
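
The difference between the three values can be illustrated with a toy inverted index. The Python sketch below is not how Lucene stores postings; build_postings and phrase_match are hypothetical helpers that only show which information each option keeps, and why a phrase query cannot be answered without positions. Unlike the silent behavior described above, this toy version raises an error so the missing information is visible.

```python
from collections import defaultdict


def build_postings(docs, index_options):
    """Toy inverted index: term -> {doc_id: payload}.

    The payload depends on index_options:
      docs      -> None (only the fact that the term occurs)
      freqs     -> term frequency in the document
      positions -> list of token positions in the document
    """
    if index_options not in ("docs", "freqs", "positions"):
        raise ValueError(f"unknown index_options: {index_options}")
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            if index_options == "docs":
                postings[term][doc_id] = None
            elif index_options == "freqs":
                postings[term][doc_id] = (postings[term].get(doc_id) or 0) + 1
            else:
                postings[term].setdefault(doc_id, []).append(position)
    return dict(postings)


def phrase_match(postings, first, second):
    """Return doc ids where `second` directly follows `first`.

    This needs position lists, so it only works with "positions".
    """
    hits = []
    second_postings = postings.get(second, {})
    for doc_id, first_positions in postings.get(first, {}).items():
        if doc_id not in second_postings:
            continue
        second_positions = second_postings[doc_id]
        if not isinstance(first_positions, list) or not isinstance(second_positions, list):
            raise ValueError("phrase queries need index_options='positions'")
        if any(p + 1 in second_positions for p in first_positions):
            hits.append(doc_id)
    return hits


docs = {1: "quick brown fox", 2: "brown quick fox quick"}
with_positions = build_postings(docs, "positions")
with_freqs = build_postings(docs, "freqs")

print(with_freqs["quick"])                              # {1: 1, 2: 2}
print(phrase_match(with_positions, "quick", "brown"))   # [1]
```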

2. index

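(1) analyzed -- an analyzer breaks the field value into a stream of separate tokens, and each token becomes searchable. This suits ordinary text fields (body, title, abstract, and so on), and usually goes together with an "index_analyzer" setting.
(2) not_analyzed -- the field is indexed, but the String value is not analyzed; the whole value is treated as a single token and made searchable. This suits values that must not be broken up, such as URLs, file paths, dates, and phone numbers.
(3) no -- the field value is not searchable.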
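
The effect of the three values on a single field value can be sketched as follows. This is a simplified model, not Elasticsearch code: index_terms is a hypothetical helper, the regular-expression tokenizer merely stands in for a real analyzer, and the URL is an assumed example value.

```python
import re


def index_terms(value, index):
    """Return the searchable terms produced for a field value.

    `index` mirrors the three mapping options; the regex tokenizer is
    only a stand-in for a real analyzer.
    """
    if index == "analyzed":
        # Lowercase and split into word tokens; each token is searchable.
        return re.findall(r"\w+", value.lower())
    if index == "not_analyzed":
        # The whole value is indexed as one term.
        return [value]
    if index == "no":
        # Nothing is indexed, so the field cannot be searched.
        return []
    raise ValueError(f"unknown index option: {index}")


url = "http://example.com/a/b.html"
print(index_terms(url, "analyzed"))      # ['http', 'example', 'com', 'a', 'b', 'html']
print(index_terms(url, "not_analyzed"))  # ['http://example.com/a/b.html']
print(index_terms(url, "no"))            # []
```

With analyzed, an exact lookup of the full URL no longer matches a single term, which is why the url field in the example template uses not_analyzed.
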
3. omit_norms

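Norms record the index-time boost information in the index (together with field-length normalization), but they can consume a fair amount of memory at search time. Setting omit_norms = true drops this field-boost information, so index-time boosts are not taken into account when searching.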

4. store

The store option determines whether the actual value of the field is stored, so that it can be recovered later at search time.

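(1) yes -- the field value is stored. The original string is kept in the index in full and can be recovered through the IndexReader class. This is useful for fields that need to be shown in search results (URL, title, and so on). If index size is a concern for your search application, avoid storing very large field values, because they take up index storage space.
(2) no -- the field value is not stored. This is usually combined with Index.ANALYZED to index large text values that do not need to be recovered in their original form, such as the body text.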
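
As a small illustration of how store interacts with retrieval, the Python sketch below picks out the fields mapped with "store": "yes" from a subset of the example template and builds a search body that asks for them. The helper stored_fields is hypothetical, and the "fields" key in the search body is assumed to behave as in the older Elasticsearch releases this article targets.

```python
import json

# A subset of the properties from the example template.
properties = {
    "url": {
        "type": "string",
        "index": "not_analyzed",
        "omit_norms": "true",
        "store": "yes",
    },
    "keyword": {
        "type": "string",
        "index_options": "positions",
        "omit_norms": "true",
        "store": "no",
    },
    "count": {"type": "integer", "index": "no", "store": "yes"},
}


def stored_fields(props):
    """Return the names of fields whose original value is stored."""
    return sorted(name for name, cfg in props.items() if cfg.get("store") == "yes")


# Search body that requests only the stored fields of each hit.
search_body = {
    "query": {"match_all": {}},
    "fields": stored_fields(properties),
}

print(json.dumps(search_body))
```

Here keyword is searchable but cannot be returned from the index, while count can be returned but not searched, since it is stored with "index": "no".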

http://www.myexception.cn/open-source/2032568.html

2015-12-03 23:35