Elasticsearch basics (ES 8.10)


Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html

ES version: 8.10

By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.

Elasticsearch will detect and map booleans, floating point and integer values, dates, and strings to the appropriate Elasticsearch data types.

http://localhost:9200/

{
    "name": "WINDOWS10-JACK",
    "cluster_name": "elasticsearch",
    "cluster_uuid": "JYZzG3wITwqXA2cPUv1otA",
    "version": {
        "number": "8.10.4",
        "build_flavor": "default",
        "build_type": "zip",
        "build_hash": "b4a62ac808e886ff032700c391f45f1408b2538c",
        "build_date": "2023-10-11T22:04:35.506990650Z",
        "build_snapshot": false,
        "lucene_version": "9.7.0",
        "minimum_wire_compatibility_version": "7.17.0",
        "minimum_index_compatibility_version": "7.0.0"
    },
    "tagline": "You Know, for Search"
}
Viewing cluster information

http://localhost:9200/_cat

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
/_cat/component_templates
/_cat/ml/anomaly_detectors
/_cat/ml/anomaly_detectors/{job_id}
/_cat/ml/datafeeds
/_cat/ml/datafeeds/{datafeed_id}
/_cat/ml/trained_models
/_cat/ml/trained_models/{model_id}
/_cat/ml/data_frame/analytics
/_cat/ml/data_frame/analytics/{id}
/_cat/transforms
/_cat/transforms/{transform_id}
Health check

http://localhost:9200/_cat/health

1698826410 08:13:30 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%

With column headers: http://localhost:9200/_cat/health?v

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1698826430 08:13:50  elasticsearch yellow          1         1      6   6    0    0        3             0                  -                 66.7%

Normally, an Elasticsearch cluster's health is one of three states:

  • green: the healthiest state; all primary and replica shards are allocated and the
    cluster is 100% operational.
  • yellow: all primary shards are allocated, but at least one replica is missing. No data
    is lost and search results are still complete, but high availability is weakened; if
    more shards disappear, you may start losing data. Treat yellow as a warning that
    deserves prompt investigation.
  • red: at least one primary shard (and all of its replicas) is missing. You are missing
    data: searches return only partial results, and writes routed to the missing shard
    return an exception. Queries against the surviving shards still work, but this state
    should be resolved as quickly as possible; the sketch below shows how to ask ES why.
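When the cluster is yellow or red, the cluster allocation explain API reports why a shard is unassigned. A minimal sketch (with no request body it explains the first unassigned shard it finds, and returns an error if every shard is assigned):

curl -X GET "http://localhost:9200/_cluster/allocation/explain?pretty"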

Troubleshooting an unhealthy Elasticsearch cluster:

  • Make sure the master node starts first, then start the data nodes;
  • Set SELinux to permissive mode (optional) and turn off iptables;
  • Make sure the data nodes' elasticsearch configuration files are correct;
  • Check whether the system's maximum open file descriptor limit is sufficient;
  • Check whether the memory given to elasticsearch is sufficient (the ES_HEAP_SIZE setting and the indices.fielddata.cache.size limit);
  • If the number of indices has exploded, delete some of them (especially unneeded ones).

The sketch below shows how to check the file-descriptor and heap items from the API.
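A sketch using standard _cat and _nodes endpoints:

curl "http://localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent"
curl "http://localhost:9200/_nodes/stats/process?pretty"

The second response includes open_file_descriptors and max_file_descriptors for each node.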
Viewing cluster nodes

http://localhost:9200/_cat/nodes?v

Listing indices

http://localhost:9200/_cat/indices?v

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   book     Jc7jskvOSzCobjs4SPS7Jw   3   1          0            0       744b           744b
yellow open   magazine 8-zYU22QTda7CjSEdphJzA   3   1          0            0       678b           678b
Creating an index

Index names must be lowercase and must not duplicate an existing index name.

Create an articles index with 3 primary shards and 1 replica; pretty asks for a formatted JSON response.

PUT http://localhost:9200/articles?pretty
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 1
        }
    },
    "mappings" : {
        "properties" : {
            "field1" : { "type" : "text" }
        }
    }
}

Note that the mapping has no type level: mapping types were removed in ES 7.0 (see the NOTE in the mapping section below), so a request with an extra type layer is rejected in 8.x.

Response:
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "articles"
}
Viewing an index definition

Multiple indices can be fetched in one request (comma-separated); use _all or the wildcard * for all indices.

GET http://localhost:9200/articles

{
    "articles": {
        "aliases": {},
        "mappings": {
            "properties": {
                "age": {
                    "type": "long"
                },
                "gender": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "1",
                "provided_name": "articles",
                "creation_date": "1698827033458",
                "number_of_replicas": "1",
                "uuid": "OOVlnKxYT9SiWdlt-ZOPMw",
                "version": {
                    "created": "8100499"
                }
            }
        }
    }
}
GET http://localhost:9200/articles/_settings
GET http://localhost:9200/articles/_mappings
Updating index settings

Index settings fall into two groups: static and dynamic. Static settings, such as the number of shards, cannot be changed after index creation; dynamic settings can. See https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-modules-settings

PUT http://localhost:9200/articles/_settings
{
    "index" : {
        "number_of_replicas" : 2
    }
}

Controlling index reads and writes

index.blocks.read_only: when true, the index and its metadata are read-only.
index.blocks.read_only_allow_delete: when true, the index is read-only but may still be deleted.
index.blocks.read: when true, reads are disabled.
index.blocks.write: when true, writes are disabled.
index.blocks.metadata: when true, metadata reads and writes are disabled.
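These are dynamic settings, so they can be toggled through the _settings endpoint. A minimal sketch that blocks writes on the articles index from above and then lifts the block:

PUT http://localhost:9200/articles/_settings
{
    "index.blocks.write": true
}

PUT http://localhost:9200/articles/_settings
{
    "index.blocks.write": false
}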
Deleting an index

Multiple indices can be deleted in one request (comma-separated); _all or the wildcard * deletes all indices.

DELETE http://localhost:9200/articles
Checking whether an index exists
HEAD http://localhost:9200/articles

Judge by the returned HTTP status code: 200 means the index exists, 404 means it does not.
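From the command line, curl can print just that status code; a small sketch:

curl -s -o /dev/null -w "%{http_code}\n" --head http://localhost:9200/articles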
Writing data
Write a document into the articles index, specifying its id as 1:
curl -X PUT "http://localhost:9200/articles/_doc/1?pretty" -H 'Content-Type: application/json' -d'{"name": "John Doe"}'

Response:
{
    "_index": "articles",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}
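The second document that appears in the search results further below can be written the same way; a sketch:

curl -X PUT "http://localhost:9200/articles/_doc/2?pretty" -H 'Content-Type: application/json' -d'{"name": "Rao Xiao Ya", "age": 30, "gender": "male"}'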
Querying documents
Get a document by id:
GET http://localhost:9200/articles/_doc/1?pretty

Response:
{
    "_index": "articles",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "name": "John Doe"
    }
}

_source holds the original document as it was indexed.

Get all documents:
GET http://localhost:9200/articles/_search

Response:
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "articles",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "John Doe"
                }
            },
            {
                "_index": "articles",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "name": "Rao Xiao Ya",
                    "age": 30,
                    "gender": "male"
                }
            }
        ]
    }
}
With sorting (note the keyword sub-field: sorting directly on a text field is rejected unless fielddata is enabled):

GET http://localhost:9200/articles/_search
{
  "query": { "match_all": {} },
  "sort": [
    {"name.keyword": "asc" }
  ]
}

Data types in ES

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

1. keyword

  • keyword, which is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
  • constant_keyword for keyword fields that always contain the same value.
  • wildcard for unstructured machine-generated content. The wildcard type is optimized for fields with large values or high cardinality.

Keyword fields are often used in sorting, aggregations, and term-level queries, such as term.

Fields of type keyword are not analyzed, so they do not support full-text search.

2. text

  • text, the traditional field type for full-text content such as the body of an email or the description of a product.
  • match_only_text, a space-optimized variant of text that disables scoring and performs slower on queries that need positions. It is best suited for indexing log messages.

Fields of type text are analyzed, and the index structure is built from the resulting tokens.

text is intended for full-text search; it is not suitable for sorting or aggregations.

A text field can be assigned an analyzer; keyword fields are never analyzed, so they take none.

NOTE: A single field can carry both the text and keyword types at once; this is called multi-fields. Such a field supports full-text search as well as sorting and aggregations. For short texts such as book titles or article headlines, the keyword sub-field makes retrieval more precise because it bypasses analysis, while the text side still returns token-based matches. For example, when searching for the book 《刘心武妙品红楼梦》 by its full title, the exact-title match should rank first, followed by the token-based matches. A sketch of querying both sides of a multi-field follows the snippet below.

"name":{
    "type":"text",
    "fields":{
    	"keyword":{"type":"keyword", "ignore_above":256}
    }
}
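A sketch of querying the two sides of such a multi-field, using the articles index from the earlier examples (term and match are covered in more detail later):

GET http://localhost:9200/articles/_search
{
  "query": { "term": { "name.keyword": "John Doe" } }
}

GET http://localhost:9200/articles/_search
{
  "query": { "match": { "name": "john" } }
}

The term query matches only the exact, unanalyzed value; the match query analyzes its input and matches on tokens.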

3. numeric

long           A signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63-1.
integer        A signed 32-bit integer with a minimum value of -2^31 and a maximum value of 2^31-1.
short          A signed 16-bit integer with a minimum value of -32,768 and a maximum value of 32,767.
byte           A signed 8-bit integer with a minimum value of -128 and a maximum value of 127.
double         A double-precision 64-bit IEEE 754 floating point number, restricted to finite values.
float          A single-precision 32-bit IEEE 754 floating point number, restricted to finite values.
half_float     A half-precision 16-bit IEEE 754 floating point number, restricted to finite values.
scaled_float   A floating point number that is backed by a long, scaled by a fixed double scaling factor.
unsigned_long  An unsigned 64-bit integer with a minimum value of 0 and a maximum value of 2^64-1.

About scaled_float: the value is stored internally as a long, because integers compress better than floats, which saves disk space. You configure a scaling factor, scaling_factor; an input value f1 is then stored internally as the integer closest to scaling_factor * f1. For example, with scaling_factor 100, a price of 10.99 is stored as 1099.

{
  "mappings": {
    "properties": {
      "number_of_bytes": {
        "type": "integer"
      },
      "time_in_seconds": {
        "type": "float"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}

4. date

JSON doesn’t have a date data type, so dates in Elasticsearch can either be:

  • strings containing formatted dates, e.g. "2015-01-01" or "2015/01/01 12:10:30".
  • a number representing milliseconds-since-the-epoch.
  • a number representing seconds-since-the-epoch (configuration).

Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch (i.e. an epoch-millisecond timestamp).

At query time, date conditions are likewise converted to epoch milliseconds, and dates in the results are converted back to strings.

{
  "mappings": {
    "properties": {
      "update_date": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

The first format will be used to convert the milliseconds-since-the-epoch value back into a string. The default format value is strict_date_optional_time||epoch_millis; epoch_millis means the value may also be supplied as an epoch-millisecond (not second) timestamp.
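A sketch of writing dates in the three formats the mapping above accepts (my_dates stands for a hypothetical index created with that mapping):

PUT http://localhost:9200/my_dates/_doc/1
{ "update_date": "2023-11-01 08:00:00" }

PUT http://localhost:9200/my_dates/_doc/2
{ "update_date": "2023-11-01" }

PUT http://localhost:9200/my_dates/_doc/3
{ "update_date": 1698825600000 }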

About mappings

In ES, a mapping is similar to a table definition in a database: which fields the index has, each field's type, default values, and so on.

A mapping can be specified when the index is created, and it can also be modified after creation (with restrictions, see below).

If an index has no mapping and data is written into it anyway, ES derives a mapping from the data automatically; this is called dynamic mapping. In real projects, though, the types of key fields are usually defined up front, which avoids ES inferring a type you did not want.

官方文档:https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

NOTE: Before 7.0.0, the mapping definition included a type name. Elasticsearch 7.0.0 and later no longer accept a default mapping. See Removal of mapping types. In other words, there is no longer a type level between mappings and properties.

1. Dynamic mapping rules

JSON data type                                     "dynamic":"true"                  "dynamic":"runtime"
null                                               No field added                    No field added
true or false                                      boolean                           boolean
double                                             float                             double
long                                               long                              long
object                                             object                            No field added
array                                              Depends on first non-null value   Depends on first non-null value
string that passes date detection                  date                              date
string that passes numeric detection               float or long                     double or long
string that passes neither detection               text with a .keyword sub-field    keyword
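A quick way to watch these rules in action; a sketch with a throwaway index name (test_dynamic):

PUT http://localhost:9200/test_dynamic/_doc/1
{ "flag": true, "count": 7, "ratio": 1.5, "created": "2023-11-01", "note": "hello world" }

GET http://localhost:9200/test_dynamic/_mappings

Under the default "dynamic":"true", flag maps to boolean, count to long, ratio to float, created to date, and note to text with a .keyword sub-field.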

2. Explicit mapping

{
	"mappings":{
		"properties":{
			"email":{
				"type": "keyword"
			},
			"name":{
				"type":"text",
				"fields":{
					"keyword":{"type":"keyword", "ignore_above":256}
				}
			},
			"photo":{
				"type":"text",
				"index": false
			}
		}
	}
}

ignore_above: Do not index any string longer than this value. Defaults to 2147483647 so that all values would be accepted. Please however note that default dynamic mapping rules create a sub keyword field that overrides this default by setting ignore_above: 256. In plain terms: a keyword value is expected to be short; strings longer than ignore_above are simply not indexed (not truncated), so they cannot be found through that field, but the full value is still kept in _source and is not lost.

By default, ES builds an index structure for every field. Setting "index": false for a field skips that, and the field can then no longer be used as a search condition.

3. Update the mapping of a field

For a field that already exists, only some of its mapping parameters can be changed; the field's type, for example, cannot be modified.

If a field's attributes are changed, you may also need to reindex your data.
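New fields, by contrast, can be added to an existing mapping at any time; a sketch (the tags field here is hypothetical):

PUT http://localhost:9200/articles/_mapping
{
  "properties": {
    "tags": { "type": "keyword" }
  }
}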

4. View the mapping of an index

GET http://localhost:9200/articles/_mappings
About text analysis

The index analysis module acts as a configurable registry of analyzers that can be used in order to convert a string field into individual terms which are:

  • added to the inverted index in order to make the document searchable
  • used by high level queries such as the match query to generate search terms.

文档:https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html

Analyzers

Analyzers are applied both when indexing documents and when searching them; the two operations should use the same analyzer wherever possible so that tokenization results match. Analyzers apply only to text fields. Choosing a suitable analyzer makes text search more precise and efficient.

The purpose of analysis is full-text search: a document matches as soon as it contains the terms being searched for.

Elasticsearch includes a default analyzer, called the standard analyzer, which works well for most use cases right out of the box.

If you want to tailor your search experience, you can choose a different built-in analyzer or even configure a custom one. A custom analyzer gives you control over each step of the analysis process, including:

  • Changes to the text before tokenization
  • How text is converted to tokens
  • Normalization changes made to tokens before indexing or search

An analyzer is composed of three kinds of lower-level building blocks: character filters, tokenizers, and token filters.

  • Character filters

    A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like <b> from the stream.

    An analyzer may have zero or more character filters, which are applied in order.

  • Tokenizer

    A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].

    The tokenizer is also responsible for recording the order or position of each term and the start and end character offsets of the original word which the term represents.

    An analyzer must have exactly one tokenizer.

  • Token filters

    A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like the from the token stream, and a synonym token filter introduces synonyms into the token stream.

    Token filters are not allowed to change the position or character offsets of each token.

    An analyzer may have zero or more token filters, which are applied in order.

Built-in analyzers
Fingerprint
Keyword
Language
Pattern
Simple
Standard
Stop
Whitespace
Testing an analyzer
curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}
'
{
    "tokens": [
        {
            "token": "The",
            "start_offset": 0,
            "end_offset": 3,
            "type": "word",
            "position": 0
        },
        {
            "token": "quick",
            "start_offset": 4,
            "end_offset": 9,
            "type": "word",
            "position": 1
        },
        {
            "token": "brown",
            "start_offset": 10,
            "end_offset": 15,
            "type": "word",
            "position": 2
        },
        {
            "token": "fox.",
            "start_offset": 16,
            "end_offset": 20,
            "type": "word",
            "position": 3
        }
    ]
}
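For comparison, running the same text through the default standard analyzer (replace "whitespace" with "standard" in the request above) yields the lowercased terms [the, quick, brown, fox]: the standard analyzer lowercases tokens and drops the trailing punctuation that the whitespace tokenizer kept.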
Customizing analyzers

The first approach is to combine ES's built-in tokenizers and filters into an analyzer of your own and configure it in the index settings. A custom analyzer consists of:

  • zero or more character filters
  • a tokenizer
  • zero or more token filters.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "html_strip"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "analyzer": "std_folded" 
      }
    }
  }
}
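Once an index is created with these settings, the custom analyzer can be exercised directly with _analyze; a sketch (my_index stands for whichever index carries the settings above):

GET http://localhost:9200/my_index/_analyze
{
  "analyzer": "std_folded",
  "text": "Is this déjà vu?"
}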

The second approach is to install a third-party plugin into ES, such as elasticsearch-analysis-ik; more on that later.

Specifying an analyzer

1. Set a global default analyzer for an index in its settings.

A default search-time analyzer can be set alongside it (default_search, shown in the second variant):

{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "simple"
        }
      }
    }
  }
}
Or:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "simple"
        },
        "default_search": {
          "type": "whitespace"
        }
      }
    }
  }
}

2. In the index mappings, each text field can be given its own analyzer; fields without one use the default standard analyzer.

A search_analyzer can be set at the same time:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
Or:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "whitespace",
        "search_analyzer": "simple"
      }
    }
  }
}

3. Set the analyzer in the query itself; if none is specified, the default standard analyzer applies.

{
  "query": {
    "match": {
      "message": {
        "query": "Quick foxes",
        "analyzer": "stop"
      }
    }
  }
}

At search time, ES determines which analyzer to use in the following order:

  1. The analyzer parameter in the search query. See Specify the search analyzer for a query.
  2. The search_analyzer mapping parameter for the field. See Specify the search analyzer for a field.
  3. The analysis.analyzer.default_search index setting. See Specify the default search analyzer for an index.
  4. The analyzer mapping parameter for the field. See Specify the analyzer for a field.

If none of these parameters are specified, the standard analyzer is used.

Search operations

Query DSL
Query DSL supports a variety of query types you can mix and match to get the results you want. Query types include:

  • Boolean and other compound queries, which let you combine queries and match results based on multiple criteria
  • Term-level queries for filtering and finding exact matches
  • Full text queries, which are commonly used in search engines
  • Geo and spatial queries

Aggregations
You can use search aggregations to get statistics and other analytics for your search results. Aggregations help you answer questions like:

  • What’s the average response time for my servers?
  • What are the top IP addresses hit by users on my network?
  • What is the total transaction revenue by customer?

Search multiple data streams and indices
You can use comma-separated values and grep-like index patterns to search several data streams and indices in the same request. You can even boost search results from specific indices. See Search multiple data streams and indices.

Paginate search results
By default, searches return only the top 10 matching hits. To retrieve more or fewer documents, see Paginate search results.

Retrieve selected fields
The search response’s hits.hits property includes the full document _source for each hit. To retrieve only a subset of the _source or other fields, see Retrieve selected fields.

Sort search results
By default, search hits are sorted by _score, a relevance score that measures how well each document matches the query. To customize the calculation of these scores, use the script_score query. To sort search hits by other field values, see Sort search results.

Run an async search
Elasticsearch searches are designed to run on large volumes of data quickly, often returning results in milliseconds. For this reason, searches are synchronous by default. The search request waits for complete results before returning a response.

However, complete results can take longer for searches across large data sets or multiple clusters.

To avoid long waits, you can run an asynchronous, or async, search instead. An async search lets you retrieve partial results for a long-running search now and get complete results later.

Search timeout

GET /my-index-000001/_search
{
  "timeout": "2s",
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

Search cancellation

You can cancel a search request using the task management API. Elasticsearch also automatically cancels a search request when your client’s HTTP connection closes. We recommend you set up your client to close HTTP connections when a search request is aborted or times out.

Track total hits (counting all matches)

By default only 10 records are returned, so retrieving everything requires multiple queries.

Generally the total hit count can’t be computed accurately without visiting all matches, which is costly for queries that match lots of documents. The track_total_hits parameter allows you to control how the total number of hits should be tracked. Given that it is often enough to have a lower bound of the number of hits, such as “there are at least 10000 hits”, the default is set to 10,000. This means that requests will count the total hit accurately up to 10,000 hits. It is a good trade off to speed up searches if you don’t need the accurate number of hits after a certain threshold.

In other words: computing an exact total means visiting every match, which is wasteful; track_total_hits lets you control how the count is tracked, and a lower bound such as "at least 10,000 hits" is usually enough and speeds up queries.

GET my-index-000001/_search
{
  "track_total_hits": true,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  }
}

When set to true the search response will always track the number of hits that match the query accurately (e.g. total.relation will always be equal to "eq" when track_total_hits is set to true). Otherwise the "total.relation" returned in the "total" object in the search response determines how the "total.value" should be interpreted. A value of "gte" means that the "total.value" is a lower bound of the total hits that match the query and a value of "eq" indicates that "total.value" is the accurate count.

When track_total_hits is set to true, the hit count is tracked exactly (total.relation is then always eq); otherwise total.value must be interpreted through total.relation: gte means at least that many, eq means exactly that many.

{
  "_shards": ...
  "timed_out": false,
  "took": 100,
  "hits": {
    "max_score": 1.0,
    "total" : {
      "value": 2048,    
      "relation": "eq"  
    },
    "hits": ...
  }
}

track_total_hits can also be set to an integer: the threshold up to which the hit count is tracked accurately (it does not limit how many documents are returned).

GET my-index-000001/_search
{
  "track_total_hits": 100,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}
{
  "_shards": ...
  "timed_out": false,
  "took": 30,
  "hits": {
    "max_score": 1.0,
    "total": {
      "value": 42,         
      "relation": "eq"     
    },
    "hits": ...
  }
}

If the number of matching documents is greater than or equal to 100, the relation is gte:

{
  "_shards": ...
  "hits": {
    "max_score": 1.0,
    "total": {
      "value": 100,         
      "relation": "gte"     
    },
    "hits": ...
  }
}

Quickly check for matching docs (check existence without returning hits)

If you only want to know if there are any documents matching a specific query, you can set the size to 0 to indicate that we are not interested in the search results. You can also set terminate_after to 1 to indicate that the query execution can be terminated whenever the first matching document was found (per shard).

GET /_search?q=user.id:elkbee&size=0&terminate_after=1

terminate_after is always applied after the post_filter and stops the query as well as the aggregation executions when enough hits have been collected on the shard. Though the doc count on aggregations may not reflect the hits.total in the response since aggregations are applied before the post filtering.

The response will not contain any hits as the size was set to 0. The hits.total will be either equal to 0, indicating that there were no matching documents, or greater than 0 meaning that there were at least as many documents matching the query when it was early terminated. Also if the query was terminated early, the terminated_early flag will be set to true in the response. Some queries are able to retrieve the hits count directly from the index statistics, which is much faster as it does not require executing the query. In those situations, no documents are collected, the returned total.hits will be higher than terminate_after, and terminated_early will be set to false.

{
  "took": 3,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}

The took time in the response contains the milliseconds that this request took for processing, beginning quickly after the node received the query, up until all search related work is done and before the above JSON is returned to the client. This means it includes the time spent waiting in thread pools, executing a distributed search across the whole cluster and gathering all the results.

Highlighting

Highlighters enable you to get highlighted snippets from one or more fields in your search results so you can show users where the query matches are. When you request highlights, the response contains an additional highlight element for each search hit that includes the highlighted fields and the highlighted fragments.

Elasticsearch supports three highlighters: unified, plain, and fvh (fast vector highlighter). You can specify the highlighter type you want to use for each field.

GET /_search
{
  "query": {
    "match": { "content": "kimchy" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}
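Each matching hit then carries a highlight element alongside _source; an illustrative excerpt (by default, matched terms are wrapped in <em> tags):

"highlight": {
    "content": [
        "... <em>kimchy</em> ..."
    ]
}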

The difference between query and filter

A query clause answers "How well does this document match this query clause?"; it computes a relevance score, _score, and leaves the decision to the caller. See the Search APIs for query clause usage.

A filter clause answers "Does this document match this query clause?"; the answer is yes or no, no score is computed, and the condition is exact. For example, filtering for records whose status is on.

For example:

GET /_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [ 
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

This query will match documents where all of the following conditions are met:

  • The title field contains the word search.
  • The content field contains the word elasticsearch.
  • The status field contains the exact word published.
  • The publish_date field contains a date from 1 Jan 2015 onwards.

Basic query structure

GET /{index}/_search
{
    "from" : 0,  // start offset within the results
    "size" : 10, // page size, i.e. how many hits to return at once
    "_source" : [ ... ], // array of fields to return
    "query" : { ... }, // query clause
    "aggs" : { ... }, // aggs clause
    "sort" : { ... } // sort clause
}
GET /my-index-000001/_search
{
  "from": 5,
  "size": 20,
  "_source" : ["nickname", "photo"],
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  },
  "sort": [
        {"date": "asc"},
        {"tie_breaker_id": "asc"},
        "_score"
   ],
   "aggs": {}
}

Range queries

GET /{index}/_search
{
  "query": {
    "range": {
      "{FIELD}": {
        "gte": 100, 
        "lte": 200
      }
    }
  }
}
{FIELD} - the field name
gt  - greater than (>)
gte - greater than or equal to (>=)
lt  - less than (<)
lte - less than or equal to (<=)
Only one bound is needed; e.g. keeping just "gte": 100 means FIELD >= 100.
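A sketch against the articles index from earlier, where document 2 has age 30:

GET http://localhost:9200/articles/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 40
      }
    }
  }
}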

Bool compound queries

GET /{index}/_search
{
  "query": {
    "bool": { // bool query
      "must": [], // must: like AND in SQL; every clause must match
      "must_not": [], // must_not: the opposite of must; no clause may match
      "should": [], // should: like OR in SQL; at least one clause should match
      "filter": [] // filter clause: exact matching without scoring
    }
  }
}

The sub-clauses available under bool are: must, must_not, filter, and should.

term is an exact match: the input is not analyzed, and the document must contain the whole search term.

The difference between match and term: a match query runs the input through the field's analyzer, while a term query skips analysis entirely. A match query therefore behaves like a fuzzy, partial match; containing some of the analyzed keywords is enough.
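The classic gotcha, sketched against the articles data from earlier: a term query on the analyzed text field finds nothing, because the index holds the tokens [john, doe] rather than the original string:

GET http://localhost:9200/articles/_search
{
  "query": { "term": { "name": "John Doe" } }
}

This returns no hits, while a match query for the same input (or a term query against name.keyword) finds document 1.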

Other query flavors include phrase searches, similarity searches, and prefix searches.

Elasticsearch supports both JSON-style queries and SQL-style queries.

JSON: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

SQL: https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-overview.html
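The SQL interface is exposed over the _sql endpoint; a sketch (format=txt returns a plain-text table):

POST http://localhost:9200/_sql?format=txt
{
  "query": "SELECT name, age FROM articles WHERE age > 20"
}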

Analyzing your data

Aggregation queries
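A minimal sketch of an aggregation query against the articles index from earlier: bucket documents by the gender keyword sub-field and compute the average age per bucket ("size": 0 suppresses the hit list):

GET http://localhost:9200/articles/_search
{
  "size": 0,
  "aggs": {
    "by_gender": {
      "terms": { "field": "gender.keyword" },
      "aggs": {
        "avg_age": { "avg": { "field": "age" } }
      }
    }
  }
}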

elasticsearch-go

github.com/elastic/go-elasticsearch/v8

https://www.elastic.co/guide/en/elasticsearch/client/go-api/current/getting-started-go.html

