elasticsearch

2019-03-27

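Installation and startup

Elasticsearch depends on a JDK. Elastic listens on port 9200 by default, and pressing Ctrl+C stops it. By default Elastic only accepts connections from the local machine. To allow remote access, edit config/elasticsearch.yml in the Elastic installation directory, uncomment network.host, set its value to 0.0.0.0, and restart Elastic.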

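Basic concepts

Elastic can be viewed as a distributed database. Each running Elastic instance is a node, and a group of nodes forms a cluster.

index: Elastic indexes every field, and the top-level unit of data is the index. It is analogous to a database in MySQL.
document: a single record inside an index, stored as JSON. It is analogous to a row in MySQL.
type: a grouping of documents. An index could originally hold several types, but one type per index is recommended. Elastic 6.x allows only one type per index, and 7.x deprecates types ahead of their complete removal. It is analogous to a table in MySQL.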

Index and mapping operations

List all indices: curl -X GET 'http://localhost:9200/_cat/indices?v'
Delete an index: curl -X DELETE 'localhost:9200/weather'
View the mappings (the table structure): curl 'localhost:9200/_mapping?pretty=true'
Create an index and specify its mapping:

$ curl -X PUT 'localhost:9200/accounts' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'

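There are three fields here, all of type text, so a Chinese analyzer (ik_max_word) must be specified instead of the default English analyzer. The analyzer has to be installed as a plugin.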

Data CRUD

Create

$ curl -X PUT 'localhost:9200/accounts/person/1' -H 'Content-Type: application/json' -d '
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}'

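This inserts a document into the person type (table) of the accounts index and sets its id to 1. The id can also be omitted when creating a document. In that case the request must be a POST, and Elastic generates a random string as the id.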
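The PUT-with-id versus POST-without-id rule can be sketched in Go using only the standard library. This is a minimal sketch, not part of the original notes: newIndexRequest is a hypothetical helper name, and it only builds the request without sending it, so it runs without a live node.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newIndexRequest (hypothetical helper) builds, but does not send, the HTTP
// request that stores doc. With an id it uses PUT on /index/type/id; without
// one it uses POST on /index/type so that Elastic generates the id.
func newIndexRequest(baseURL, index, typ, id string, doc interface{}) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	method, url := http.MethodPost, baseURL+"/"+index+"/"+typ
	if id != "" {
		method, url = http.MethodPut, url+"/"+id
	}
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Elastic 6.x and later reject request bodies without a content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	doc := map[string]string{"user": "李四", "title": "工程师", "desc": "系统管理"}
	for _, id := range []string{"1", ""} {
		req, err := newIndexRequest("http://localhost:9200", "accounts", "person", id, doc)
		if err != nil {
			panic(err)
		}
		fmt.Println(req.Method, req.URL.String())
	}
}
```

Passing the returned request to http.DefaultClient.Do would actually send it, assuming a node is listening on localhost:9200.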

Read

curl 'localhost:9200/accounts/person/1?pretty=true'

{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "user" : "张三",
    "title" : "工程师",
    "desc" : "数据库管理"
  }
}

We can also look up documents by the value of a field, using the q query-string parameter: GET /accounts/person/_search?q=user:Smith

Delete

curl -X DELETE 'localhost:9200/accounts/person/1'

Update

To update a document, send the data again with a PUT request to the same id.

$ curl -X PUT 'localhost:9200/accounts/person/1' -H 'Content-Type: application/json' -d '
{
    "user" : "张三",
    "title" : "工程师",
    "desc" : "数据库管理，软件开发"
}' 

{
  "_index":"accounts",
  "_type":"person",
  "_id":"1",
  "_version":2,
  "result":"updated",
  "_shards":{"total":2,"successful":1,"failed":0},
  "created":false
}

Complex queries

Query all documents

A GET request to /Index/Type/_search returns all documents.

$ curl 'localhost:9200/accounts/person/_search'

{
  "took":2,
  "timed_out":false,
  "_shards":{"total":5,"successful":5,"failed":0},
  "hits":{
    "total":2,
    "max_score":1.0,
    "hits":[
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"AV3qGfrC6jMbsbXb6k1p",
        "_score":1.0,
        "_source": {
          "user": "李四",
          "title": "工程师",
          "desc": "系统管理"
        }
      },
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"1",
        "_score":1.0,
        "_source": {
          "user" : "张三",
          "title" : "工程师",
          "desc" : "数据库管理，软件开发"
        }
      }
    ]
  }
}

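In the response above, the took field is the time the operation took (in milliseconds), timed_out indicates whether it timed out, and hits holds the matched documents. The subfields of hits mean the following.

total: the number of matching documents, 2 in this example.
max_score: the highest relevance score, 1.0 in this example.
hits: the array of returned documents.

Every returned document has a _score field that indicates how well it matches. Results are sorted by this field in descending order by default.

Exact queries

Exact queries use term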

{
    "term" : {
        "price" : 20
    }
}

An exact query does not need relevance scoring of the results, so it can be wrapped in constant_score.

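Full-text (fuzzy) queries

match performs a full-text match: a document matches as long as the field contains the search terms. Elastic uses its own query syntax, which requires the GET request to carry a body.

$ curl 'localhost:9200/accounts/person/_search' -H 'Content-Type: application/json' -d '
{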
  "query" : { "match" : { "desc" : "软件" }}
}'

The code above uses a match query whose condition is that the desc field contains the term "软件" (software). The response is as follows.

{
  "took":3,
  "timed_out":false,
  "_shards":{"total":5,"successful":5,"failed":0},
  "hits":{
    "total":1,
    "max_score":0.28582606,
    "hits":[
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"1",
        "_score":0.28582606,
        "_source": {
          "user" : "张三",
          "title" : "工程师",
          "desc" : "数据库管理，软件开发"
        }
      }
    ]
  }
}

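By default 10 documents are returned. The from and size parameters page through the results; from is the offset of the first result and defaults to 0.

$ curl 'localhost:9200/accounts/person/_search' -H 'Content-Type: application/json' -d '
{
  "query" : { "match" : { "desc" : "管理" }},
  "from": 1,
  "size": 1
}'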
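Since from is an offset rather than a page number, a small helper can translate between the two. The following is a sketch, not part of the original notes: pageQuery is a hypothetical name, and pages are assumed to be counted from 1.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pageQuery (hypothetical helper) builds a match query for the given
// 1-based page. from is the offset of the first hit, so page N starts
// at (N-1)*size.
func pageQuery(field, text string, page, size int) map[string]interface{} {
	if page < 1 {
		page = 1
	}
	return map[string]interface{}{
		"query": map[string]interface{}{
			"match": map[string]interface{}{field: text},
		},
		"from": (page - 1) * size,
		"size": size,
	}
}

func main() {
	// Page 2 with one hit per page gives from=1, size=1, as in the curl example.
	body, err := json.Marshal(pageQuery("desc", "管理", 2, 1))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```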

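Phrase queries

With match in the queries above, a document matches as long as it contains the terms, even if they are not next to each other. Sometimes the terms must appear together, as in a phrase.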

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}

Here match is replaced with match_phrase.

Highlighted search

Highlighting marks the matched text in the search results.

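Compound queries

Sometimes several queries need to be combined, which is what bool is for.

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

must works like AND, should like OR, and must_not like NOT. The three clauses do not all have to be present.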

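Cluster

A running Elasticsearch instance is called a node, and a cluster consists of one or more nodes with the same cluster.name setting, which share the data and the load between them. When a node joins or leaves the cluster, the cluster redistributes all data evenly.

When a node is elected master, it is responsible for managing all cluster-wide changes, such as creating or deleting indices and adding or removing nodes. The master node does not need to take part in document-level changes or searches, so a cluster with a single master node does not become a bottleneck as traffic grows. Any node can become the master. The example cluster has only one node, so that node is also the master.

As users, we can send requests to any node in the cluster, including the master. Every node knows where any document lives and can forward our request directly to the nodes that hold it. Whichever node receives the request takes care of collecting the data from the nodes that hold the documents and returning the final result to the client. Elasticsearch manages all of this transparently.

GET /_cluster/health shows the state of the cluster.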
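The health response is JSON whose status field is green, yellow, or red. The sketch below decodes a few of its fields in Go. The clusterHealth type, the helper names, and the sample payload are all hypothetical and trimmed for illustration; a real response contains more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterHealth (hypothetical type) holds a few of the fields returned by
// GET /_cluster/health.
type clusterHealth struct {
	ClusterName   string `json:"cluster_name"`
	Status        string `json:"status"`
	NumberOfNodes int    `json:"number_of_nodes"`
}

// parseHealth decodes a health response body.
func parseHealth(data []byte) (clusterHealth, error) {
	var h clusterHealth
	err := json.Unmarshal(data, &h)
	return h, err
}

// describe explains what each status value means.
func describe(status string) string {
	switch status {
	case "green":
		return "all primary and replica shards are allocated"
	case "yellow":
		return "all primary shards are allocated, but some replicas are not"
	case "red":
		return "at least one primary shard is not allocated"
	default:
		return "unknown status"
	}
}

func main() {
	// Hypothetical response body, trimmed to the fields decoded above.
	sample := []byte(`{"cluster_name":"elasticsearch","status":"yellow","number_of_nodes":1}`)
	h, err := parseHealth(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s is %s (%d node): %s\n", h.ClusterName, h.Status, h.NumberOfNodes, describe(h.Status))
}
```

A single-node cluster commonly reports yellow when indices are configured with replicas, because a replica cannot be placed on the same node as its primary.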

Using Elastic from Go

The examples use the third-party library github.com/olivere/elastic.

Insert

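import (
	"fmt"

	"github.com/olivere/elastic"
)

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://localhost:9200"))
	if err != nil {
		// Without a client nothing below can run, so stop here.
		fmt.Println("connect es error:", err)
		return
	}
	// Helper functions defined elsewhere; enable the one to run.
	//insertElastic(client)
	//delElastic(client)
	//queryElasticById(client)
	BulkAdd(client)
}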

Query

func queryDocument(client *elastic.Client) {
	defer client.Stop()
	boolQuery := elastic.NewBoolQuery()

	boolQuery = boolQuery.Should(elastic.NewMatchQuery("Name", "中"))
	boolQuery = boolQuery.Should(elastic.NewMatchQuery("Age", 12))
	//query = query.Must(elastic.NewTermQuery("Name", "a bc"))
	//
	//query = query.Must(elastic.NewMatchQuery("Name", "ab"))
	//query = query.Must(elastic.NewRangeQuery("update_timestamp").Gte(criteria.UpdateTime))

	src, err := boolQuery.Source()
	if err != nil {
		panic(err)
	}
	data, err := json.MarshalIndent(src, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	esResponse, err := client.Search().Index("user").Type("user").
		Query(boolQuery).
		Sort("Age", true).
		From(0).Size(10).
		Do(context.Background())
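	if err != nil {
		// esResponse is nil on error, so do not touch it.
		fmt.Println("search error:", err)
		return
	}
	for _, value := range esResponse.Hits.Hits {
		var doc model.User
		if err := json.Unmarshal(*value.Source, &doc); err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(doc)
	}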
}