Querying Elasticsearch - Blog about things

This post begins our look at querying elasticsearch directly, via its search API. We’ve looked at reporting and graphing tools like Kibana, which provide a veneer over the actual queries. Now we’ll see what the queries and responses look like under the covers.

The first query we’ll make will search an entire index with no filter provided - we will just dump the data content.

The API is accessible via an HTTP or HTTPS URI using the GET or POST method. There are many search flavors available, documented in detail in the elasticsearch search API documentation; we’ll just scratch the surface here. A search can be expressed as a query parameter or as a request body. The query parameter form is limited but good for quick testing, so we’ll use that first.

The simplest search query ever …

The URI structure to invoke the simplest elasticsearch query API looks like this:
http(s)://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/_search

Here is the simplest search query invoked with curl, followed by the response (truncated):

curl -XPOST http://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/_search

{
  "took": 56,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  },
  "hits": {
    "total": 6559,
    "max_score": 1,
    "hits": [
      {
        "_index": "OUR-LOGSENE-APP-TOKEN_2014-10-04",
        "_type": "apache-access",
        "_id": "vfo3cnGFR5mU3Yyio_eG7w",
        "_score": 1,
        "_source": {
          "message": "223.151.106.187 - - [04/Oct/2014:01:40:50 +0000] \"CONNECT api.okcra.org:443 HTTP/1.1\" 404 137 \"-\" \"-\"",
          "@version": "1",
          "@timestamp": "2014-10-04T01:40:50.279Z",
          "type": "apache-access",
          "host": "ip-10-65-29-118",
          "path": "/var/log/httpd/api_okcra_ssl_access_log",
          "clientip": "223.151.106.187",
          "ident": "-",
          "auth": "-",
          "timestamp": "04/Oct/2014:01:40:50 +0000",
          "verb": "CONNECT",
          "request": "api.okcra.org:443",
          "httpversion": "1.1",
          "response": "404",
          "bytes": "137",
          "referrer": "\"-\"",
          "agent": "\"-\""
        }
      },
      ...

The response is pretty interesting.
The metadata at the beginning tells us the query took 56 ms across 16 shards and did not time out.
The hits object holds the match count and a nested hits array of objects; I’ve truncated the array to one entry from the default of 10 returned objects. There were 6,559 entries that matched our search criteria. The elements beginning with an underscore were generated by elasticsearch when the entry was ingested. Look, _type == ‘apache-access’ - we’ve seen that before!
The _source object is the most interesting as it is the JSON message sent from logstash. The message field is the original log entry line. The type field was assigned in the logstash input specification. The other fields are values parsed from the apache combined log format by logstash prior to shipping the json-formatted log to elasticsearch.
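To make the anatomy of the response concrete, here is a minimal Python sketch that pulls out the pieces described above. It works on an abbreviated copy of the response shown earlier (one hit, a handful of _source fields) rather than a live call, and the function name summarize is just an illustrative choice, not part of any elasticsearch client.

```python
import json

# Abbreviated copy of the response shown above (one hit, a few _source fields).
raw = """
{
  "took": 56,
  "timed_out": false,
  "_shards": {"total": 16, "successful": 16, "failed": 0},
  "hits": {
    "total": 6559,
    "max_score": 1,
    "hits": [
      {
        "_index": "OUR-LOGSENE-APP-TOKEN_2014-10-04",
        "_type": "apache-access",
        "_id": "vfo3cnGFR5mU3Yyio_eG7w",
        "_score": 1,
        "_source": {
          "type": "apache-access",
          "clientip": "223.151.106.187",
          "verb": "CONNECT",
          "response": "404",
          "bytes": "137"
        }
      }
    ]
  }
}
"""


def summarize(response):
    """Return a one-line metadata summary and one line per returned hit."""
    shards = response["_shards"]
    meta = "took {} ms, {}/{} shards ok, timed_out={}, {} total hits".format(
        response["took"],
        shards["successful"],
        shards["total"],
        response["timed_out"],
        response["hits"]["total"],
    )
    rows = []
    # The outer "hits" object wraps the inner "hits" array of documents.
    for hit in response["hits"]["hits"]:
        src = hit["_source"]  # the JSON document logstash shipped
        rows.append("{} {} {} {}".format(
            hit["_type"], src["clientip"], src["verb"], src["response"]))
    return meta, rows


meta, rows = summarize(json.loads(raw))
print(meta)
for row in rows:
    print(row)
```

Note that the parsed field values such as response and bytes arrive as strings here, exactly as they appear in the _source object.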

You should be able to see how some of the pieces have come together now. The logstash configuration and parser (grok filter) transformed a raw apache log line into json-formatted data and sent it to elasticsearch, where it was ingested, indexed by field, and made available over the network via an API for us and our applications to query. Cool stuff indeed.

Search within type(s)

You can optionally add the record type to the URI to limit your search to that type. For example, if we wished to search only within type apache-access we could use a URI like this:
http(s)://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/apache-access/_search

You can search multiple types by comma-separating them like this:
http(s)://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/apache-access,syslog-cee/_search
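If an application needs to assemble these URIs, the pattern is simple enough to capture in a few lines. The sketch below is only an illustration of the URI layout described above; the helper name search_url and its parameters are my own, and it builds the string without contacting the server.

```python
def search_url(token, types=(), host="logsene-receiver.sematext.com", scheme="http"):
    """Build a _search URI for an index, optionally limited to some types."""
    parts = [scheme + "://" + host, token]
    if types:
        # Multiple types go into a single, comma-separated path segment.
        parts.append(",".join(types))
    parts.append("_search")
    return "/".join(parts)


# Whole index, one type, and two types respectively.
print(search_url("OUR-LOGSENE-APP-TOKEN"))
print(search_url("OUR-LOGSENE-APP-TOKEN", ["apache-access"]))
print(search_url("OUR-LOGSENE-APP-TOKEN", ["apache-access", "syslog-cee"]))
```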

Search for records where a field equals some value

Say we want to find all apache access records where the http response code was 400. The field:value expression goes in the q query parameter, so we could use a query like this:
http(s)://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/apache-access/_search?q=response:400
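When building this kind of URI from code, it is safer to let a library do the escaping, since the field:value expression is carried in the q query parameter and may contain characters that need encoding. A small sketch using Python’s standard library (the variable names are just for illustration, and nothing is sent over the network):

```python
from urllib.parse import urlencode

base = ("http://logsene-receiver.sematext.com/"
        "OUR-LOGSENE-APP-TOKEN/apache-access/_search")

# urlencode percent-encodes the colon; the server decodes it back to response:400.
query_string = urlencode({"q": "response:400"})
url = base + "?" + query_string

print(query_string)
print(url)
```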

A simple query body

This is the same query as above, but expressed as a Query DSL request body:
curl -XPOST http://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/apache-access/_search -d '{ "query" : { "term" : { "response" : "400" } } }'

DSL stands for Domain Specific Language, so the Query DSL is a language for describing queries in elasticsearch. See the Elasticsearch Query DSL documentation for details.
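Because the Query DSL is plain JSON, an application can build the body as an ordinary data structure and serialize it, which avoids the quoting mistakes that creep into hand-typed shell commands. The following is a rough sketch using only Python’s standard library; it prepares the request but deliberately does not send it, and the Content-Type header is my addition (harmless here, and expected by later elasticsearch versions).

```python
import json
from urllib.request import Request

url = ("http://logsene-receiver.sematext.com/"
       "OUR-LOGSENE-APP-TOKEN/apache-access/_search")

# The term query from above, expressed as nested dictionaries.
body = {"query": {"term": {"response": "400"}}}

req = Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},  # assumption: not in the original curl call
    method="POST",
)

# urllib.request.urlopen(req) would actually send it; here we only inspect it.
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```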

Get a simple count

There are lots of things we can do, one of which is simply to count the number of records of a given type. An excruciatingly simple example of using the Query DSL to count records of type apache-access might look something like this:

curl -q -XPOST "http://logsene-receiver.sematext.com/OUR-LOGSENE-APP-TOKEN/apache-access/_count?pretty" -d '{
  "query": {
    "match_all": {}
  }
}'

{
  "count": 184,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  }
}

There are currently 184 such records. Adding the query parameter pretty tells elasticsearch to pretty-print the JSON response for easier consumption by humans.
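A count is only trustworthy if every shard answered, so code consuming this response may want to check the _shards section alongside the number itself. Here is a minimal sketch working on the response shown above; the helper name read_count is hypothetical.

```python
import json

# The count response shown above, as a compact string.
raw = '{"count": 184, "_shards": {"total": 16, "successful": 16, "failed": 0}}'


def read_count(response):
    """Return (count, complete), where complete means no shard failed."""
    shards = response["_shards"]
    complete = shards["failed"] == 0 and shards["successful"] == shards["total"]
    return response["count"], complete


count, complete = read_count(json.loads(raw))
print(count, complete)
```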