Core
Guides v2.x
2

# Store and access data

Kuzzle uses Elasticsearch as a document-oriented storage.

All documents, including internal Kuzzle ones (such as security information), are stored in Elasticsearch indexes.

Kuzzle's storage capabilities are therefore directly linked to Elasticsearch's capabilities and limits.

# Data storage organization

Kuzzle organizes the data storage in 4 levels:

  • indexes
  • collections
  • documents
  • fields

An index brings together several collections, which in turn contain several documents, each of which is composed of several fields.
data storage organization

# Comparison with a relational database

Even if Elasticsearch is not, strictly speaking, a database, the way it stores data is very similar to that of document-oriented databases.

If you're more familiar with the way relational databases store data, here is how it compares:

Document-oriented storage Relational databases storage
index database
collection table
document line
field column

Note: collections are specific to Kuzzle, this notion does not exist in Elasticsearch. Indexes and collections allows to have a more fine grained rights management system.

Comparing document-oriented storages with relational databases would require a more thorough analysis, but for the purposes of this guide, we shall reduce the list of differences to the following 3 items:

  • Documents are identified with a unique identifier, which is stored separately from the content of documents (compared to primary/foreign keys, stored alongside the data they identify),
  • no advanced join system,
  • a typed mapping system to define how Elasticsearch should index the fields.

All these differences should be taken into account when modeling your data model and your application.

# Creating indexes and collections

The creation of indexes and collections is done through the API via the methods index:create and collection:create.

For example, to create a nyc-open-data index:

Copied to clipboard!
curl -X POST localhost:7512/nyc-open-data/_create?pretty
Click to see Kuzzle API answer
{
  "requestId": "e9ab8d1a-ea1a-4fdd-ad50-07c82245d88c",
  "status": 200,
  "error": null,
  "controller": "index",
  "action": "create",
  "collection": null,
  "index": "nyc-open-data",
  "volatile": null,
  "result": {
    "acknowledged": true,
  }
}

Then a yellow-taxi collection in this index:

It is recommended to specify a data mapping when creating a collection so that its content can correctly be indexed by Elasticsearch.

Copied to clipboard!
curl -X PUT localhost:7512/nyc-open-data/yellow-taxi?pretty
Click to see Kuzzle API answer
{
  "requestId": "1d5b7afe-9d81-4c0e-92bc-aa57b24c35eb",
  "status": 200,
  "error": null,
  "controller": "collection",
  "action": "create",
  "collection": "yellow-taxi",
  "index": "nyc-open-data",
  "volatile": null,
  "result": {
    "acknowledged": true
  }
}

It is also possible to define in advance a set of indexes and collections, then load them at the start of Kuzzle (bin/start-kuzzle-server --mappings mappings.json) or with Kourou (kourou query admin:loadMappings < mappings.json).

# Writing documents

Kuzzle's API offers several methods to create, modify or delete documents in its storage space.

There are two families of methods: those acting on a document and those acting on multiple documents.

Methods acting on a single document:

Methods acting on multiple documents

The bulk controller features low-level methods for mass documents injection in collections.

For example, to create a new document in our index:

Copied to clipboard!
curl -X POST -H "Content-Type: application/json" -d '{ "driver": "liia", "arriveAt": "2019-07-26"  }' http://localhost:7512/nyc-open-data/yellow-taxi/document-uniq-id/_create?pretty
Click to see Kuzzle's answer
{
  "requestId": "e146e2a5-ff5b-4b6f-a603-8cde43f353fe",
  "status": 200,
  "error": null,
  "controller": "document",
  "action": "create",
  "collection": "yellow-taxi",
  "index": "nyc-open-data",
  "volatile": null,
  "result": {
    "_index": "nyc-open-data",
    "_type": "yellow-taxi",
    "_id": "document-uniq-id", // Document ID
    "_version": 1,
    "result": "created",
    "created": true,
    "_source": {                   // Document body
      "driver": "liia",
      "arriveAt": "2019-07-26",
      "_kuzzle_info": {            // Kuzzle metadata
        "author": "-1",
        "createdAt": 1561443009768,
        "updatedAt": null,
        "updater": null
      }
    }
  }
}

Using the document:update method allows us to add a new field while keeping the old ones:

Copied to clipboard!
curl -X PUT -H "Content-Type: application/json" -d '{ "car": "rickshaw"  }' http://localhost:7512/nyc-open-data/yellow-taxi/document-uniq-id/_update?pretty
Click to see Kuzzle's answer
{
  "requestId": "1be6c9e6-2626-4f85-ad64-d1cc248c7bee",
  "status": 200,
  "error": null,
  "controller": "document",
  "action": "update",
  "collection": "yellow-taxi",
  "index": "nyc-open-data",
  "volatile": null,
  "result": {
    "_index": "nyc-open-data",
    "_type": "yellow-taxi",
    "_id": "document-uniq-id",
    "_version": 2,
    "result": "updated"
  }
}

# Reading documents

There are two ways to retrieve documents:

  • using the document unique identifiers,
  • by performing a search with an Elasticsearch query.

# Getting documents

To retrieve a document when you know its unique identifier, you have to use the document:get or the document:mGet method.

For example, to retrieve the documents we created in the previous examples:

Copied to clipboard!
curl http://localhost:7512/nyc-open-data/yellow-taxi/document-uniq-id?pretty
Click to see Kuzzle's answer
{
  "requestId": "62af64c8-5dc6-48c1-942b-2604bf97686e",
  "status": 200,
  "error": null,
  "controller": "document",
  "action": "get",
  "collection": "yellow-taxi",
  "index": "nyc-open-data",
  "volatile": null,
  "result": {
    "_index": "nyc-open-data",
    "_type": "yellow-taxi",
    "_id": "document-uniq-id",
    "_version": 2,
    "found": true,
    "_source": {
      "driver": "liia",
      "arriveAt": "2019-07-26",
      "_kuzzle_info": {
        "author": "-1",
        "createdAt": 1561443222474,
        "updatedAt": 1561443279526,
        "updater": "-1"
      },
      "car": "rickshaw"
    }
  }
}

# Searching documents

Searching documents is performed using the Elasticsearch Query DSL.
As Elasticsearch is an indexing engine designed for document search, it offers a wide range of advanced search options like geo queries, full text queries, aggregations, and more.

Requests must be made through Kuzzle using the document:search method.

When a document is created or modified, its latest version is not immediately available in the results of a search.
First, you have to wait until Elasticsearch has finished updating its index.
It is possible to make Elasticsearch wait for the indexation before sending the answer by setting refresh=wait_for.

For example, to retrieve documents between the ages of 25 and 28:

Copied to clipboard!
# First create some documents
for i in {18..42}; do; curl -X POST -H "Content-Type: application/json" -d "{ \"driver\": \"driver-$i\", \"age\": $i  }" http://localhost:7512/nyc-open-data/yellow-taxi/_create &; sleep 0.05; done

# Search for drivers between 25 and 28 years
curl -X POST -H "Content-Type: application/json" -d '{ 
  "query": { 
    "range": { 
      "age": { "gte": 25, "lte": 28 } 
    } 
  }  
}
' http://localhost:7512/nyc-open-data/yellow-taxi/_search?pretty

Click to see Kuzzle's answer
{
  "requestId": "836768a4-0b46-447a-b4c5-8932101f24de",
  "status": 200,
  "error": null,
  "controller": "document",
  "action": "search",
  "collection": "yellow-taxi",
  "index": "nyc-open-data",
  "volatile": null,
  "result": {
    "took": 12,
    "timed_out": false,
    "hits": [
      {
        "_index": "nyc-open-data",
        "_type": "yellow-taxi",
        "_id": "AWuNXWff6MDMyQmSeEuT",
        "_score": 1,
        "_source": {
          "driver": "driver-27",
          "age": 27,
          "_kuzzle_info": {
            "author": "-1",
            "createdAt": 1561444837342,
            "updatedAt": null,
            "updater": null
          }
        }
      },
      {
        "_index": "nyc-open-data",
        "_type": "yellow-taxi",
        "_id": "AWuNXWd46MDMyQmSeEuR",
        "_score": 1,
        "_source": {
          "driver": "driver-25",
          "age": 25,
          "_kuzzle_info": {
            "author": "-1",
            "createdAt": 1561444837239,
            "updatedAt": null,
            "updater": null
          }
        }
      },
      {
        "_index": "nyc-open-data",
        "_type": "yellow-taxi",
        "_id": "AWuNXWgQ6MDMyQmSeEuU",
        "_score": 1,
        "_source": {
          "driver": "driver-28",
          "age": 28,
          "_kuzzle_info": {
            "author": "-1",
            "createdAt": 1561444837391,
            "updatedAt": null,
            "updater": null
          }
        }
      },
      {
        "_index": "nyc-open-data",
        "_type": "yellow-taxi",
        "_id": "AWuNXWer6MDMyQmSeEuS",
        "_score": 1,
        "_source": {
          "driver": "driver-26",
          "age": 26,
          "_kuzzle_info": {
            "author": "-1",
            "createdAt": 1561444837290,
            "updatedAt": null,
            "updater": null
          }
        }
      }
    ],
    "total": 4,
    "max_score": 1
  }
}

# Kuzzle indexes in Elasticseach

Elasticsearch indexes created and managed by Kuzzle follow this naming convention:

  • private indexes: %<index name>.<collection name> (Kuzzle internal data, plugins dedicated storage)
  • public indexes: &<index name>.<collection name>

Indexes not following this naming policy cannot be accessed by Kuzzle's API.
Create an Elasticsearch alias to share a regular index with Kuzzle (and vice-versa).

# What Now?